boolean_jaccard Source Code
Jaccard metric calculations for boolean vectors.
Jaccard similarities and their p-values.
The code here is a Python implementation of the Jaccard package by N. Chung.
- jaccard.jaccard.bootstrap(x: numpy.ndarray, y: numpy.ndarray, px: Optional[float] = None, py: Optional[float] = None, n: int = 1000, seed: int = 42) → pandas.core.series.Series
Use the bootstrap test to return a p-value.
The p-value is defined as the fraction of values in the null distribution whose absolute value is greater than the absolute value of the observed statistic.
Note
Returning a Series makes the function easier to use with pandas groupby operations.
- Parameters
x (np.ndarray) – A boolean array.
y (np.ndarray) – A boolean array.
n (int) – The number of bootstrap repetitions to perform.
px (Optional[float]) – The probability of success in x. If None, then px = x.mean()
py (Optional[float]) – The probability of success in y. If None, then py = y.mean()
seed (int) – The random seed to use for resampling.
- Returns
The Jaccard similarity and its p-value.
- Return type
pandas.Series
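The p-value definition above can be illustrated with a minimal NumPy sketch. It is not the package's implementation: the helper names centered_jaccard and bootstrap_p_value_sketch are hypothetical, the null distribution is assumed to come from resampling x and y independently with replacement, and the centering term is assumed to be the expected Jaccard similarity under independence. For simplicity the sketch returns a plain tuple rather than a pandas Series.

```python
import numpy as np


def centered_jaccard(x, y, px, py):
    """Jaccard similarity minus its assumed expectation under independence."""
    union = np.logical_or(x, y).sum()
    # Assumption: two all-False vectors are given a similarity of 0.
    j = np.logical_and(x, y).sum() / union if union else 0.0
    denom = px + py - px * py
    # Assumption: expected similarity of independent vectors.
    expected = px * py / denom if denom else 0.0
    return j - expected


def bootstrap_p_value_sketch(x, y, n=1000, seed=42):
    """Hypothetical bootstrap test; returns (observed statistic, p-value)."""
    rng = np.random.default_rng(seed)
    px, py = x.mean(), y.mean()
    observed = centered_jaccard(x, y, px, py)

    null = np.empty(n)
    for i in range(n):
        # Resample each vector on its own, which breaks the pairing
        # between x and y (assumed null: the vectors are independent).
        xb = rng.choice(x, size=x.size, replace=True)
        yb = rng.choice(y, size=y.size, replace=True)
        null[i] = centered_jaccard(xb, yb, px, py)

    # Fraction of null values whose absolute value exceeds the observed one.
    p_val = float(np.mean(np.abs(null) > np.abs(observed)))
    return float(observed), p_val


x = np.array([True, False] * 50)
y = x.copy()  # identical vectors, so the similarity is as high as possible
observed, p_val = bootstrap_p_value_sketch(x, y, n=200)
```

With identical inputs the observed statistic sits at the upper edge of what the resampled statistics can reach, so the resulting p-value is small.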
- jaccard.jaccard.distance(x: numpy.ndarray, y: numpy.ndarray, px: Optional[float] = None, py: Optional[float] = None) → float
Calculate Jaccard distance.
Classically defined as:
Jdist = 1 - Jsimilarity
The vectors must be of the same length, and must be boolean.
Note
The centering method applied in Jaccard similarity is not applicable here, so it is not passed as an option.
- Parameters
x (np.ndarray) – A boolean array.
y (np.ndarray) – A boolean array.
px (Optional[float]) – The probability of success in x. If None, then px = x.mean()
py (Optional[float]) – The probability of success in y. If None, then py = y.mean()
- Returns
The Jaccard distance between the two vectors.
- Return type
float
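As a worked instance of the classical definition, the following sketch computes the distance directly with NumPy rather than through the package. The similarity is taken to be the size of the intersection divided by the size of the union; the variable names are illustrative only.

```python
import numpy as np

x = np.array([True, True, False, False, True])
y = np.array([True, False, True, False, True])

# Positions where both vectors are True: indices 0 and 4.
intersection = np.logical_and(x, y).sum()  # 2
# Positions where at least one vector is True: indices 0, 1, 2 and 4.
union = np.logical_or(x, y).sum()  # 4

j_similarity = intersection / union  # 0.5
j_distance = 1.0 - j_similarity  # 0.5
```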
- jaccard.jaccard.similarity(x: numpy.ndarray, y: numpy.ndarray, center: bool = False, px: Optional[float] = None, py: Optional[float] = None) → float
Calculate Jaccard similarity.
The vectors must be of the same length, and must be boolean.
- Parameters
x (np.ndarray) – A boolean array.
y (np.ndarray) – A boolean array.
center (bool, default=False) – Whether to center the score.
px (Optional[float]) – The probability of success in x. If None, then px = x.mean()
py (Optional[float]) – The probability of success in y. If None, then py = y.mean()
- Returns
The Jaccard similarity between the two vectors.
- Return type
float
- Raises
IndexError – If the vectors are not 1-d, or if the vectors are not the same length.
TypeError – If the vectors are not boolean.
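The documented checks and the optional centering can be sketched as follows. This is a hypothetical stand-in (jaccard_similarity_sketch), not the package's code. The source does not spell out how centering is computed, so the sketch assumes it subtracts the expected similarity of two independent vectors, px * py / (px + py - px * py). The handling of two all-False vectors is also an assumption.

```python
import numpy as np


def jaccard_similarity_sketch(x, y, center=False, px=None, py=None):
    """Hypothetical re-creation of the documented behaviour."""
    x = np.asarray(x)
    y = np.asarray(y)

    # Mirrors the documented IndexError: 1-d vectors of equal length only.
    if x.ndim != 1 or y.ndim != 1 or x.shape != y.shape:
        raise IndexError("x and y must be 1-d arrays of the same length")
    # Mirrors the documented TypeError: boolean vectors only.
    if x.dtype != np.bool_ or y.dtype != np.bool_:
        raise TypeError("x and y must be boolean arrays")

    union = np.logical_or(x, y).sum()
    # Assumption: two all-False vectors are given a similarity of 0.
    j = np.logical_and(x, y).sum() / union if union else 0.0

    if center:
        px = x.mean() if px is None else px
        py = y.mean() if py is None else py
        denom = px + py - px * py
        # Assumption: centering subtracts the expectation under independence.
        j -= px * py / denom if denom else 0.0
    return float(j)


x = np.array([True, True, False, False])
y = np.array([True, False, True, False])

raw = jaccard_similarity_sketch(x, y)  # 1 shared True out of 3 in the union
centered = jaccard_similarity_sketch(x, y, center=True)
```

In this example px = py = 0.5, so the assumed expected similarity is 0.25 / 0.75 = 1/3, which equals the raw score, and the centered score is zero up to floating-point error.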