boolean_jaccard Source Code

Jaccard metric calculations for boolean vectors.

Jaccard similarities and their p-values.

The code here is a Python implementation of the jaccard package by N. Chung.

jaccard.jaccard.bootstrap(x: numpy.ndarray, y: numpy.ndarray, px: Optional[float] = None, py: Optional[float] = None, n: int = 1000, seed: int = 42) → pandas.core.series.Series

Use the bootstrap test to return a p-value.

The p-value is defined as the fraction of bootstrap null statistics whose absolute value is greater than the absolute value of the observed statistic.

Note

Returning a series facilitates applications with pandas and groupby.

Parameters
  • x (np.ndarray) – A boolean array.

  • y (np.ndarray) – A boolean array.

  • n (int) – The number of bootstrap repetitions to perform.

  • px (Optional[float]) – The probability of success in x. If None, then px = x.mean()

  • py (Optional[float]) – The probability of success in y. If None, then py = y.mean()

  • seed (int) – The random seed to use for resampling.

Returns

A series containing the Jaccard similarity and its p-value.

Return type

pandas.core.series.Series
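The p-value definition given for bootstrap can be illustrated with a minimal NumPy sketch. It is not the package's implementation: the helper names centered_jaccard and bootstrap_p_value are hypothetical, and the sketch assumes that the null statistics come from resampling each vector independently with replacement and that the statistic is the Jaccard similarity minus its expected value under independence.

```python
import numpy as np


def centered_jaccard(x, y, px, py):
    """Jaccard similarity minus its expectation under independence (assumed centering)."""
    denominator = px + py - px * py
    expected = (px * py) / denominator if denominator > 0 else 0.0
    union = np.sum(x | y)
    observed = np.sum(x & y) / union if union > 0 else expected
    return observed - expected


def bootstrap_p_value(x, y, n=1000, seed=42):
    """Hypothetical helper: share of null statistics more extreme than the observed one."""
    rng = np.random.default_rng(seed)
    px, py = x.mean(), y.mean()
    observed = centered_jaccard(x, y, px, py)
    null = np.empty(n)
    for i in range(n):
        # Resample each vector independently to break any association between them.
        xb = rng.choice(x, size=x.size, replace=True)
        yb = rng.choice(y, size=y.size, replace=True)
        null[i] = centered_jaccard(xb, yb, px, py)
    # Fraction of |null| values strictly greater than |observed|.
    return float(np.mean(np.abs(null) > np.abs(observed)))
```

Fixing the seed makes the resampling reproducible, so repeated calls with the same inputs give the same p-value.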

jaccard.jaccard.distance(x: numpy.ndarray, y: numpy.ndarray, px: Optional[float] = None, py: Optional[float] = None) → float

Calculate Jaccard distance.

Classically defined as:

Jdist = 1 - Jsimilarity

The vectors must be of the same length, and must be boolean.

Note

The centering method applied in Jaccard similarity is not applicable here, so it is not passed as an option.

Parameters
  • x (np.ndarray) – A boolean array.

  • y (np.ndarray) – A boolean array.

  • px (Optional[float]) – The probability of success in x. If None, then px = x.mean()

  • py (Optional[float]) – The probability of success in y. If None, then py = y.mean()

Returns

The Jaccard distance between the two vectors.

Return type

float
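As a worked illustration of Jdist = 1 - Jsimilarity, the following sketch computes the distance directly from the intersection and union of two boolean vectors. The function name jaccard_distance is hypothetical, the px and py arguments are omitted, and the sketch assumes at least one vector contains a True value so that the union is non-empty.

```python
import numpy as np


def jaccard_distance(x, y):
    """Hypothetical helper: 1 minus the ratio of shared to total True positions."""
    intersection = np.sum(x & y)  # positions where both vectors are True
    union = np.sum(x | y)         # positions where either vector is True
    return float(1.0 - intersection / union)


x = np.array([True, True, False, False])
y = np.array([True, False, True, False])

# One shared True position out of three in the union, so the distance is 1 - 1/3.
d = jaccard_distance(x, y)
print(round(d, 3))
```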

jaccard.jaccard.similarity(x: numpy.ndarray, y: numpy.ndarray, center: bool = False, px: Optional[float] = None, py: Optional[float] = None) → float

Calculate Jaccard similarity.

The vectors must be of the same length, and must be boolean.

Parameters
  • x (np.ndarray) – A boolean array.

  • y (np.ndarray) – A boolean array.

  • center (bool, default=False) – Whether to center the score.

  • px (Optional[float]) – The probability of success in x. If None, then px = x.mean()

  • py (Optional[float]) – The probability of success in y. If None, then py = y.mean()

Returns

The Jaccard similarity between the two vectors.

Return type

float

Raises
  • IndexError – If the vectors are not 1-d, or if the vectors are not the same length.

  • TypeError – If the vectors are not boolean.
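The input checks and the center option documented for similarity can be sketched as follows. This is an illustration rather than the package source: the function name jaccard_similarity is hypothetical, and the centering rule (subtracting px * py / (px + py - px * py), the similarity expected for independent vectors) is an assumption about what "center" means here.

```python
import numpy as np


def jaccard_similarity(x, y, center=False, px=None, py=None):
    """Hypothetical sketch of a validated, optionally centered Jaccard similarity."""
    x, y = np.asarray(x), np.asarray(y)
    if x.ndim != 1 or y.ndim != 1 or x.shape != y.shape:
        raise IndexError("x and y must be 1-d arrays of the same length")
    if x.dtype != bool or y.dtype != bool:
        raise TypeError("x and y must be boolean arrays")

    # Default the success probabilities to the observed proportions of True.
    px = x.mean() if px is None else px
    py = y.mean() if py is None else py

    # Assumed centering term: expected similarity if x and y were independent.
    denominator = px + py - px * py
    expected = (px * py) / denominator if denominator > 0 else 0.0

    union = np.sum(x | y)
    score = np.sum(x & y) / union if union > 0 else expected
    return float(score - expected) if center else float(score)
```

Under this assumed centering, a score near zero means the observed overlap is about what independence would predict, which is why the centered form is the natural statistic for a significance test.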