flippers.summary

flippers.summary(L, polarities, digits=3, normalize=False)

Calculate summary statistics for the given weak label matrix and polarities.

Parameters:
  • L (pd.DataFrame) – Weak label DataFrame of shape (n_samples, n_weak).

  • polarities (ndarray | List) – 1D array or list of size n_weak containing the polarity of each weak label.

  • digits (int) – Number of digits to round the output statistics to. Default 3.

  • normalize (bool) – When True, reports overlaps, matches, and conflicts as a ratio of coverage rather than as a ratio of all samples. Default False.

Returns:

DataFrame of shape (n_weak, n_summaries) containing the following columns:

  • “polarity”: The polarity of each weak label.

  • “coverage”: The ratio of samples to which each weak label is assigned.

  • “confidence”: The average confidence level of the assigned weak labels.

  • “overlaps”: The ratio of samples where the weak label is assigned alongside at least one other weak label.

  • “matches”: The ratio of samples where the weak label is assigned alongside at least one other weak label of the same polarity.

  • “conflicts”: The ratio of samples where the weak label is assigned alongside at least one other weak label of a different polarity.

Return type:

pd.DataFrame

Example

>>> import pandas as pd
>>> import flippers
>>> L = pd.DataFrame([[0, 1, 0], [1, 0, 1], [0, 0, 0], [1, 1, 1]])
>>> polarities = [0, 1, 1]
>>> flippers.summary(L, polarities)
   polarity  coverage  confidence  overlaps  matches  conflicts
0         0       0.5         1.0      0.50     0.00       0.50
1         1       0.5         1.0      0.25     0.25       0.25
2         1       0.5         1.0      0.50     0.25       0.50
>>> flippers.summary(L, polarities, normalize=True)
   polarity  coverage  confidence  overlaps  matches  conflicts
0         0       0.5         1.0       1.0      0.0        1.0
1         1       0.5         1.0       0.5      0.5        0.5
2         1       0.5         1.0       1.0      0.5        1.0
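
To make the column definitions concrete, the statistics in the example can be re-derived with plain NumPy and pandas. The following is a minimal sketch, not the flippers implementation: the function name summary_sketch is hypothetical, and it assumes that a value of 0 in L means the weak label abstains and that confidence is the mean of the non-zero entries in each column.

```python
import numpy as np
import pandas as pd


def summary_sketch(L, polarities, digits=3, normalize=False):
    """Hypothetical re-derivation of the documented columns (not flippers' code)."""
    values = L.to_numpy(dtype=float)
    assigned = values != 0                  # assumption: 0 means "abstain"
    pol = np.asarray(polarities)
    n_weak = assigned.shape[1]

    same = pol[:, None] == pol[None, :]     # label pairs sharing a polarity
    off_diag = ~np.eye(n_weak, dtype=bool)  # exclude pairing a label with itself

    def frac_with_other(mask):
        # For label j: ratio of samples where j is assigned and at least one
        # label k with mask[j, k] == True is assigned on the same sample.
        others = (assigned.astype(int) @ mask.T.astype(int)) > 0
        return (assigned & others).mean(axis=0)

    counts = assigned.sum(axis=0)
    # Assumption: confidence is the mean value of the assigned entries.
    confidence = np.divide(
        (values * assigned).sum(axis=0),
        counts,
        out=np.full(n_weak, np.nan),
        where=counts > 0,
    )

    out = pd.DataFrame(
        {
            "polarity": pol,
            "coverage": assigned.mean(axis=0),
            "confidence": confidence,
            "overlaps": frac_with_other(off_diag),
            "matches": frac_with_other(same & off_diag),
            "conflicts": frac_with_other(~same),
        },
        index=L.columns,
    )
    if normalize:
        # Express the three interaction columns as a ratio of coverage.
        cols = ["overlaps", "matches", "conflicts"]
        out[cols] = out[cols].div(out["coverage"], axis=0)
    return out.round(digits)


L = pd.DataFrame([[0, 1, 0], [1, 0, 1], [0, 0, 0], [1, 1, 1]])
print(summary_sketch(L, [0, 1, 1]))
print(summary_sketch(L, [0, 1, 1], normalize=True))
```

On the example matrix, this sketch yields the same values as both tables above. With normalize=True, each of overlaps, matches, and conflicts is simply the unnormalized value divided by coverage (for instance, 0.25 / 0.5 = 0.5 for weak label 1). A weak label with zero coverage would produce NaN in the normalized columns here; how flippers handles that case is not stated in this page.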