flippers
Modules
Groups labeling function utilities.
Groups label generative models.
Functions
confidence(L) | Calculate the average confidence level per weak labeler.
conflicts(L, polarities) | Calculate, for each labeling function, the fraction of labeled samples that other labeling functions label differently.
coverage(L) | Calculate the fraction of samples labeled by each weak labeler.
filter_labeled(L) | Filter out unlabeled samples from the given label matrix.
is_labeled(L) | Check, for each sample, whether any label exists in the given label matrix L.
matches(L, polarities) | Calculate, for each labeling function, the fraction of labeled samples that other labeling functions label the same way.
multipolar_to_monopolar(L, polarities_mapping={}) | Convert a pandas DataFrame of weak labels in multipolar representation to monopolar representation.
overlaps(L) | Calculate, for each labeling function, the fraction of labeled samples that are also labeled by other labeling functions.
summary(L, polarities, digits=3, normalize=False) | Calculate summary statistics for the given weak label matrix and polarities.
total_coverage(L) | Calculate the total proportion of labeled samples in the given label matrix.
- flippers.confidence(L)
Calculate the average confidence level per weak labeler.
- Parameters:
L (pd.DataFrame) – Weak label DataFrame of shape (n_samples, n_weak).
- Returns:
Series of size n_weak with the average confidence level per weak labeler.
- Return type:
pd.Series
Example
>>> L = pd.DataFrame([[0, .1, 0], [1, 0, .5], [0, 0, 0], [.7, .1, .2]])
>>> flippers.confidence(L)
0    0.85
1    0.10
2    0.35
dtype: float64
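The values in the example above are consistent with averaging only the non-zero entries of each column. The following is a minimal pandas sketch of that reading, not the library's implementation; `confidence_sketch` is a hypothetical name, and treating 0 as "no label" is an assumption drawn from the examples on this page.

```python
import pandas as pd


def confidence_sketch(L: pd.DataFrame) -> pd.Series:
    # Assumption: 0 means "no label". Mask zeros as NaN so that mean()
    # only averages the entries where the weak labeler actually fired.
    return L.where(L != 0).mean()


L = pd.DataFrame([[0, .1, 0], [1, 0, .5], [0, 0, 0], [.7, .1, .2]])
result = confidence_sketch(L)
print(result)  # approximately 0.85, 0.10, 0.35 for columns 0, 1, 2
```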
- flippers.conflicts(L, polarities)
Calculate, for each labeling function, the fraction of labeled samples that other labeling functions label differently.
- Parameters:
L (pd.DataFrame) – Weak label DataFrame of shape (n_samples, n_weak).
polarities (Union[list, np.ndarray]) – Array or list of size n_weak containing the polarity of each weak label.
- Returns:
Series of length n_weak indicating the fraction of annotated samples with conflicting annotations for each LF.
- Return type:
pd.Series
Example
>>> L = pd.DataFrame([[0, 1, 0], [1, 0, 1], [0, 0, 0], [1, 1, 1]])
>>> polarities = [0, 1, 1]
>>> flippers.conflicts(L, polarities)
0    0.50
1    0.25
2    0.50
dtype: float64
- flippers.coverage(L)
Calculate the fraction of samples labeled by each weak labeler.
- Parameters:
L (pd.DataFrame) – Weak label DataFrame of shape (n_samples, n_weak).
- Returns:
Series of size n_weak with the fraction of samples labeled by each weak labeler.
- Return type:
pd.Series
Example
>>> L = pd.DataFrame([[0, 1, 0], [1, 0, 1], [0, 0, 0], [1, 1, 1]])
>>> flippers.coverage(L)
0    0.5
1    0.5
2    0.5
dtype: float64
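Coverage as described here can be reproduced with plain pandas by taking the column-wise mean of a "was labeled" mask. This is only an illustrative sketch that assumes 0 encodes "no label"; `coverage_sketch` is a hypothetical helper, not part of flippers.

```python
import pandas as pd


def coverage_sketch(L: pd.DataFrame) -> pd.Series:
    # (L != 0) is True wherever a weak labeler assigned a label;
    # the mean of a boolean column is the fraction of True entries.
    return (L != 0).mean()


L = pd.DataFrame([[0, 1, 0], [1, 0, 1], [0, 0, 0], [1, 1, 1]])
coverage = coverage_sketch(L)
print(coverage)  # 0.5 for each of the three columns
```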
- flippers.filter_labeled(L)
Filter out unlabeled samples from the given label matrix.
- Parameters:
L (pd.DataFrame) – Weak label DataFrame of shape (n_samples, n_weak).
- Returns:
Filtered label matrix of shape (n_labeled_samples, n_weak), keeping only the samples that received at least one label.
- Return type:
pd.DataFrame
Example
>>> L = pd.DataFrame([[0, 1, 0], [1, 0, 1], [0, 0, 0], [1, 1, 1]])
>>> flippers.filter_labeled(L)
   0  1  2
0  0  1  0
1  1  0  1
3  1  1  1
- flippers.is_labeled(L)
Check, for each sample, whether any label exists in the given label matrix L.
- Parameters:
L (pd.DataFrame) – Weak label DataFrame of shape (n_samples, n_weak).
- Returns:
Series of size n_samples indicating whether each sample is labeled.
- Return type:
pd.Series
Example
>>> L = pd.DataFrame([[0, 1, 0], [1, 0, 1], [0, 0, 0], [1, 1, 1]])
>>> flippers.is_labeled(L)
0     True
1     True
2    False
3     True
dtype: bool
- flippers.matches(L, polarities)
Calculate, for each labeling function, the fraction of labeled samples that other labeling functions label the same way.
- Parameters:
L (pd.DataFrame) – Weak label DataFrame of shape (n_samples, n_weak).
polarities (Union[list, np.ndarray]) – Array or list of size n_weak containing the polarity of each weak label.
- Returns:
Series of length n_weak indicating the fraction of annotated samples with matching annotations for each LF.
- Return type:
pd.Series
Example
>>> L = pd.DataFrame([[0, 1, 0], [1, 0, 1], [0, 0, 0], [1, 1, 1]])
>>> polarities = [0, 1, 1]
>>> flippers.matches(L, polarities)
0    0.00
1    0.25
2    0.25
dtype: float64
- flippers.multipolar_to_monopolar(L, polarities_mapping={})
Convert a pandas DataFrame of weak labels in multipolar representation to monopolar representation.
- Parameters:
L (pd.DataFrame) –
The input DataFrame of weak labels.
Each column represents a weak labeler and each row represents a data point.
polarities_mapping (Dict[str, List[int]]) –
A dictionary specifying the possible polarities for each weak labeler.
The keys are the column names of L and the values are lists of integers representing the possible polarities.
If not specified, the function will attempt to infer the polarities by examining the unique values in each column of L.
- Returns:
A tuple containing the following elements:
L_monopolar: A pandas DataFrame of monopolar weak labels.
polarities: A 1D numpy array containing the polarity of each column of L_monopolar.
polarities_mapping: A dictionary specifying the original polarities for each weak labeler. The keys are the column names of the input and the values are lists of integers representing the possible polarities.
- Return type:
Tuple[pd.DataFrame, np.ndarray, Dict[str, List[int]]]
Example
It is always better to use a handwritten polarities_mapping. polarities_mapping lists the possible polarities each labeling function can have.
>>> multipolar = pd.DataFrame([[-1, 1, 2], [0, -1, 0], [-1, -1, 2]])
>>> polarities_mapping = {'0': [0], '1': [1], '2': [0, 2]}
>>> L, polarities, _ = flippers.multipolar_to_monopolar(
...     multipolar, polarities_mapping
... )
>>> L
   0  1  2__0  2__2
0  0  1     0     1
1  1  0     1     0
2  0  0     0     1
If you don't want to create the mapping, the function can infer one. The inferred mapping may be incomplete if multipolar does not contain every possible output of each labeling function.
>>> L, polarities, polarities_mapping = flippers.multipolar_to_monopolar(multipolar)
>>> L
   0  1  2__0  2__2
0  0  1     0     1
1  1  0     1     0
2  0  0     0     1
>>> polarities  # output L polarities
array([0, 1, 0, 2], dtype=int64)
>>> polarities_mapping  # input multipolar polarities
{'0': [0], '1': [1], '2': [0, 2]}
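To make the conversion concrete, here is a rough pandas sketch that reproduces the output shown above. It is an illustration under assumptions, not the flippers implementation: `to_monopolar_sketch` is a hypothetical helper, -1 is assumed to mean "abstain", mapping keys are assumed to be the column names as strings, and the `column__polarity` naming for labelers with several polarities is inferred from the example.

```python
import numpy as np
import pandas as pd


def to_monopolar_sketch(multipolar: pd.DataFrame, polarities_mapping: dict):
    columns = {}
    polarities = []
    for col in multipolar.columns:
        possible = polarities_mapping[str(col)]
        for p in possible:
            # Labelers with one polarity keep their name; others get one
            # output column per polarity, named "<column>__<polarity>".
            name = str(col) if len(possible) == 1 else f"{col}__{p}"
            # 1 where the labeler voted for polarity p, 0 otherwise.
            columns[name] = (multipolar[col] == p).astype(int)
            polarities.append(p)
    return pd.DataFrame(columns), np.array(polarities)


multipolar = pd.DataFrame([[-1, 1, 2], [0, -1, 0], [-1, -1, 2]])
mapping = {'0': [0], '1': [1], '2': [0, 2]}
L_mono, pol = to_monopolar_sketch(multipolar, mapping)
print(L_mono)
print(pol)  # [0 1 0 2]
```

Each output column is binary, and the returned array records which class a 1 in that column votes for.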
- flippers.overlaps(L)
Calculate, for each labeling function, the fraction of labeled samples that are also labeled by other labeling functions.
- Parameters:
L (pd.DataFrame) – Weak label DataFrame of shape (n_samples, n_weak).
- Returns:
Series of length n_weak indicating the fraction of annotated samples with other annotations for each LF.
- Return type:
pd.Series
Example
>>> L = pd.DataFrame([[0, 1, 0], [1, 0, 1], [0, 0, 0], [1, 1, 1]])
>>> flippers.overlaps(L)
0    0.50
1    0.25
2    0.50
dtype: float64
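The overlaps, matches and conflicts examples on this page are consistent with the following reading: a sample counts for labeling function j when j labels it and at least one other labeling function (any, same-polarity, or different-polarity, respectively) labels it too, with the count divided by the total number of samples. The sketch below implements that reading with NumPy and pandas. `agreement_sketch` is a hypothetical helper, 0 is assumed to mean "no label", and polarities `[0, 1, 1]` is taken from the `summary` example; the library's actual code may differ.

```python
import numpy as np
import pandas as pd


def agreement_sketch(L: pd.DataFrame, polarities) -> pd.DataFrame:
    labeled = (L != 0).to_numpy()          # (n_samples, n_weak) boolean mask
    lab = labeled.astype(int)
    pol = np.asarray(polarities)
    same = pol[:, None] == pol[None, :]    # same[j, k]: LFs j and k share a polarity
    not_self = ~np.eye(len(pol), dtype=bool)

    def fraction(partner: np.ndarray) -> np.ndarray:
        # (lab @ partner)[i, j] counts the partner LFs of j that label sample i.
        others = lab @ partner.astype(int)
        return (labeled & (others > 0)).mean(axis=0)

    return pd.DataFrame(
        {
            "overlaps": fraction(not_self),
            "matches": fraction(same & not_self),
            "conflicts": fraction(~same),
        },
        index=L.columns,
    )


L = pd.DataFrame([[0, 1, 0], [1, 0, 1], [0, 0, 0], [1, 1, 1]])
stats = agreement_sketch(L, [0, 1, 1])
print(stats)
```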
- flippers.summary(L, polarities, digits=3, normalize=False)
Calculate summary statistics for the given weak label matrix and polarities.
- Parameters:
L (pd.DataFrame) – Weak label DataFrame of shape (n_samples, n_weak).
polarities (ndarray | List) – 1D array or list of size n_weak containing the polarities of each weak label.
digits (int) – Number of digits to round the output statistics to. Default 3.
normalize (bool) – When True, shows overlaps/matches/conflicts as a ratio of coverage. Default False.
- Returns:
DataFrame of shape (n_weak, n_summaries) containing the following columns:
"polarity": The polarity of each weak label.
"coverage": The average ratio of samples that are assigned each weak label.
"confidence": The average confidence level of the assigned weak labels.
"overlaps": The ratio of assigned labels that have overlapping labels.
"matches": The ratio of assigned labels that have other matching labels.
"conflicts": The ratio of assigned labels that have conflicting labels.
- Return type:
pd.DataFrame
Example
>>> L = pd.DataFrame([[0, 1, 0], [1, 0, 1], [0, 0, 0], [1, 1, 1]])
>>> polarities = [0, 1, 1]
>>> flippers.summary(L, polarities)
   polarity  coverage  confidence  overlaps  matches  conflicts
0         0       0.5         1.0      0.50     0.00       0.50
1         1       0.5         1.0      0.25     0.25       0.25
2         1       0.5         1.0      0.50     0.25       0.50
>>> flippers.summary(L, polarities, normalize=True)
   polarity  coverage  confidence  overlaps  matches  conflicts
0         0       0.5         1.0       1.0      0.0        1.0
1         1       0.5         1.0       0.5      0.5        0.5
2         1       0.5         1.0       1.0      0.5        1.0
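The two tables above differ only in the overlaps, matches and conflicts columns, and the normalized values equal the unnormalized ones divided by coverage. The short sketch below checks that relationship using the numbers printed in the example; it illustrates what "as a ratio of coverage" appears to mean and does not call flippers itself.

```python
import pandas as pd

# Unnormalized statistics copied from the summary(L, polarities) output above.
raw = pd.DataFrame({
    "coverage": [0.5, 0.5, 0.5],
    "overlaps": [0.50, 0.25, 0.50],
    "matches": [0.00, 0.25, 0.25],
    "conflicts": [0.50, 0.25, 0.50],
})

cols = ["overlaps", "matches", "conflicts"]
normalized = raw.copy()
# Divide each statistic row-wise by that labeler's coverage.
normalized[cols] = raw[cols].div(raw["coverage"], axis=0)
print(normalized)
```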
- flippers.total_coverage(L)
Calculate the total proportion of labeled samples in the given label matrix.
- Parameters:
L (pd.DataFrame) – Weak label DataFrame of shape (n_samples, n_weak).
- Returns:
Total coverage, ranging from 0 to 1, indicating the proportion of labeled samples in the label matrix.
- Return type:
float
Example
>>> L = pd.DataFrame([[0, 1, 0], [1, 0, 1], [0, 0, 0], [1, 1, 1]])
>>> flippers.total_coverage(L)
0.75
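Total coverage relates directly to the per-sample "is labeled" check: it is the share of rows with at least one label. A minimal pandas sketch of that relationship follows, assuming 0 encodes "no label"; it mirrors the documented outputs of `is_labeled` and `total_coverage` for this matrix but is not the library's code.

```python
import pandas as pd

L = pd.DataFrame([[0, 1, 0], [1, 0, 1], [0, 0, 0], [1, 1, 1]])

# True for each sample that received a label from at least one weak labeler.
labeled_mask = (L != 0).any(axis=1)

# The mean of a boolean Series is the fraction of True values.
total = float(labeled_mask.mean())

print(labeled_mask.tolist())  # [True, True, False, True]
print(total)                  # 0.75
```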