flippers.models

Groups generative label models.

Classes

SnorkelModel(*args, **kwargs)

A label model implementation for weak supervision based on a generative approach.

Voter(polarities[, cardinality])

Basic model that bases its decisions on a sum of votes (optionally weighted) for each class.

class flippers.models.Voter(polarities, cardinality=0)

Bases: _Model

Basic model that bases its decisions on a sum of votes (optionally weighted) for each class.

Parameters:
  • polarities (ndarray | List) – List that maps weak labels to polarities, size n_weak.

  • cardinality (int) –

    Number of possible label values.

    If unspecified, it will be inferred from the maximum value in polarities.
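
As a rough illustration of what "a sum of votes for each class" can look like, the sketch below counts votes with plain NumPy. It assumes L is a binary matrix in which L[s, i] = 1 means weak label i fired on sample s and 0 means it abstained; that encoding and the helper name vote_counts are assumptions made for this example, not part of the flippers API.

```python
import numpy as np


def vote_counts(L, polarities, cardinality):
    """Count, for every sample, how many weak labels voted for each class.

    Hypothetical helper: assumes L[s, i] is 1 when weak label i fired on
    sample s and 0 when it abstained.
    """
    L = np.asarray(L)
    polarities = np.asarray(polarities)
    # one_hot[i, c] == 1 when weak label i votes for class c
    one_hot = np.eye(cardinality, dtype=int)[polarities]
    # (n_samples, n_weak) @ (n_weak, cardinality) -> (n_samples, cardinality)
    return L @ one_hot


L = np.array([[1, 0, 1, 1], [0, 1, 0, 0], [0, 0, 0, 0]])
polarities = [1, 0, 1, 1]
counts = vote_counts(L, polarities, cardinality=2)
print(counts)
```

The last row of L contains no votes, so its counts are zero for every class.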

Example

>>> from flippers.models import Voter
>>> polarities = [1, 0, 1, 1]
>>> cardinality = 2
>>> model = Voter(polarities, cardinality)
fit(L, class_balances=[])

Fit the Voter model. This computes the weights for each class.

Reweighting the votes helps, especially when specific classes have a high overlap in their weak labels.

The weights are computed so that the weighted sum of votes over the training set matches the given class balance.

This guarantees mean(y_pred_proba_train) = class_balance.

For majority voting, do not use fit.
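
One simple way to read "the weighted sum of votes matches the given class balance" is sketched below. It is an illustrative assumption, not the computation flippers performs: it rescales each class so that its total weighted vote mass equals the requested balance, and it makes no claim about the mean of per-sample probabilities. The helper name balance_weights and the vote counts are hypothetical.

```python
import numpy as np


def balance_weights(counts, class_balances):
    """Per-class weights that make total weighted votes equal class_balances.

    Hypothetical helper; counts has shape (n_samples, cardinality).
    """
    counts = np.asarray(counts, dtype=float)
    balances = np.asarray(class_balances, dtype=float)
    totals = counts.sum(axis=0)
    # Classes that never receive a vote keep a weight of zero
    return np.divide(balances, totals, out=np.zeros_like(balances), where=totals > 0)


counts = np.array([[0, 3], [1, 0], [2, 1]])
weights = balance_weights(counts, [0.6, 0.4])
weighted_totals = (counts * weights).sum(axis=0)
print(weights, weighted_totals)
```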

Parameters:
  • L (pd.DataFrame) – Weak label dataframe.

  • class_balances (ListLike) –

    Numpy array of shape cardinality giving a weight to each class.

    When unspecified, assumes all classes are equally likely.

Return type:

None

Example

>>> L = [[1, 0, 1, 2], [0, 1, 2, 1], [1, 2, 1, 0], [0, 1, 0, 2]]
>>> class_balances = [0.6, 0.4]
>>> model.fit(L, class_balances)
predict_proba(L)

Predict probabilities using weighted voting.

Parameters:

L (pd.DataFrame) – Weak label dataframe.

Return type:

Array of predicted probabilities of shape (len(L), cardinality)

Example

>>> L = [[1, 0, 1, 2], [0, 1, 0, 0]]
>>> proba = model.predict_proba(L)
>>> # proba.shape = (len(L), cardinality)
classmethod load(filepath)

Load a saved model from a file.

Parameters:

filepath (str) – Path to the file containing the saved model.

Return type:

The loaded model object.

Example

>>> model = Voter.load("label_model.pkl")
predict(L, strategy='majority')

Predict labels for the given weak label matrix using the specified strategy.

Parameters:
  • L (ndarray | DataFrame) –

    Weak label dataframe.

    Shape: (n_samples, n_weak)

  • strategy (str) –

    Prediction strategy to use. Supported values: majority, probability.

    Controls how labels are predicted from the predicted probabilities.

    • majority: Predict the label with the highest number of votes.

    • probability: Predict label j with probability proba[i, j].

      This can be useful to enforce specific class_balances in the predictions.

    Default is “majority”.

    If there are no votes for a sample, will predict -1.
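
The two strategies can be sketched on top of a probability matrix as follows. This is a minimal sketch, not the library's code: it assumes that a sample without votes shows up as an all-zero row in proba, and the helper name labels_from_proba is hypothetical.

```python
import numpy as np


def labels_from_proba(proba, strategy="majority", rng=None):
    """Turn a (n_samples, cardinality) probability matrix into labels.

    Hypothetical helper: rows that sum to zero are treated as "no votes"
    and receive the label -1.
    """
    proba = np.asarray(proba, dtype=float)
    no_votes = proba.sum(axis=1) == 0
    if strategy == "majority":
        # Pick the class with the highest score for each sample
        labels = proba.argmax(axis=1)
    elif strategy == "probability":
        # Sample label j with probability proba[i, j]
        rng = np.random.default_rng() if rng is None else rng
        labels = np.full(len(proba), -1)
        for i in np.flatnonzero(~no_votes):
            labels[i] = rng.choice(proba.shape[1], p=proba[i])
    else:
        raise ValueError(f"Unknown strategy: {strategy!r}")
    return np.where(no_votes, -1, labels)


proba = np.array([[0.2, 0.8], [0.9, 0.1], [0.0, 0.0]])
print(labels_from_proba(proba))
print(labels_from_proba(proba, strategy="probability", rng=np.random.default_rng(0)))
```

Sampling with the probability strategy makes the label frequencies follow the predicted probabilities on average, which is why it can help when predictions should respect a target class balance.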

Return type:

1-D array of predicted labels of size n_samples

Example

>>> L = [[1, 0, 1, 2], [0, 1, 0, 0]]
>>> predictions = model.predict(L)
>>> # predictions.shape = (len(L),)
save(filepath)

Save the model to a file.

Parameters:

filepath (str) – Path to the file where the model will be saved.

Example

>>> model.save("label_model.pkl")
class flippers.models.SnorkelModel(*args, **kwargs)

Bases: Module, _Model

A label model implementation for weak supervision based on a generative approach.

This implementation is based on the Snorkel library’s label model.

Like its Snorkel counterpart, it assumes that the labeling functions (LFs) are conditionally independent given Y, similar to the naive Bayes assumption.

However, good results can also be observed in practice for correlated LFs.

See the following link[] for more information on how to use this model and a comparison with the Snorkel library’s implementation.

Example

>>> from flippers.models import SnorkelModel
>>>
>>> # Create a SnorkelModel instance
>>> label_model = SnorkelModel(polarities, cardinality, class_balances)
>>> # Train the model
>>> label_model.fit(L_train)
>>> # Generate labels
>>> y_pred_proba = label_model.predict_proba(L) # shape: (len(L), n_classes)
>>> y_pred = label_model.predict(L) # shape: (len(L),)

Initializes a SnorkelModel instance with the given configuration options.

Parameters:
  • polarities (ndarray | List) – List that maps weak labels to polarities, size n_weak.

  • cardinality (int, optional) –

    Number of possible label values.

    If unspecified, it will be inferred from the maximum value in polarities.

  • class_balances (ListLike, optional) –

    List specifying class balance prior for each possible class, size n_classes.

    Defaults to balanced classes prior.

Example

>>> polarities = [1, 0, 1, 1]
>>> cardinality = 2
>>> class_balances = [0.7, 0.3]
>>> label_model = SnorkelModel(polarities, cardinality, class_balances)
>>> # Change device
>>> label_model.to("cuda")
property device
fit(L, learning_rate=0.001, num_epochs=50, prec_init=0.5, k=0, verbose=False, **_)

Train the Snorkel model on the given weak label matrix L.

Parameters:
  • L (MatrixLike) – Weak Label matrix of shape (num_samples, n_weak)

  • learning_rate (float, optional, default: 1e-3) – Learning rate for the optimizer.

  • num_epochs (int, optional, default: 50) – Number of epochs to train the model

  • prec_init (float, optional, default: 0.5) –

    Initial value for the precision of the labeling functions.

    Can be an array of shape (n_weak,) to set a separate precision for each LF.

  • k (float, optional, default: 0) –

    Weight of the class balance loss term.

    This term penalizes the model when the class frequencies it predicts on the training set differ from the specified class balance.

  • verbose (bool, optional, default: False) – When True, displays training progress using tqdm.
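
To make the role of k more concrete, the sketch below shows one possible form of a class balance penalty: the squared gap between the mean predicted probabilities and the target balance, scaled by k. The exact loss used by SnorkelModel is not given here, so the squared-error form and the helper name class_balance_penalty are assumptions.

```python
import numpy as np


def class_balance_penalty(proba, class_balances, k):
    """Penalty that grows as the predicted class frequencies drift from the target.

    Hypothetical helper: uses a squared-error form chosen for illustration only.
    """
    proba = np.asarray(proba, dtype=float)
    target = np.asarray(class_balances, dtype=float)
    # Average predicted probability of each class over the training samples
    predicted_balance = proba.mean(axis=0)
    return k * np.sum((predicted_balance - target) ** 2)


proba = np.array([[0.9, 0.1], [0.5, 0.5]])
matching = class_balance_penalty(proba, [0.7, 0.3], k=2.0)
mismatched = class_balance_penalty(proba, [0.5, 0.5], k=2.0)
print(matching, mismatched)
```

With k=0, the default, a term of this kind vanishes and only the main training objective remains.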

Return type:

None

Example

>>> L = [
...     [1, 0, 1, 1],
...     [0, 1, 0, 1],
...     [1, 0, 1, 0]
... ]
>>> label_model.fit(
...     L,
...     learning_rate=1e-2,
...     num_epochs=10,
...     prec_init=0.7,
...     k=5e-3,
...     verbose=True
... )
predict_proba(L, ignore_abstains=False)

Predicts class probabilities by updating the class prior with the learned parameter mu to obtain posteriors.

Parameters:
  • L (MatrixLike) – Weak Label matrix

  • ignore_abstains (bool, optional) –

    Whether to ignore abstains in the prior update:

    • When False (default), uses both votes and abstains. This is recommended, since knowing which labeling functions abstained carries information.

    • When True, updates the prior using only the labeling functions that did not abstain.
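
The effect of ignore_abstains can be illustrated with a naive Bayes style update. The sketch below is not the library's implementation: it assumes L is binary (1 = fired, 0 = abstained) and treats mu[i, c] as the probability that labeling function i fires given class c, which is an interpretation adopted for this example. The helper name naive_bayes_posterior is hypothetical.

```python
import numpy as np


def naive_bayes_posterior(L, mu, prior, ignore_abstains=False):
    """Posterior over classes under conditional independence of the LFs.

    Hypothetical helper: mu has shape (n_weak, n_classes) and mu[i, c] is
    assumed to be P(LF i fires | Y = c).
    """
    L = np.asarray(L, dtype=float)
    mu = np.asarray(mu, dtype=float)
    prior = np.asarray(prior, dtype=float)
    # Contribution of the labeling functions that fired
    log_post = np.log(prior) + L @ np.log(mu)
    if not ignore_abstains:
        # Contribution of the labeling functions that abstained
        log_post = log_post + (1 - L) @ np.log(1 - mu)
    # Normalize in log space for numerical stability
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)


mu = np.array([[0.1, 0.8], [0.7, 0.2]])
prior = [0.5, 0.5]
L = [[1, 0]]
post_all = naive_bayes_posterior(L, mu, prior)
post_fired = naive_bayes_posterior(L, mu, prior, ignore_abstains=True)
print(post_all, post_fired)
```

In this toy case the second labeling function usually fires for class 0, so its abstention is additional evidence for class 1, and the posterior that uses abstains is more confident than the one that ignores them.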

Returns:

An array of predicted probabilities of shape (num_samples, num_classes).

Return type:

numpy.ndarray

Example

>>> L = [
...     [1, 0, 1, 1],
...     [0, 1, 0, 1],
...     [1, 0, 1, 0]
... ]
>>> proba = label_model.predict_proba(L)
>>> # proba.shape = (len(L), cardinality)
classmethod load(filepath)

Load a saved model from a file.

Parameters:

filepath (str) – Path to the file containing the saved model.

Return type:

The loaded model object.

Example

>>> model = SnorkelModel.load("label_model.pkl")
predict(L, strategy='majority')

Predict labels for the given weak label matrix using the specified strategy.

Parameters:
  • L (ndarray | DataFrame) –

    Weak label dataframe.

    Shape: (n_samples, n_weak)

  • strategy (str) –

    Prediction strategy to use. Supported values: majority, probability.

    Controls how labels are predicted from the predicted probabilities.

    • majority: Predict the label with the highest number of votes.

    • probability: Predict label j with probability proba[i, j].

      This can be useful to enforce specific class_balances in the predictions.

    Default is “majority”.

    If there are no votes for a sample, will predict -1.

Return type:

1-D array of predicted labels of size n_samples

Example

>>> L = [[1, 0, 1, 2], [0, 1, 0, 0]]
>>> predictions = label_model.predict(L)
>>> # predictions.shape = (len(L),)
save(filepath)

Save the model to a file.

Parameters:

filepath (str) – Path to the file where the model will be saved.

Example

>>> model.save("label_model.pkl")