flippers.models.SnorkelModel

class flippers.models.SnorkelModel(*args, **kwargs)

Bases: Module, _Model

A label model implementation for weak supervision based on a generative approach.

This implementation is based on the Snorkel library’s label model.

Like its Snorkel library counterpart, it assumes that the labeling functions (LFs) are conditionally independent given Y, similar to the naive Bayes assumption.

In practice, however, it can also give good results when LFs are correlated.
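To make the conditional independence assumption concrete, the sketch below samples synthetic data from such a generative model: a true label is drawn from the class prior, then each LF fires independently given that label. The parameter mu (mu[j, k], the probability that LF j fires when the true class is k) and the helper sample_weak_labels are illustrative assumptions, not part of the flippers API.

```python
import numpy as np

def sample_weak_labels(mu, class_balances, n_samples, seed=0):
    """Sample (y, L) from a conditionally independent generative model.

    mu[j, k] is an illustrative (hypothetical) parameter: the probability
    that LF j fires (L[i, j] == 1) when the true class is k.
    """
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float)               # (n_weak, n_classes)
    n_weak, n_classes = mu.shape
    # Draw the latent true labels from the class-balance prior.
    y = rng.choice(n_classes, size=n_samples, p=class_balances)
    # Given y, each LF fires independently: L[i, j] ~ Bernoulli(mu[j, y[i]]).
    fire_prob = mu[:, y].T                         # (n_samples, n_weak)
    L = (rng.random((n_samples, n_weak)) < fire_prob).astype(int)
    return y, L
```

In data generated this way, the columns of L are related only through y; LFs that are correlated beyond their shared dependence on y violate the assumption.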

See the following link for more information on how to use this model and for a comparison with the Snorkel library’s implementation.

Example

>>> from flippers.models import SnorkelModel
>>>
>>> # Create a SnorkelModel instance
>>> label_model = SnorkelModel(polarities, cardinality, class_balances)
>>> # Train the model
>>> label_model.fit(L_train)
>>> # Generate labels
>>> y_pred_proba = label_model.predict_proba(L) # shape: (len(L), n_classes)
>>> y_pred = label_model.predict(L) # shape: (len(L),)

Initializes a SnorkelModel instance with the given configuration options.

Parameters:
  • polarities (ndarray | List) – List mapping each weak label (column of L) to its polarity, i.e. the class it votes for; size n_weak.

  • cardinality (int, optional) –

    Number of possible label values.

    If unspecified, it will be inferred from the maximum value in polarities.

  • class_balances (ListLike, optional) –

    List specifying the prior class balance for each possible class, size n_classes.

    Defaults to a balanced (uniform) class prior.
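The documented defaults can be sketched as follows. The helper resolve_config is hypothetical, and it assumes classes are numbered from 0, so the inferred cardinality is max(polarities) + 1.

```python
import numpy as np

def resolve_config(polarities, cardinality=None, class_balances=None):
    """Hypothetical helper mirroring the documented constructor defaults.

    Assumes 0-indexed classes, so the inferred cardinality is max + 1.
    """
    polarities = np.asarray(polarities, dtype=int)
    if cardinality is None:
        # Infer the number of classes from the largest polarity.
        cardinality = int(polarities.max()) + 1
    if class_balances is None:
        # Balanced prior: every class gets the same weight.
        class_balances = np.full(cardinality, 1.0 / cardinality)
    class_balances = np.asarray(class_balances, dtype=float)
    if class_balances.shape != (cardinality,):
        raise ValueError("class_balances must have size n_classes")
    return polarities, cardinality, class_balances
```

For example, resolve_config([1, 0, 1, 1]) infers two classes with a [0.5, 0.5] prior.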

Example

>>> polarities = [1, 0, 1, 1]
>>> cardinality = 2
>>> class_balances = [0.7, 0.3]
>>> label_model = SnorkelModel(polarities, cardinality, class_balances)
>>> # Change device
>>> label_model.to("cuda")

fit(L, learning_rate=0.001, num_epochs=50, prec_init=0.5, k=0, verbose=False, **_)

Train the Snorkel model on the given weak label matrix L.

Parameters:
  • L (MatrixLike) – Weak label matrix of shape (num_samples, n_weak).

  • learning_rate (float, optional, default: 1e-3) – Learning rate for the optimizer.

  • num_epochs (int, optional, default: 50) – Number of epochs to train the model.

  • prec_init (float, optional, default: 0.5) –

    Initial precision value for the LFs.

    Can also be an array of shape (n_weak,) to set the initial precision of each LF individually.

  • k (float, optional, default: 0) –

    Weight of the class balance loss term.

    This term penalizes the model when the distribution of classes it predicts on the training set deviates from the specified class_balances.

  • verbose (bool, optional, default: False) – When True, displays training progress using tqdm.

Return type:

None
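The sketch below illustrates two of these options under stated assumptions: how a scalar or per-LF prec_init might be expanded to one value per LF, and one possible form of the class balance term weighted by k (the squared distance between the mean predicted class distribution on the training set and class_balances). Both helpers are hypothetical, and the exact loss used by flippers may differ.

```python
import numpy as np

def broadcast_prec_init(prec_init, n_weak):
    """Hypothetical: expand a scalar or per-LF prec_init to shape (n_weak,)."""
    # np.broadcast_to raises ValueError if the shape is incompatible.
    prec = np.broadcast_to(np.asarray(prec_init, dtype=float), (n_weak,))
    return prec.copy()  # copy so the result is writable

def class_balance_penalty(proba, class_balances, k):
    """Hypothetical class balance term: k times the squared distance
    between the mean predicted class distribution and class_balances."""
    mean_pred = np.asarray(proba, dtype=float).mean(axis=0)
    diff = mean_pred - np.asarray(class_balances, dtype=float)
    return k * float(np.sum(diff ** 2))
```

With k=0 (the default) the penalty vanishes, so training relies only on the main objective.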

Example

>>> L = [
...     [1, 0, 1, 1],
...     [0, 1, 0, 1],
...     [1, 0, 1, 0]
... ]
>>> label_model.fit(
...     L,
...     learning_rate=1e-2,
...     num_epochs=10,
...     prec_init=0.7,
...     k=5e-3,
...     verbose=True
... )

predict_proba(L, ignore_abstains=False)

Predicts class probabilities by updating the class prior with the learned parameter mu, treating each LF's output as conditionally independent evidence about the class.

Parameters:
  • L (MatrixLike) – Weak label matrix of shape (num_samples, n_weak).

  • ignore_abstains (bool, optional) –

    Whether to ignore abstains in the prior update:

    • When False (default), uses both votes and abstains. This is recommended, as it leverages the information carried by which labeling functions abstained.

    • When True, updates the prior using only the labeling functions that did not abstain.

Returns:

An array of predicted probabilities of shape (num_samples, num_classes).

Return type:

numpy.ndarray
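The effect of ignore_abstains can be illustrated with a naive Bayes posterior. This sketch assumes a 0/1 weak label matrix (1 = the LF fired, 0 = it abstained) and a hypothetical parameter mu[j, k] = P(LF j fires | Y = k); it is not flippers' internal code. With ignore_abstains=False, an abstaining LF still contributes the factor 1 - mu[j, k].

```python
import numpy as np

def posterior(L, mu, class_balances, ignore_abstains=False):
    """Naive Bayes posterior P(Y | L) for a 0/1 weak label matrix.

    Illustrative assumptions: L[i, j] == 1 means LF j fired and 0 means it
    abstained; mu[j, k] is P(LF j fires | Y = k).
    """
    L = np.asarray(L, dtype=float)                              # (n_samples, n_weak)
    mu = np.clip(np.asarray(mu, dtype=float), 1e-6, 1 - 1e-6)   # (n_weak, n_classes)
    prior = np.asarray(class_balances, dtype=float)             # (n_classes,)
    # Start from the log prior and add evidence from LFs that fired.
    log_post = np.log(prior) + L @ np.log(mu)
    if not ignore_abstains:
        # Abstentions are evidence too: add log P(LF j abstains | Y = k).
        log_post = log_post + (1.0 - L) @ np.log(1.0 - mu)
    # Normalize each row into probabilities (numerically stable softmax).
    log_post = log_post - log_post.max(axis=1, keepdims=True)
    proba = np.exp(log_post)
    return proba / proba.sum(axis=1, keepdims=True)
```

For example, with mu = [[0.8, 0.1], [0.2, 0.7]], a uniform prior, and L = [[1, 0]], ignoring abstains gives roughly [0.89, 0.11], while using them gives roughly [0.96, 0.04]: the silence of the second LF, which tends to fire on class 1, is itself evidence for class 0.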

Example

>>> L = [
...     [1, 0, 1, 1],
...     [0, 1, 0, 1],
...     [1, 0, 1, 0]
... ]
>>> proba = label_model.predict_proba(L)
>>> # proba.shape = (len(L), cardinality)

classmethod load(filepath)

Load a saved model from a file.

Parameters:

filepath (str) – Path to the file containing the saved model.

Returns:

The loaded model object.

Example

>>> model = SnorkelModel.load("label_model.pkl")

predict(L, strategy='majority')

Predict labels for the given weak label matrix using the specified strategy.

Parameters:
  • L (ndarray | DataFrame) –

    Weak label matrix.

    Shape: (n_samples, n_weak)

  • strategy (str) –

    Prediction strategy to use. Supported values: majority, probability.

    Controls how labels are derived from the predicted probabilities.

    • majority: Predict the label with the highest predicted probability.

    • probability: Predict label j for sample i with probability proba[i, j].

      This can be useful to make the predictions follow specific class_balances.

    Default is “majority”.

    If a sample has no votes (all labeling functions abstained), -1 is predicted.

Returns:

1-D array of predicted labels of size n_samples.
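The two strategies can be sketched as below. The helper labels_from_proba is hypothetical; it assumes majority means taking the most probable class and that a sample has no votes when every entry of its row in L is 0.

```python
import numpy as np

def labels_from_proba(proba, L, strategy="majority", seed=0):
    """Hypothetical helper turning class probabilities into hard labels."""
    proba = np.asarray(proba, dtype=float)
    L = np.asarray(L)
    if strategy == "majority":
        # Most probable class for each sample.
        y = proba.argmax(axis=1)
    elif strategy == "probability":
        # Draw label j for sample i with probability proba[i, j].
        rng = np.random.default_rng(seed)
        y = np.array([rng.choice(len(p), p=p / p.sum()) for p in proba])
    else:
        raise ValueError(f"Unknown strategy: {strategy!r}")
    # Samples where every LF abstained (all-zero rows) are labeled -1.
    no_votes = ~(L != 0).any(axis=1)
    return np.where(no_votes, -1, y)
```

Sampling instead of taking the argmax means that, over many samples, the label frequencies track the predicted probabilities, which is why the probability strategy can help preserve class balances.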

Example

>>> L = [[1, 0, 1, 2], [0, 1, 0, 0]]
>>> predictions = label_model.predict(L)
>>> # predictions.shape = (len(L),)

save(filepath)

Save the model to a file.

Parameters:

filepath (str) – Path to the file where the model will be saved.

Example

>>> model.save("label_model.pkl")
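The .pkl extension in these examples suggests pickle-based serialization, although the documentation does not state the format. Assuming pickle, a save/load round trip can be sketched with a hypothetical stand-in class (PicklableModel is not part of flippers). As with any pickle file, only load models from trusted sources, since unpickling can execute arbitrary code.

```python
import pickle

class PicklableModel:
    """Hypothetical stand-in used only to show a pickle save/load round trip."""

    def __init__(self, polarities):
        self.polarities = list(polarities)

    def save(self, filepath):
        # Store the model's attributes, not the class object itself.
        with open(filepath, "wb") as f:
            pickle.dump(self.__dict__, f)

    @classmethod
    def load(cls, filepath):
        with open(filepath, "rb") as f:
            state = pickle.load(f)
        model = cls.__new__(cls)  # skip __init__; state comes from the file
        model.__dict__.update(state)
        return model
```

Pickling only the attribute dictionary keeps the file independent of the module the class is defined in; for a torch Module, saving its state_dict with torch.save is the usual equivalent.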