flippers.models.SnorkelModel

class flippers.models.SnorkelModel(*args, **kwargs)

Bases: Module, _Model

A label model implementation for weak supervision based on a generative approach.

This implementation is based on the Snorkel library’s label model.

Like its Snorkel library counterpart, it assumes that the labeling functions (LFs) are conditionally independent given Y, similar to a naive Bayes assumption.
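
As a reminder of what conditional independence implies, the posterior over Y factorizes in the standard naive Bayes form. The notation is introduced here for illustration only: L_j is the output of the j-th labeling function and m is the number of labeling functions.

```latex
P(Y = y \mid L_1, \dots, L_m) \;\propto\; P(Y = y) \prod_{j=1}^{m} P(L_j \mid Y = y)
```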

However, good results can also be observed in practice for correlated LFs.

See the following link for more information on how to use this model and a comparison with the Snorkel library’s implementation.

Example

>>> from flippers.models import SnorkelModel
>>>
>>> # Create a SnorkelModel instance
>>> label_model = SnorkelModel(polarities, cardinality, class_balances)
>>> # Train the model
>>> label_model.fit(L_train)
>>> # Generate labels
>>> y_pred_proba = label_model.predict_proba(L) # shape: (len(L), n_classes)
>>> y_pred = label_model.predict(L) # shape: (len(L),)

Initializes a SnorkelModel instance with the given configuration options.

Parameters:

polarities (ndarray | List) – List that maps weak labels to polarities, size n_weak.
cardinality (int, optional) –
Number of possible label values.

If unspecified, it will be inferred from the maximum value in polarities.
class_balances (ListLike, optional) –
List specifying class balance prior for each possible class, size n_classes.

Defaults to a balanced class prior.
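
The defaulting rules described for cardinality and class_balances can be sketched as follows. This is a minimal illustration, not the library's code: resolve_defaults is a hypothetical helper, and it assumes classes are zero-indexed so that the inferred cardinality is the maximum polarity plus one.

```python
from typing import List, Optional, Sequence, Tuple


def resolve_defaults(
    polarities: Sequence[int],
    cardinality: Optional[int] = None,
    class_balances: Optional[Sequence[float]] = None,
) -> Tuple[int, List[float]]:
    """Hypothetical helper mirroring the documented defaults."""
    if cardinality is None:
        # Assumption: labels are zero-indexed, so the class count is max + 1.
        cardinality = max(polarities) + 1
    if class_balances is None:
        # Balanced prior: every class gets the same weight.
        class_balances = [1.0 / cardinality] * cardinality
    if len(class_balances) != cardinality:
        raise ValueError("class_balances must have one entry per class")
    return cardinality, list(class_balances)
```

With polarities = [1, 0, 1, 1] and nothing else specified, this sketch yields a cardinality of 2 and a prior of [0.5, 0.5].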

Example

>>> polarities = [1, 0, 1, 1]
>>> cardinality = 2
>>> class_balances = [0.7, 0.3]
>>> label_model = SnorkelModel(polarities, cardinality, class_balances)
>>> # Change device
>>> label_model.to("cuda")

fit(L, learning_rate=0.001, num_epochs=50, prec_init=0.5, k=0, verbose=False, **_)

Train the Snorkel model on the given weak label matrix L.

Parameters:

L (MatrixLike) – Weak label matrix of shape (num_samples, n_weak).
learning_rate (float, optional, default: 1e-3) – Learning rate for the optimizer.
num_epochs (int, optional, default: 50) – Number of epochs to train the model.
prec_init (float, optional, default: 0.5) –
Initial value for the precision.

Can be of shape (n_weak,) to set the precision of each LF individually.
k (float, optional, default: 0) –
Weight of the class balance loss term.

This term penalizes the model when the class proportions it predicts on the training set differ from the specified class balances.
verbose (bool, optional, default: False) – When True, displays training progress using tqdm.
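
To make the role of k more concrete, here is one plausible shape for a class balance penalty. The exact loss used by the model is not stated here, so the squared-difference form and the name class_balance_penalty are assumptions made purely for illustration.

```python
from typing import Sequence


def class_balance_penalty(
    proba: Sequence[Sequence[float]],
    class_balances: Sequence[float],
    k: float,
) -> float:
    """Hypothetical penalty: k times the squared gap between the average
    predicted class proportions and the specified class balances."""
    if not proba:
        raise ValueError("proba must contain at least one row")
    n_samples = len(proba)
    # Average predicted probability of each class over the training set.
    predicted_balance = [
        sum(row[c] for row in proba) / n_samples
        for c in range(len(class_balances))
    ]
    gap = sum((p - b) ** 2 for p, b in zip(predicted_balance, class_balances))
    return k * gap
```

With k=0 (the default) such a term vanishes, which is consistent with the class balance term being optional.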

Return type:

None

Example

>>> L = [
...     [1, 0, 1, 1],
...     [0, 1, 0, 1],
...     [1, 0, 1, 0]
... ]
>>> label_model.fit(
...     L,
...     learning_rate=1e-2,
...     num_epochs=10,
...     prec_init=0.7,
...     k=5e-3,
...     verbose=True
... )

predict_proba(L, ignore_abstains=False)

Predicts class probabilities by updating the class prior with the learned parameter mu to obtain posterior probabilities.

Parameters:

L (MatrixLike) – Weak label matrix.
ignore_abstains (bool, optional) –
Whether to ignore abstains in the prior update:

- When False (default), uses both votes and abstains. This is recommended because knowing which labeling functions abstained is itself informative.

- When True, updates the prior using only the labeling functions that did not abstain.
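
The difference between the two modes can be sketched with a small naive Bayes update for a single sample. This is an illustration under stated assumptions, not the library's implementation: posterior is a hypothetical function, each row entry is taken to be 1 for a vote and 0 for an abstain, and fire_rates[j][y] stands in for an assumed probability that labeling function j votes when the true class is y.

```python
import math
from typing import List, Sequence


def posterior(
    row: Sequence[int],
    fire_rates: Sequence[Sequence[float]],
    class_balances: Sequence[float],
    ignore_abstains: bool = False,
) -> List[float]:
    """Hypothetical prior update for one sample.

    fire_rates[j][y] is an assumed P(L_j = 1 | Y = y), strictly between 0 and 1.
    """
    # Start from the log prior of each class.
    log_post = [math.log(b) for b in class_balances]
    for j, vote in enumerate(row):
        for y in range(len(class_balances)):
            p = fire_rates[j][y]
            if vote == 1:
                log_post[y] += math.log(p)
            elif not ignore_abstains:
                # An abstain also carries evidence: the LF did not fire.
                log_post[y] += math.log(1.0 - p)
    # Normalize in a numerically stable way.
    top = max(log_post)
    weights = [math.exp(v - top) for v in log_post]
    total = sum(weights)
    return [w / total for w in weights]
```

In this sketch, a row made only of abstains returns the prior unchanged when ignore_abstains=True, whereas with ignore_abstains=False the abstains themselves shift the probabilities.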

Returns:

An array of predicted probabilities of shape (num_samples, num_classes).

Return type:

numpy.ndarray

Example

>>> L = [
...     [1, 0, 1, 1],
...     [0, 1, 0, 1],
...     [1, 0, 1, 0]
... ]
>>> proba = label_model.predict_proba(L)
>>> # proba.shape = (len(L), cardinality)

classmethod load(filepath)

Load a saved model from a file.

Parameters: filepath (str) – Path to the file containing the saved model.
Return type: The loaded model object.

Example

>>> model = SnorkelModel.load("label_model.pkl")

predict(L, strategy='majority')

Predict labels for the given weak label matrix using the specified strategy.

Parameters:

L (ndarray | DataFrame) –
Weak label dataframe.

Shape: (n_samples, n_weak)
strategy (str) –
Prediction strategy to use. Supported values: majority, probability.

Controls how labels are predicted from the predicted probabilities.
- majority: Predict the label with the highest number of votes.
- probability: Predict label j with probability proba[i, j].
  
  This can be useful to enforce specific class_balances in the predictions.
Default is “majority”.

If a sample has no votes, -1 is predicted.
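
The two strategies can be contrasted with a short sketch that turns predicted probabilities into labels. It is not the library's code: predict_from_proba and its seed argument are hypothetical, a sample is considered to have no votes when its row of L is all zeros, and "majority" is assumed here to amount to picking the most probable class.

```python
import random
from typing import List, Optional, Sequence


def predict_from_proba(
    proba: Sequence[Sequence[float]],
    L: Sequence[Sequence[int]],
    strategy: str = "majority",
    seed: Optional[int] = None,
) -> List[int]:
    """Hypothetical conversion of class probabilities into labels."""
    rng = random.Random(seed)
    labels = []
    for votes, p in zip(L, proba):
        classes = range(len(p))
        if not any(votes):
            # No labeling function voted for this sample.
            labels.append(-1)
        elif strategy == "majority":
            # Assumption: pick the class with the highest probability.
            labels.append(max(classes, key=lambda c: p[c]))
        elif strategy == "probability":
            # Draw label j with probability p[j].
            labels.append(rng.choices(classes, weights=p, k=1)[0])
        else:
            raise ValueError(f"Unsupported strategy: {strategy!r}")
    return labels
```

Sampling with the "probability" strategy means that, over many samples, the share of each predicted class follows the predicted probabilities rather than collapsing onto the most likely class.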

Return type:

1-D array of predicted labels of size n_samples

Example

>>> L = [[1, 0, 1, 2], [0, 1, 0, 0]]
>>> predictions = label_model.predict(L)
>>> # predictions.shape = (len(L),)

save(filepath)

Save the model to a file.

Parameters: filepath (str) – Path to the file where the model will be saved.

Example

>>> model.save("label_model.pkl")
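
A save and load round trip can be illustrated with a stand-in class. The serialization format used by the model is not documented here, so the sketch below assumes pickle only because of the .pkl extension in the examples. PicklableModel and roundtrip are hypothetical names, not part of flippers.

```python
import os
import pickle
import tempfile
from typing import Sequence


class PicklableModel:
    """Hypothetical stand-in for a model with save/load methods."""

    def __init__(self, polarities: Sequence[int]) -> None:
        self.polarities = list(polarities)

    def save(self, filepath: str) -> None:
        # Assumption: the model state is stored with pickle.
        with open(filepath, "wb") as f:
            pickle.dump(self.__dict__, f)

    @classmethod
    def load(cls, filepath: str) -> "PicklableModel":
        with open(filepath, "rb") as f:
            state = pickle.load(f)
        # Rebuild the instance without calling __init__, then restore state.
        model = cls.__new__(cls)
        model.__dict__.update(state)
        return model


def roundtrip(model: PicklableModel) -> PicklableModel:
    """Save the model to a temporary file and load it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "label_model.pkl")
        model.save(path)
        return type(model).load(path)
```

As with any pickle-based format, files should only be loaded from trusted sources.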