flippers.models

This module groups the generative label models.

Classes

`SnorkelModel`(*args, **kwargs)	A label model implementation for weak supervision based on a generative approach.
`Voter`(polarities[, cardinality])	Basic model that bases its decisions on a sum of votes (optionally weighted) for each class.

class flippers.models.Voter(polarities, cardinality=0)

Bases: _Model

Basic model that bases its decisions on a sum of votes (optionally weighted) for each class.

Parameters:

polarities (ndarray | List) – List that maps each weak label to its polarity (the class it votes for), size n_weak.
cardinality (int) –
Number of possible label values.

If unspecified, it will be inferred from the maximum value in polarities.
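The per-class vote sums that Voter bases its decisions on can be pictured with a small NumPy sketch. It assumes that L is binary (1 when a weak label fires, 0 when it abstains) and that cardinality=0 means "infer it as max(polarities) + 1"; count_votes is a hypothetical helper, not part of flippers.

```python
import numpy as np

def count_votes(L, polarities, cardinality=0):
    """Hypothetical helper: per-class vote counts, shape (n_samples, cardinality)."""
    polarities = np.asarray(polarities)
    if cardinality == 0:
        # Assumed inference rule: classes are labeled 0..max(polarities)
        cardinality = int(polarities.max()) + 1
    # one_hot[j, c] == 1 when weak label j votes for class c
    one_hot = np.eye(cardinality)[polarities]
    # Each fired weak label adds one vote to the class given by its polarity
    return np.asarray(L, dtype=float) @ one_hot

votes = count_votes([[1, 0, 1, 1], [0, 1, 0, 0]], polarities=[1, 0, 1, 1])
print(votes)  # votes == [[0., 3.], [1., 0.]]
```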

Example

>>> from flippers.models import Voter
>>> polarities = [1, 0, 1, 1]
>>> cardinality = 2
>>> model = Voter(polarities, cardinality)

fit(L, class_balances=[])

Fit the Voter model. This computes the weights for each class.

Reweighting the votes helps especially when specific classes have a high overlap in their weak labels.

The weights are computed so that the weighted sum of votes over the training set matches the given class balances.

This guarantees that mean(y_pred_proba_train) = class_balances.

For plain (unweighted) majority voting, do not call fit.
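The reweighting described above can be sketched as a fixed-point iteration. This is one possible scheme, not necessarily the one flippers implements: starting from a vote-count matrix, it repeatedly rescales each class weight by class_balance / current mean probability, averaging only over samples that received at least one vote. The function names weighted_proba and fit_class_weights are hypothetical.

```python
import numpy as np

def weighted_proba(votes, weights):
    """Normalize weighted votes per row; rows without votes get a uniform distribution."""
    scores = votes * weights
    totals = scores.sum(axis=1, keepdims=True)
    uniform = np.full_like(scores, 1.0 / scores.shape[1])
    safe_totals = np.where(totals > 0, totals, 1.0)  # avoid division by zero
    return np.where(totals > 0, scores / safe_totals, uniform)

def fit_class_weights(votes, class_balances, n_iter=500):
    """Hypothetical fit: rescale class weights until the mean probability matches class_balances."""
    votes = np.asarray(votes, dtype=float)
    target = np.asarray(class_balances, dtype=float)
    voted = votes.sum(axis=1) > 0  # only samples with votes can be reweighted
    weights = np.ones(votes.shape[1])
    for _ in range(n_iter):
        mean_proba = weighted_proba(votes[voted], weights).mean(axis=0)
        weights *= target / np.maximum(mean_proba, 1e-12)
    return weights

votes = np.array([[1, 0], [0, 1], [1, 1], [1, 1]], dtype=float)
weights = fit_class_weights(votes, class_balances=[0.6, 0.4])
print(weighted_proba(votes, weights).mean(axis=0))  # close to [0.6, 0.4]
```

In this toy matrix the first two rows are unambiguous, so only the two tied rows can shift probability mass; the weights converge to a ratio that tips those rows toward class 0 just enough to reach the 0.6 / 0.4 target.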

Parameters:

L (pd.DataFrame) – Weak label dataframe.
class_balances (ListLike) –
Array-like of shape (cardinality,) giving the target proportion of each class.

When unspecified, assumes all classes are equally likely.

Return type:

None

Example

>>> L = [[1, 0, 1, 2], [0, 1, 2, 1], [1, 2, 1, 0], [0, 1, 0, 2]]
>>> class_balances = [0.6, 0.4]
>>> model.fit(L, class_balances)

predict_proba(L)

Predict probabilities using weighted voting.

Parameters:

L (pd.DataFrame) – Weak label dataframe.

Returns:

Array of predicted probabilities of shape (len(L), cardinality).

Example

>>> L = [[1, 0, 1, 2], [0, 1, 0, 0]]
>>> proba = model.predict_proba(L)
>>> # proba.shape = (len(L), cardinality)

classmethod load(filepath)

Load a saved model from a file.

Parameters:

filepath (str) – Path to the file containing the saved model.

Returns:

The loaded model object.

Example

>>> model = Voter.load("label_model.pkl")

predict(L, strategy='majority')

Predict labels for the given weak label matrix using the specified strategy.

Parameters:

L (ndarray | DataFrame) –
Weak label dataframe.

Shape: (n_samples, n_weak)
strategy (str) –
Prediction strategy to use. Supported values: majority, probability.

Controls how labels are derived from the predicted probabilities.
- majority: Predict the label with the highest predicted probability.
- probability: Sample label j with probability proba[i, j].
  
  This can be useful to enforce specific class_balances in the predictions.
Default is “majority”.

If a sample receives no votes, -1 is predicted for it.

Return type:

1-D array of predicted labels of size n_samples
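The two strategies and the -1 convention can be sketched as follows, assuming the probabilities are already computed and that a boolean has_votes mask marks the samples with at least one vote (for a binary L, L.sum(axis=1) > 0). predict_from_proba is a hypothetical helper, not the flippers implementation.

```python
import numpy as np

def predict_from_proba(proba, has_votes, strategy="majority", seed=None):
    """Hypothetical sketch of both strategies; -1 marks samples without votes."""
    proba = np.asarray(proba, dtype=float)
    if strategy == "majority":
        labels = proba.argmax(axis=1)  # most likely class per sample
    elif strategy == "probability":
        rng = np.random.default_rng(seed)
        # Sample label j with probability proba[i, j]
        labels = np.array([rng.choice(proba.shape[1], p=row) for row in proba])
    else:
        raise ValueError(f"Unknown strategy: {strategy!r}")
    return np.where(has_votes, labels, -1)

proba = np.array([[0.2, 0.8], [0.9, 0.1], [0.5, 0.5]])
has_votes = np.array([True, True, False])
print(predict_from_proba(proba, has_votes))  # labels: [1, 0, -1]
```

Sampling makes label frequencies follow the predicted probabilities on average, which is why the probability strategy can help enforce class_balances; passing a seed makes the draw reproducible.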

Example

>>> L = [[1, 0, 1, 2], [0, 1, 0, 0]]
>>> predictions = model.predict(L)
>>> # predictions.shape = (len(L),)

save(filepath)

Save the model to a file.

Parameters:

filepath (str) – Path to the file where the model will be saved.

Example

>>> model.save("label_model.pkl")
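A save/load round trip can be sketched with Python's pickle module. The .pkl extension in the examples suggests a pickle-based format, but this is an assumption; TinyModel is a hypothetical stand-in, and pickling its attribute dictionary (rather than the object itself) is a design choice of this sketch.

```python
import os
import pickle
import tempfile

class TinyModel:
    """Hypothetical stand-in for a label model with save/load methods."""

    def __init__(self, polarities, cardinality):
        self.polarities = list(polarities)
        self.cardinality = cardinality

    def save(self, filepath):
        # Pickle the instance attributes (plain Python data) to disk
        with open(filepath, "wb") as f:
            pickle.dump(self.__dict__, f)

    @classmethod
    def load(cls, filepath):
        # Rebuild an instance without calling __init__, then restore its attributes
        with open(filepath, "rb") as f:
            state = pickle.load(f)
        model = cls.__new__(cls)
        model.__dict__.update(state)
        return model

path = os.path.join(tempfile.mkdtemp(), "label_model.pkl")
TinyModel([1, 0, 1, 1], 2).save(path)
restored = TinyModel.load(path)
print(restored.polarities, restored.cardinality)  # [1, 0, 1, 1] 2
```

Because unpickling can execute arbitrary code, only load model files from trusted sources.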

class flippers.models.SnorkelModel(*args, **kwargs)

Bases: Module, _Model

A label model implementation for weak supervision based on a generative approach.

This implementation is based on the Snorkel library’s label model.

Like its Snorkel counterpart, it assumes that the labeling functions are conditionally independent given Y, similar to a naive Bayes assumption.

In practice, however, it can also give good results with correlated LFs.
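Under this conditional independence assumption, the likelihood of a row of weak labels factorizes over labeling functions, which yields a naive-Bayes-style posterior. The notation is introduced here for illustration: λ_j is the output of labeling function j, m = n_weak, and P(Y = y) is the class prior.

```latex
P(\lambda_1, \dots, \lambda_m \mid Y = y) = \prod_{j=1}^{m} P(\lambda_j \mid Y = y)
\qquad
P(Y = y \mid \lambda_1, \dots, \lambda_m)
  = \frac{P(Y = y) \prod_{j=1}^{m} P(\lambda_j \mid Y = y)}
         {\sum_{y'} P(Y = y') \prod_{j=1}^{m} P(\lambda_j \mid Y = y')}
```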

See the following link for more information on how to use this model and for a comparison with the Snorkel library’s implementation.

Example

>>> from flippers.models import SnorkelModel
>>>
>>> # Create a SnorkelModel instance
>>> label_model = SnorkelModel(polarities, cardinality, class_balances)
>>> # Train the model
>>> label_model.fit(L_train)
>>> # Generate labels
>>> y_pred_proba = label_model.predict_proba(L) # shape: (len(L), n_classes)
>>> y_pred = label_model.predict(L) # shape: (len(L),)

Initializes a SnorkelModel instance with the given configuration options.

Parameters:

polarities (ndarray | List) – List that maps each weak label to its polarity (the class it votes for), size n_weak.
cardinality (int, optional) –
Number of possible label values.

If unspecified, it will be inferred from the maximum value in polarities.
class_balances (ListLike, optional) –
List specifying class balance prior for each possible class, size n_classes.

Defaults to a balanced (uniform) class prior.

Example

>>> polarities = [1, 0, 1, 1]
>>> cardinality = 2
>>> class_balances = [0.7, 0.3]
>>> label_model = SnorkelModel(polarities, cardinality, class_balances)
>>> # Change device
>>> label_model.to("cuda")

property device

fit(L, learning_rate=0.001, num_epochs=50, prec_init=0.5, k=0, verbose=False, **_)

Train the Snorkel model on the given weak label matrix L.

Parameters:

L (MatrixLike) – Weak label matrix of shape (num_samples, n_weak).
learning_rate (float, optional, default: 1e-3) – Learning rate for the optimizer.
num_epochs (int, optional, default: 50) – Number of epochs to train the model.
prec_init (float, optional, default: 0.5) –
Initial value for the precision of the labeling functions.

Can also be an array of shape (n_weak,) to set the initial precision of each LF individually.
k (float, optional, default: 0) –
Weight of the class balance loss term.

This term penalizes the model when the class distribution it predicts on the training set deviates from the specified class balances.
verbose (bool, optional, default: False) – When True, displays training progress using tqdm.

Return type:

None
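The prec_init and k options can be illustrated with a short NumPy sketch. The broadcasting of prec_init follows the parameter description; the penalty form (k times the squared gap between the mean predicted probabilities on the training set and class_balances) is an assumption for illustration, not the documented flippers loss. Both helper names are hypothetical.

```python
import numpy as np

def init_precisions(prec_init, n_weak):
    """Broadcast a scalar or per-LF prec_init to shape (n_weak,)."""
    prec = np.broadcast_to(np.asarray(prec_init, dtype=float), (n_weak,))
    return prec.copy()  # writable copy, one initial precision per LF

def class_balance_penalty(proba_train, class_balances, k):
    """Hypothetical penalty: k * squared gap between mean predicted and target balances."""
    mean_proba = np.asarray(proba_train, dtype=float).mean(axis=0)
    gap = mean_proba - np.asarray(class_balances, dtype=float)
    return k * float(np.sum(gap ** 2))

print(init_precisions(0.5, 4))                   # [0.5 0.5 0.5 0.5]
print(init_precisions([0.9, 0.6, 0.7, 0.8], 4))  # [0.9 0.6 0.7 0.8]
proba_train = np.array([[0.9, 0.1], [0.8, 0.2], [0.4, 0.6]])
print(class_balance_penalty(proba_train, [0.5, 0.5], k=5e-3))  # approximately 4e-4
```

With the default k=0 the penalty vanishes, so the class balances then only act through the prior.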

Example

>>> L = [
...     [1, 0, 1, 1],
...     [0, 1, 0, 1],
...     [1, 0, 1, 0]
... ]
>>> label_model.fit(
...     L,
...     learning_rate=1e-2,
...     num_epochs=10,
...     prec_init=0.7,
...     k=5e-3,
...     verbose=True
... )

predict_proba(L, ignore_abstains=False)

Predicts class probabilities by performing a Bayesian update of the class prior, using the learned parameters mu.

Parameters:

L (MatrixLike) – Weak label matrix of shape (num_samples, n_weak).
ignore_abstains (bool, optional) –
Whether to ignore abstains in the prior update:

- When False (default), uses both votes and abstains. This is recommended, as knowing which labeling functions abstained also carries information.

- When True, updates the prior using only the labeling functions that did not abstain.

Returns:

An array of predicted probabilities of shape (num_samples, num_classes).

Return type:

numpy.ndarray
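The effect of ignore_abstains can be sketched with a naive Bayes update. This assumes a binary L (1 = fired, 0 = abstained) and uses a hypothetical table fire_prob[j, y] = P(LF j fires | Y = y) as a stand-in for the learned mu, whose actual layout in flippers may differ; posterior is a hypothetical helper.

```python
import numpy as np

def posterior(L, fire_prob, prior, ignore_abstains=False):
    """Hypothetical naive Bayes update of the class prior.

    L: binary matrix (n_samples, n_weak), 1 = fired, 0 = abstained.
    fire_prob: array (n_weak, n_classes), assumed P(LF j fires | Y = y).
    prior: array (n_classes,), class prior.
    """
    L = np.asarray(L, dtype=float)
    log_fire = np.log(fire_prob)
    log_abstain = np.log1p(-fire_prob)  # log P(LF j abstains | Y = y)
    # Evidence from the labeling functions that fired
    log_lik = L @ log_fire
    if not ignore_abstains:
        # Abstentions are informative too
        log_lik += (1.0 - L) @ log_abstain
    log_post = np.log(prior) + log_lik
    log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)

fire_prob = np.array([[0.1, 0.8], [0.7, 0.2]])
prior = np.array([0.5, 0.5])
L = [[1, 0], [0, 0]]
p_full = posterior(L, fire_prob, prior)
p_ignore = posterior(L, fire_prob, prior, ignore_abstains=True)
print(p_full)
print(p_ignore)
```

For the second row, where every LF abstains, ignoring abstains returns the prior unchanged, while the default setting still shifts it toward class 0 because LF 1 rarely abstains on class 0 samples.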

Example

>>> L = [
...     [1, 0, 1, 1],
...     [0, 1, 0, 1],
...     [1, 0, 1, 0]
... ]
>>> proba = label_model.predict_proba(L)
>>> # proba.shape = (len(L), cardinality)

classmethod load(filepath)

Load a saved model from a file.

Parameters:

filepath (str) – Path to the file containing the saved model.

Returns:

The loaded model object.

Example

>>> model = SnorkelModel.load("label_model.pkl")

predict(L, strategy='majority')

Predict labels for the given weak label matrix using the specified strategy.

Parameters:

L (ndarray | DataFrame) –
Weak label dataframe.

Shape: (n_samples, n_weak)
strategy (str) –
Prediction strategy to use. Supported values: majority, probability.

Controls how labels are derived from the predicted probabilities.
- majority: Predict the label with the highest predicted probability.
- probability: Sample label j with probability proba[i, j].
  
  This can be useful to enforce specific class_balances in the predictions.
Default is “majority”.

If a sample receives no votes, -1 is predicted for it.

Return type:

1-D array of predicted labels of size n_samples

Example

>>> L = [[1, 0, 1, 2], [0, 1, 0, 0]]
>>> predictions = label_model.predict(L)
>>> # predictions.shape = (len(L),)

save(filepath)

Save the model to a file.

Parameters:

filepath (str) – Path to the file where the model will be saved.

Example

>>> model.save("label_model.pkl")