flippers.models
Groups label generative models.
Classes
SnorkelModel | A label model implementation for weak supervision based on a generative approach.
Voter | Basic model that bases its decisions on a sum of votes (optionally weighted) for each class.
- class flippers.models.Voter(polarities, cardinality=0)
Bases: _Model
Basic model that bases its decisions on a sum of votes (optionally weighted) for each class.
- Parameters:
polarities (ndarray | List) – List that maps weak labels to polarities, size n_weak.
cardinality (int) –
Number of possible label values.
If unspecified, it will be inferred from the maximum value in polarities.
Example
>>> polarities = [1, 0, 1, 1]
>>> cardinality = 2
>>> model = Voter(polarities, cardinality)
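The polarity list can be read as a mapping from each weak labeler to the class it votes for. The sketch below illustrates that reading with plain NumPy. It assumes that classes are numbered from 0, so that the inferred cardinality is the maximum polarity plus one; `polarity_matrix` is a hypothetical helper, not part of flippers.

```python
import numpy as np

def polarity_matrix(polarities, cardinality=0):
    """One-hot matrix of shape (n_weak, cardinality). Hypothetical helper."""
    polarities = np.asarray(polarities)
    if cardinality == 0:
        # Assumption: class ids start at 0, so the class count is max + 1.
        cardinality = int(polarities.max()) + 1
    # Row j is the one-hot encoding of the class that labeler j votes for.
    return np.eye(cardinality, dtype=int)[polarities]

P = polarity_matrix([1, 0, 1, 1])
print(P.shape)  # (4, 2)
```

Each row has a single 1 in the column of the class that the corresponding weak labeler supports.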
- fit(L, class_balances=[])
Fit the Voter model. This computes the weights for each class.
Reweighting the votes helps, especially when specific classes have a high overlap in their weak labels.
The weights are computed so the weighted sum of votes over training matches the given class balance.
This guarantees mean(y_pred_proba_train) = class_balance.
For majority voting, do not use fit.
- Parameters:
L (pd.DataFrame) – Weak label dataframe.
class_balances (ListLike) –
NumPy array of shape (cardinality,) giving a weight to each class.
When unspecified, assumes all classes are equally likely.
- Return type:
None
Example
>>> L = [[1, 0, 1, 2], [0, 1, 2, 1], [1, 2, 1, 0], [0, 1, 0, 2]]
>>> class_balances = [0.6, 0.4]
>>> model.fit(L, class_balances)
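To make the reweighting idea concrete, here is a minimal sketch for the two-class case. It is not the solver used by flippers. It assumes per-class vote counts are already available and searches by bisection for a weight ratio such that the mean predicted probability matches the requested balance. The names `weighted_proba` and `fit_binary_weights` are hypothetical.

```python
import numpy as np

def weighted_proba(votes, weights):
    """Normalize weighted vote counts row by row (every row needs a vote)."""
    scaled = votes * weights
    return scaled / scaled.sum(axis=1, keepdims=True)

def fit_binary_weights(votes, class_balances, n_iter=60):
    """Bisection on log(w0 / w1) for two classes. Illustrative only."""
    lo, hi = -20.0, 20.0
    target = class_balances[0]
    for _ in range(n_iter):
        mid = (lo + hi) / 2
        weights = np.array([np.exp(mid), 1.0])
        if weighted_proba(votes, weights)[:, 0].mean() < target:
            lo = mid  # class 0 is under-represented: raise its weight
        else:
            hi = mid  # class 0 is over-represented: lower its weight
    return np.array([np.exp((lo + hi) / 2), 1.0])

# Rows are samples, columns are vote counts for class 0 and class 1.
votes = np.array([[2, 1], [1, 1], [0, 2], [1, 0]], dtype=float)
weights = fit_binary_weights(votes, [0.6, 0.4])
mean_proba = weighted_proba(votes, weights).mean(axis=0)
```

The bisection works because the mean probability of class 0 increases monotonically with its weight. A matching weight only exists when the target balance is reachable: samples whose votes all go to one class keep a one-hot prediction whatever the weights are.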
- predict_proba(L)
Predict probabilities using weighted voting.
- Parameters:
L (pd.DataFrame) – Weak label dataframe.
- Return type:
Array of predicted probabilities of shape (len(L), cardinality)
Example
>>> L = [[1, 0, 1, 2], [0, 1, 0, 0]]
>>> proba = model.predict_proba(L)
>>> # proba.shape = (len(L), cardinality)
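Weighted voting can be sketched in a few lines of NumPy. This is an illustration, not the flippers implementation. It assumes a binary weak label matrix in which 1 means the labeler fired and 0 means it abstained, and `vote_proba` is a hypothetical name.

```python
import numpy as np

def vote_proba(L, polarities, cardinality, weights=None):
    """Per-class vote sums, optionally weighted, normalized to probabilities."""
    L = np.asarray(L, dtype=float)
    one_hot = np.eye(cardinality)[np.asarray(polarities)]  # (n_weak, cardinality)
    votes = L @ one_hot                                    # (n_samples, cardinality)
    if weights is not None:
        votes = votes * np.asarray(weights, dtype=float)
    totals = votes.sum(axis=1, keepdims=True)
    # Rows without any vote stay at zero instead of dividing by zero.
    return np.divide(votes, totals, out=np.zeros_like(votes), where=totals > 0)

L = [[1, 1, 1, 0], [0, 1, 0, 0], [0, 0, 0, 0]]
proba = vote_proba(L, polarities=[1, 0, 1, 1], cardinality=2)
```

In the first row two labelers vote for class 1 and one for class 0, so the probabilities are 1/3 and 2/3. The last row has no votes and is left at zero.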
- classmethod load(filepath)
Load a saved model from a file.
- Parameters:
filepath (str) – Path to the file containing the saved model.
- Return type:
The loaded model object.
Example
>>> model = Voter.load("label_model.pkl")
- predict(L, strategy='majority')
Predict labels for the given weak label matrix using the specified strategy.
- Parameters:
L (ndarray | DataFrame) –
Weak label dataframe.
Shape: (n_samples, n_weak)
strategy (str) –
Prediction strategy to use. Supported values: majority, probability.
Controls how labels are predicted from the predicted probabilities.
majority: Predict the label with the highest number of votes.
probability: Predict label j with probability proba[i, j].
This can be useful to enforce specific class_balances in the predictions.
Default is “majority”.
If a sample has no votes, -1 is predicted.
- Return type:
1-D array of predicted labels of size n_samples
Example
>>> L = [[1, 0, 1, 2], [0, 1, 0, 0]]
>>> predictions = model.predict(L)
>>> # predictions.shape = (len(L),)
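The difference between the two strategies can be sketched as follows. This is not the flippers code: it assumes that "majority" amounts to an argmax over the predicted probabilities and that a sample without votes shows up as an all-zero row. `predict_from_proba` is a hypothetical helper.

```python
import numpy as np

def predict_from_proba(proba, strategy="majority", rng=None):
    """Turn a probability matrix into labels; -1 marks samples without votes."""
    proba = np.asarray(proba, dtype=float)
    has_votes = proba.sum(axis=1) > 0
    labels = np.full(len(proba), -1)
    if strategy == "majority":
        # Deterministic: pick the most probable class.
        labels[has_votes] = proba[has_votes].argmax(axis=1)
    elif strategy == "probability":
        # Stochastic: draw label j with probability proba[i, j].
        rng = np.random.default_rng() if rng is None else rng
        for i in np.flatnonzero(has_votes):
            labels[i] = rng.choice(proba.shape[1], p=proba[i])
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return labels

proba = [[0.2, 0.8], [1.0, 0.0], [0.0, 0.0]]
majority = predict_from_proba(proba)
sampled = predict_from_proba(proba, "probability", np.random.default_rng(0))
```

Sampling reproduces the predicted class proportions on average, which is why it can help enforce a given class balance, at the cost of noisier individual labels.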
- save(filepath)
Save the model to a file.
- Parameters:
filepath (str) – Path to the file where the model will be saved.
Example
>>> model.save("label_model.pkl")
- class flippers.models.SnorkelModel(*args, **kwargs)
Bases: Module, _Model
A label model implementation for weak supervision based on a generative approach.
This implementation is based on the Snorkel library’s label model.
Like its Snorkel library counterpart, it assumes that the labeling functions (LFs) are conditionally independent given Y, similar to a naive Bayes assumption.
However, good results can also be observed in practice for correlated LFs.
See the following link for more information on how to use this model and a comparison with the Snorkel library’s implementation.
Example
>>> from flippers.models import SnorkelModel
>>>
>>> # Create a SnorkelModel instance
>>> label_model = SnorkelModel(polarities, cardinality, class_balances)
>>> # Train the model
>>> label_model.fit(L_train)
>>> # Generate labels
>>> y_pred_proba = label_model.predict_proba(L)  # shape: (len(L), n_classes)
>>> y_pred = label_model.predict(L)  # shape: (len(L),)
Initializes a SnorkelModel instance with the given configuration options.
- Parameters:
polarities (ndarray | List) – List that maps weak labels to polarities, size n_weak.
cardinality (int, optional) –
Number of possible label values.
If unspecified, it will be inferred from the maximum value in polarities.
class_balances (ListLike, optional) –
List specifying class balance prior for each possible class, size n_classes.
Defaults to balanced classes prior.
Example
>>> polarities = [1, 0, 1, 1]
>>> cardinality = 2
>>> class_balances = [0.7, 0.3]
>>> label_model = SnorkelModel(polarities, cardinality, class_balances)
>>> # Change device
>>> label_model.to("cuda")
- property device
- fit(L, learning_rate=0.001, num_epochs=50, prec_init=0.5, k=0, verbose=False, **_)
Train the Snorkel model on the given weak label matrix L.
- Parameters:
L (MatrixLike) – Weak Label matrix of shape (num_samples, n_weak)
learning_rate (float, optional, default: 1e-3) – Learning rate for the optimizer.
num_epochs (int, optional, default: 50) – Number of epochs to train the model
prec_init (float, optional, default: 0.5) –
Initial value for precision
Can be of shape (n_weak) to set precision for each LF.
k (float, optional, default: 0) –
Weight of the class balance loss term.
This term penalizes the model when the class proportions it predicts on the training set differ from the specified class balance.
verbose (bool, optional, default: False) – When True, displays training progress using tqdm.
- Return type:
None
Example
>>> L = [
...     [1, 0, 1, 1],
...     [0, 1, 0, 1],
...     [1, 0, 1, 0]
... ]
>>> label_model.fit(
...     L,
...     learning_rate=1e-2,
...     num_epochs=10,
...     prec_init=0.7,
...     k=5e-3,
...     verbose=True
... )
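The role of `k` can be illustrated with a small sketch. The exact form of the class balance term is not stated here, so the squared gap used below is an assumption, and `class_balance_penalty` is a hypothetical function rather than part of flippers.

```python
import numpy as np

def class_balance_penalty(proba, class_balances, k):
    """k times the squared gap between mean predictions and the target balance.

    Assumed form, for illustration only.
    """
    mean_pred = np.asarray(proba, dtype=float).mean(axis=0)
    gap = mean_pred - np.asarray(class_balances, dtype=float)
    return k * float(np.sum(gap ** 2))

# Predicted probabilities on three training samples, two classes.
proba = np.array([[0.9, 0.1], [0.7, 0.3], [0.8, 0.2]])
penalty = class_balance_penalty(proba, [0.7, 0.3], k=5e-3)
```

With `k=0` (the default) the term vanishes. Larger values pull the average training prediction toward the requested balance.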
- predict_proba(L, ignore_abstains=False)
Predicts class probabilities by updating the class prior with the learned parameter mu to obtain the posteriors.
- Parameters:
L (MatrixLike) – Weak Label matrix
ignore_abstains (bool, optional) –
Whether to ignore abstains in the prior update:
- When False (default), uses both votes and abstains. This is recommended and helps leverage the information gained from knowing which labeling functions abstained.
- When True, updates the prior using only the labeling functions that did not abstain.
- Returns:
An array of predicted probabilities of shape (num_samples, num_classes).
- Return type:
numpy.ndarray
Example
>>> L = [
...     [1, 0, 1, 1],
...     [0, 1, 0, 1],
...     [1, 0, 1, 0]
... ]
>>> proba = label_model.predict_proba(L)
>>> # proba.shape = (len(L), cardinality)
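The effect of `ignore_abstains` can be sketched with a naive Bayes update. This is not the flippers implementation. It assumes a binary weak label matrix and that `mu[j, y]` is the probability that labeling function j fires when the true class is y; the actual shape and meaning of the learned parameter may differ. `naive_bayes_proba` is a hypothetical name.

```python
import numpy as np

def naive_bayes_proba(L, mu, prior, ignore_abstains=False):
    """Posterior over classes under conditional independence of the LFs."""
    L = np.asarray(L, dtype=float)
    mu = np.asarray(mu, dtype=float)
    # Evidence from labeling functions that fired.
    log_post = np.log(prior) + L @ np.log(mu)
    if not ignore_abstains:
        # Evidence from labeling functions that abstained.
        log_post = log_post + (1.0 - L) @ np.log1p(-mu)
    # Normalize in log space for numerical stability.
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)

mu = np.array([[0.8, 0.1],   # LF 0 fires mostly on class 0
               [0.2, 0.6]])  # LF 1 fires mostly on class 1
L = [[1, 0]]                 # LF 0 fired, LF 1 abstained
with_abstains = naive_bayes_proba(L, mu, [0.5, 0.5])
votes_only = naive_bayes_proba(L, mu, [0.5, 0.5], ignore_abstains=True)
```

In this toy case the abstention of LF 1, which usually fires on class 1, is itself evidence for class 0, so the posterior for class 0 is higher when abstains are taken into account.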
- classmethod load(filepath)
Load a saved model from a file.
- Parameters:
filepath (str) – Path to the file containing the saved model.
- Return type:
The loaded model object.
Example
>>> model = SnorkelModel.load("label_model.pkl")
- predict(L, strategy='majority')
Predict labels for the given weak label matrix using the specified strategy.
- Parameters:
L (ndarray | DataFrame) –
Weak label dataframe.
Shape: (n_samples, n_weak)
strategy (str) –
Prediction strategy to use. Supported values: majority, probability.
Controls how labels are predicted from the predicted probabilities.
majority: Predict the label with the highest number of votes.
probability: Predict label j with probability proba[i, j].
This can be useful to enforce specific class_balances in the predictions.
Default is “majority”.
If a sample has no votes, -1 is predicted.
- Return type:
1-D array of predicted labels of size n_samples
Example
>>> L = [[1, 0, 1, 2], [0, 1, 0, 0]]
>>> predictions = label_model.predict(L)
>>> # predictions.shape = (len(L),)
- save(filepath)
Save the model to a file.
- Parameters:
filepath (str) – Path to the file where the model will be saved.
Example
>>> model.save("label_model.pkl")