Model Main Functions and Controller
model_init_classification
model_init_classification (model_class, cpoint_path, output_hidden_states:bool, device=None, config=None, seed=None, body_model=None, model_kwargs={})
*Initialize a classification (or regression) model, either from an existing HuggingFace model or a custom architecture.
Can be used for binary, multi-class single-head, multi-class multi-head, multi-label classification, and regression*
|  | Type | Default | Details |
|---|---|---|---|
| model_class |  |  | Model's class object, e.g. RobertaHiddenStateConcatForSequenceClassification |
| cpoint_path |  |  | Either the model's name on HuggingFace, or the path to a model checkpoint |
| output_hidden_states | bool |  | Whether to output the model's hidden states. Useful when you build a custom classification head |
| device | NoneType | None | Device to train on |
| config | NoneType | None | Model config. If not provided, AutoConfig is used to load the config from cpoint_path |
| seed | NoneType | None | Random seed |
| body_model | NoneType | None | If not None, used to initialize the model's body. If you only want to load the model checkpoint in cpoint_path, leave this as None |
| model_kwargs | dict | {} | Keyword arguments for the model (both head and body) |
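A minimal sketch of initializing a binary classifier from a HuggingFace checkpoint. The import path is an assumption; adjust it to wherever these helpers live in your install.

```python
import torch
from transformers import RobertaForSequenceClassification

# Assumed import path for this library's helpers; adjust to your setup.
from that_nlp_library.model_main import model_init_classification

model = model_init_classification(
    model_class=RobertaForSequenceClassification,
    cpoint_path='roberta-base',   # HuggingFace model name or local checkpoint path
    output_hidden_states=False,   # set True when building a custom classification head
    device='cuda:0' if torch.cuda.is_available() else 'cpu',
    seed=42,
)
```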
compute_metrics
compute_metrics (pred, metric_funcs=[], metric_types=[], head_sizes=[], label_names=[], is_multilabel=False, multilabel_threshold=0.5)
*Return a dictionary of metric names and their values.
Reference: https://github.com/huggingface/transformers/blob/main/src/transformers/trainer_utils.py#L107C16-L107C16*
|  | Type | Default | Details |
|---|---|---|---|
| pred |  |  | An EvalPrediction object from HuggingFace (a named tuple with predictions and label_ids attributes) |
| metric_funcs | list | [] | A list of metric functions to evaluate |
| metric_types | list | [] | Type of metric ('classification' or 'regression') for each metric function above |
| head_sizes | list | [] | Class size for each head. A regression head has head size 1 |
| label_names | list | [] | Names of the label (dependent variable) columns |
| is_multilabel | bool | False | Whether this is a multilabel classification |
| multilabel_threshold | float | 0.5 | Threshold for multilabel classification (>= threshold is positive) |
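Since the trainer calls this function with only the EvalPrediction, the remaining arguments are typically bound ahead of time with functools.partial. A sketch (import path assumed):

```python
from functools import partial
from sklearn.metrics import accuracy_score, f1_score

from that_nlp_library.model_main import compute_metrics  # assumed path

metric_fn = partial(
    compute_metrics,
    metric_funcs=[accuracy_score, f1_score],
    metric_types=['classification', 'classification'],
    head_sizes=[2],          # a single binary head
    label_names=['label'],   # hypothetical label column name
)
# The trainer then calls metric_fn(pred) and receives a dictionary
# mapping metric names to their values.
```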
compute_metrics_separate_heads
compute_metrics_separate_heads (pred, metric_funcs=[], label_names=[], **kwargs)
*Return a dictionary of metric names and their values. This is used in Deep Hierarchical Classification (a special case of multi-head classification).
This metric function is mainly used when you have a separate logit output for each head (instead of the typical multi-head logit output, where all heads' logits are concatenated)*
|  | Type | Default | Details |
|---|---|---|---|
| pred |  |  | An EvalPrediction object from HuggingFace (a named tuple with predictions and label_ids attributes) |
| metric_funcs | list | [] | A list of metric functions to evaluate |
| label_names | list | [] | Names of the label (dependent variable) columns |
| kwargs |  |  |  |
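A similar sketch for the separate-heads variant, e.g. a two-level hierarchy (import path and label column names are assumptions):

```python
from functools import partial
from sklearn.metrics import accuracy_score

from that_nlp_library.model_main import compute_metrics_separate_heads  # assumed path

# For Deep Hierarchical Classification, where the model emits one logit
# tensor per head instead of a single concatenated tensor.
metric_fn = partial(
    compute_metrics_separate_heads,
    metric_funcs=[accuracy_score],
    label_names=['level_1', 'level_2'],  # hypothetical label columns
)
```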
loss_for_classification
loss_for_classification (logits, labels, is_multilabel=False, is_multihead=False, head_sizes=[], head_weights=[])
*The general loss function for classification*

- If `is_multilabel` is `False` and `is_multihead` is `False`: Single-Head Classification, e.g. you predict 1 out of n classes
- If `is_multilabel` is `False` and `is_multihead` is `True`: Multi-Head Classification, e.g. you predict 1 out of n classes at Level 1, and 1 out of m classes at Level 2
- If `is_multilabel` is `True` and `is_multihead` is `False`: Single-Head Multi-Label Classification, e.g. you predict x out of n classes (x >= 0)
- If `is_multilabel` is `True` and `is_multihead` is `True`: Not supported
|  | Type | Default | Details |
|---|---|---|---|
| logits |  |  | Output of the last linear layer, before any softmax/sigmoid. Size: (bs, class_size) |
| labels |  |  | Determined by your DatasetDict. Size: (bs, number_of_heads) |
| is_multilabel | bool | False | Whether this is a multilabel classification |
| is_multihead | bool | False | Whether this is a multihead classification |
| head_sizes | list | [] | Class size for each head. A regression head has head size 1 |
| head_weights | list | [] | Loss weight for each head. Defaults to 1 for each head |
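For illustration, a multi-head call with two heads of 3 and 5 classes, so the concatenated logits have width 8 (import path assumed):

```python
import torch

from that_nlp_library.model_main import loss_for_classification  # assumed path

bs = 4
logits = torch.randn(bs, 3 + 5)   # heads' logits concatenated: 3-class + 5-class
labels = torch.stack(
    [torch.randint(0, 3, (bs,)), torch.randint(0, 5, (bs,))], dim=1
)                                 # shape (bs, number_of_heads)

loss = loss_for_classification(
    logits, labels,
    is_multihead=True,
    head_sizes=[3, 5],
    head_weights=[1.0, 1.0],
)
```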
finetune
finetune (lr, bs, wd, epochs, ddict, tokenizer, o_dir='./tmp_weights', save_checkpoint=False, model=None, model_init=None, data_collator=None, compute_metrics=None, grad_accum_steps=2, lr_scheduler_type='cosine', warmup_ratio=0.1, no_valid=False, val_bs=None, seed=None, report_to='none', trainer_class=None, len_train=None)
The main model training/finetuning function
|  | Type | Default | Details |
|---|---|---|---|
| lr |  |  | Learning rate |
| bs |  |  | Batch size |
| wd |  |  | Weight decay |
| epochs |  |  | Number of epochs |
| ddict |  |  | The HuggingFace DatasetDict |
| tokenizer |  |  | HuggingFace tokenizer |
| o_dir | str | ./tmp_weights | Directory to save weights |
| save_checkpoint | bool | False | Whether to save weights (checkpoints) to o_dir |
| model | NoneType | None | NLP model |
| model_init | NoneType | None | A function to initialize the model |
| data_collator | NoneType | None | HuggingFace data collator |
| compute_metrics | NoneType | None | A function to compute metrics, e.g. compute_metrics |
| grad_accum_steps | int | 2 | The batch at each step is divided by this integer, and gradients are accumulated over grad_accum_steps steps |
| lr_scheduler_type | str | cosine | The scheduler type to use: linear, cosine, cosine_with_restarts, polynomial, constant, constant_with_warmup |
| warmup_ratio | float | 0.1 | The warmup ratio for some lr schedulers |
| no_valid | bool | False | Whether there is a validation set or not |
| val_bs | NoneType | None | Validation batch size |
| seed | NoneType | None | Random seed |
| report_to | str | none | The list of integrations to report results and logs to. Supported platforms are "azure_ml", "comet_ml", "mlflow", "neptune", "tensorboard", "clearml" and "wandb". Use "all" to report to all installed integrations, "none" for no integrations |
| trainer_class | NoneType | None | The class of your custom trainer, if any |
| len_train | NoneType | None | Estimated number of samples in the whole training set (for streaming datasets only) |
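A sketch of a typical call, continuing from the snippets above (`model` and the tokenized `ddict` come from earlier steps; the import path is an assumption, and the return value is not documented here):

```python
from functools import partial
from sklearn.metrics import accuracy_score
from transformers import AutoTokenizer, DataCollatorWithPadding

from that_nlp_library.model_main import finetune, compute_metrics  # assumed path

tokenizer = AutoTokenizer.from_pretrained('roberta-base')
metric_fn = partial(compute_metrics,
                    metric_funcs=[accuracy_score],
                    metric_types=['classification'],
                    head_sizes=[2],
                    label_names=['label'])

# `ddict` is a tokenized DatasetDict with 'train' (and optionally 'validation')
# splits; `model` comes from model_init_classification above.
finetune(lr=1e-4, bs=16, wd=0.01, epochs=3,
         ddict=ddict, tokenizer=tokenizer,
         model=model,
         data_collator=DataCollatorWithPadding(tokenizer=tokenizer),
         compute_metrics=metric_fn,
         seed=42)
```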
ModelController
ModelController (model, data_store=None, seed=None)
Create a controller that wraps an NLP model and, optionally, the data controller used to preprocess its inputs.
|  | Type | Default | Details |
|---|---|---|---|
| model |  |  | NLP model |
| data_store | NoneType | None | A TextDataController/TextDataControllerStreaming object |
| seed | NoneType | None | Random seed |
ModelController.fit
ModelController.fit (epochs, learning_rate, ddict=None, metric_funcs=[<function accuracy_score>], metric_types=[], batch_size=16, val_batch_size=None, weight_decay=0.01, lr_scheduler_type='cosine', warmup_ratio=0.1, o_dir='./tmp_weights', save_checkpoint=False, hf_report_to='none', compute_metrics=<function compute_metrics>, grad_accum_steps=2, tokenizer=None, label_names=None, head_sizes=None, trainer_class=None, len_train=None)
|  | Type | Default | Details |
|---|---|---|---|
| epochs |  |  | Number of epochs |
| learning_rate |  |  | Learning rate |
| ddict | NoneType | None | DatasetDict to fit (will override data_store) |
| metric_funcs | list | [<function accuracy_score>] | A list of metric functions (can be from sklearn) |
| metric_types | list | [] | A list of metric types ('classification' or 'regression') matching the metric function list |
| batch_size | int | 16 | Batch size |
| val_batch_size | NoneType | None | Validation batch size. Set to batch_size if None |
| weight_decay | float | 0.01 | Weight decay |
| lr_scheduler_type | str | cosine | The scheduler type to use: linear, cosine, cosine_with_restarts, polynomial, constant, constant_with_warmup |
| warmup_ratio | float | 0.1 | The warmup ratio for some lr schedulers |
| o_dir | str | ./tmp_weights | Directory to save weights |
| save_checkpoint | bool | False | Whether to save weights (checkpoints) to o_dir |
| hf_report_to | str | none | The list of HuggingFace-allowed integrations to report results and logs to |
| compute_metrics | function | compute_metrics | A function to compute metrics, e.g. compute_metrics, which uses the given metric_funcs |
| grad_accum_steps | int | 2 | Gradients are accumulated over this many steps |
| tokenizer | NoneType | None | Tokenizer (to override the one in data_store) |
| label_names | NoneType | None | Names of the label (dependent variable) columns (to override the ones in data_store) |
| head_sizes | NoneType | None | Class size for each head (to override the ones in model) |
| trainer_class | NoneType | None | The class of your custom trainer, if any |
| len_train | NoneType | None | Number of samples in the whole training set (for streaming datasets only) |
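A sketch of tying the pieces together (import path assumed; `model` comes from model_init_classification, and `tdc` is a hypothetical TextDataController that already holds the processed data and tokenizer):

```python
from sklearn.metrics import accuracy_score, f1_score

from that_nlp_library.model_main import ModelController  # assumed path

controller = ModelController(model, data_store=tdc, seed=42)
controller.fit(
    epochs=3,
    learning_rate=1e-4,
    metric_funcs=[accuracy_score, f1_score],
    metric_types=['classification', 'classification'],
    batch_size=32,
)
```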
ModelController.predict_raw_text
ModelController.predict_raw_text (content:Union[dict,list,str], is_multilabel=None, multilabel_threshold=0.5, topk=1, are_heads_separated=False)
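Continuing the sketch above; the exact return structure is not documented here, so treat this as illustrative only:

```python
# Predict a single string; topk=3 returns the three most likely labels per head.
controller.predict_raw_text('This movie was a pleasant surprise!', topk=3)

# A list (or dict) of several texts works too.
controller.predict_raw_text(['great film', 'terrible plot'], topk=1)
```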
ModelController.predict_raw_dset
ModelController.predict_raw_dset (dset, batch_size=16, do_filtering=False, is_multilabel=None, multilabel_threshold=0.5, topk=1, are_heads_separated=False)
ModelController.predict_ddict
ModelController.predict_ddict (ddict:Union[datasets.dataset_dict.DatasetDict,datasets.arrow_dataset.Dataset]=None, ds_type='test', batch_size=16, is_multilabel=None, multilabel_threshold=0.5, topk=1, tokenizer=None, label_names=None, class_names_predefined=None, are_heads_separated=False)
|  | Type | Default | Details |
|---|---|---|---|
| ddict | DatasetDict \| Dataset | None | A processed and tokenized DatasetDict/Dataset (will override the one in data_store) |
| ds_type | str | test | The split of the DatasetDict to predict |
| batch_size | int | 16 | Batch size for making predictions on GPU |
| is_multilabel | NoneType | None | Whether this is a multilabel classification |
| multilabel_threshold | float | 0.5 | Threshold for multilabel classification |
| topk | int | 1 | Number of labels to return for each head |
| tokenizer | NoneType | None | Tokenizer (to override the one in data_store) |
| label_names | NoneType | None | Names of the label (dependent variable) columns (to override the ones in data_store) |
| class_names_predefined | NoneType | None | List of names associated with the labels (same index order) (to override the one in data_store) |
| are_heads_separated | bool | False | Whether the model outputs separate logits per head |
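A final usage sketch, again continuing from the controller above:

```python
# Predict on the 'test' split already processed by the data controller.
test_preds = controller.predict_ddict(ds_type='test', batch_size=64, topk=2)

# Or pass your own processed and tokenized Dataset, overriding data_store:
# test_preds = controller.predict_ddict(ddict=my_tokenized_dset)
```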