Model Main Functions and Controller

In-depth tutorials are available for both classification and regression.

model_init_classification

 model_init_classification (model_class, cpoint_path,
                            output_hidden_states:bool, device=None,
                            config=None, seed=None, body_model=None,
                            model_kwargs={})

*Initialize a classification (or regression) model, either from an existing HuggingFace model or from a custom architecture.

Can be used for binary, multi-class single-head, multi-class multi-head, and multi-label classification, as well as regression.*

	Type	Default	Details
model_class			Model’s class object, e.g. RobertaHiddenStateConcatForSequenceClassification
cpoint_path			Either model string name on HuggingFace, or the path to model checkpoint
output_hidden_states	bool		Whether to output the model's hidden states. Useful when you build a custom classification head
device	NoneType	None	Device to train on
config	NoneType	None	Model config. If not provided, AutoConfig is used to load config from cpoint_path
seed	NoneType	None	Random seed
body_model	NoneType	None	If not None, this is used to initialize the model’s body. If you only want to load the model checkpoint in cpoint_path, leave this as None
model_kwargs	dict	{}	Keyword arguments for model (both head and body)

compute_metrics

 compute_metrics (pred, metric_funcs=[], metric_types=[], head_sizes=[],
                  label_names=[], is_multilabel=False,
                  multilabel_threshold=0.5)

*Return a dictionary mapping each metric name to its value.

Reference: https://github.com/huggingface/transformers/blob/main/src/transformers/trainer_utils.py#L107C16-L107C16*

	Type	Default	Details
pred			An EvalPrediction object from HuggingFace (which is a named tuple with `predictions` and `label_ids` attributes)
metric_funcs	list	[]	A list of metric functions to evaluate
metric_types	list	[]	Type of metric (‘classification’ or ‘regression’) for each metric function above
head_sizes	list	[]	Number of classes for each head. A regression head has head size 1
label_names	list	[]	Names of the label (dependent variable) columns
is_multilabel	bool	False	Whether this is a multilabel classification
multilabel_threshold	float	0.5	Threshold for multilabel (>= threshold is positive)
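
To make the roles of `head_sizes`, `metric_types`, and `multilabel_threshold` concrete, here is a minimal pure-Python sketch of how concatenated multi-head logits could be split and scored. It is not the library's implementation: `sketch_compute_metrics`, the helper functions, the `<metric>_<label>` key naming, and the use of a sigmoid before thresholding are all assumptions made for illustration.

```python
import math

def split_heads(row, head_sizes):
    """Split one row of concatenated logits into one slice per head."""
    slices, start = [], 0
    for size in head_sizes:
        slices.append(row[start:start + size])
        start += size
    return slices

def argmax(values):
    return max(range(len(values)), key=lambda i: values[i])

def sigmoid(x):
    # Numerically stable for both signs of x.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def sketch_compute_metrics(predictions, label_ids, metric_funcs,
                           metric_types, head_sizes, label_names):
    """Hypothetical stand-in: apply each metric to the heads of matching type."""
    results = {}
    per_row = [split_heads(row, head_sizes) for row in predictions]
    for h, name in enumerate(label_names):
        y_true = [row[h] for row in label_ids]
        is_regression = head_sizes[h] == 1  # a regression head has size 1
        if is_regression:
            y_pred = [heads[h][0] for heads in per_row]
        else:
            y_pred = [argmax(heads[h]) for heads in per_row]
        for func, mtype in zip(metric_funcs, metric_types):
            if (mtype == "regression") == is_regression:
                # Key naming is an assumption, not the library's convention.
                results[f"{func.__name__}_{name}"] = func(y_true, y_pred)
    return results

def multilabel_predictions(row, threshold=0.5):
    """Assumed multilabel rule: sigmoid(logit) >= threshold is positive."""
    return [int(sigmoid(x) >= threshold) for x in row]

# One 3-class head followed by one regression head (4 logits per row).
predictions = [[2.0, 0.1, -1.0, 3.5], [0.2, 1.5, 0.3, 1.0], [0.0, 0.1, 2.2, 4.0]]
label_ids = [[0, 3.0], [1, 1.0], [1, 4.0]]
metrics = sketch_compute_metrics(
    predictions, label_ids,
    metric_funcs=[accuracy, mean_squared_error],
    metric_types=["classification", "regression"],
    head_sizes=[3, 1],
    label_names=["category", "rating"],
)
print(metrics)
```

The point of the sketch is the alignment: `head_sizes` and `label_names` are indexed by head, while `metric_funcs` and `metric_types` are indexed by metric.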

compute_metrics_separate_heads

 compute_metrics_separate_heads (pred, metric_funcs=[], label_names=[],
                                 **kwargs)

*Return a dictionary mapping each metric name to its value. This is used in Deep Hierarchical Classification (a special case of multi-head classification).

This metric function is mainly used when you have a separate logit output for each head, instead of the typical multi-head logit output in which all heads’ logits are concatenated.*

	Type	Default	Details
pred			An EvalPrediction object from HuggingFace (which is a named tuple with `predictions` and `label_ids` attributes)
metric_funcs	list	[]	A list of metric functions to evaluate
label_names	list	[]	Names of the label (dependent variable) columns
kwargs
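
The difference from the concatenated case can be sketched as follows: when each head has its own logit output, no slicing by head size is needed. This is an illustrative sketch only; `sketch_metrics_separate_heads`, the assumption that the predictions arrive as one list of logits per head, and the `<metric>_<label>` key naming are hypothetical.

```python
def argmax(values):
    return max(range(len(values)), key=lambda i: values[i])

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def sketch_metrics_separate_heads(predictions, label_ids, metric_funcs, label_names):
    """Hypothetical stand-in: `predictions` holds one (bs, n_classes) list per head."""
    results = {}
    for h, (head_logits, name) in enumerate(zip(predictions, label_names)):
        y_true = [row[h] for row in label_ids]
        y_pred = [argmax(row) for row in head_logits]
        for func in metric_funcs:
            # Key naming is an assumption, not the library's convention.
            results[f"{func.__name__}_{name}"] = func(y_true, y_pred)
    return results

# Two heads: Level 1 has 2 classes, Level 2 has 3 classes.
level1_logits = [[1.2, -0.3], [0.1, 0.9]]
level2_logits = [[0.2, 2.0, -1.0], [1.5, 0.0, 0.3]]
label_ids = [[0, 1], [1, 2]]

separate_metrics = sketch_metrics_separate_heads(
    (level1_logits, level2_logits), label_ids, [accuracy], ["level1", "level2"]
)
print(separate_metrics)
```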

loss_for_classification

 loss_for_classification (logits, labels, is_multilabel=False,
                          is_multihead=False, head_sizes=[],
                          head_weights=[])

*The general loss function for classification

If is_multilabel is False and is_multihead is False: single-head classification, e.g. you predict 1 out of n classes
If is_multilabel is False and is_multihead is True: multi-head classification, e.g. you predict 1 out of n classes at Level 1, and 1 out of m classes at Level 2
If is_multilabel is True and is_multihead is False: single-head multi-label classification, e.g. you predict x out of n classes (x>=0)
If is_multilabel is True and is_multihead is True: not supported*

	Type	Default	Details
logits			Output of the last linear layer, before any softmax/sigmoid. Size: (bs,class_size)
labels			Determined by your datasetdict. Size: (bs,number_of_head)
is_multilabel	bool	False	Whether this is a multilabel classification
is_multihead	bool	False	Whether this is a multihead classification
head_sizes	list	[]	Class size for each head. Regression head will have head size 1
head_weights	list	[]	Loss weight for each head. Defaults to 1 for each head
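
The three supported flag combinations can be illustrated with a small pure-Python sketch. It assumes cross-entropy for single-label heads, a `head_weights`-weighted sum of per-head cross-entropies for the multi-head case, and binary cross-entropy on multi-hot labels for the multi-label case; the source does not state which loss functions the library actually uses, and `sketch_loss` and its helpers are hypothetical names.

```python
import math

def log_softmax(logits):
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]

def cross_entropy(batch_logits, targets):
    """Mean negative log-likelihood of the target class over the batch."""
    total = sum(-log_softmax(row)[t] for row, t in zip(batch_logits, targets))
    return total / len(targets)

def bce_with_logits(batch_logits, batch_targets):
    """Mean binary cross-entropy over every (sample, class) pair."""
    total, count = 0.0, 0
    for row, targets in zip(batch_logits, batch_targets):
        for x, y in zip(row, targets):
            # Stable form of -[y*log(s) + (1-y)*log(1-s)] with s = sigmoid(x).
            total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
            count += 1
    return total / count

def sketch_loss(logits, labels, is_multilabel=False, is_multihead=False,
                head_sizes=(), head_weights=()):
    if is_multilabel and is_multihead:
        raise ValueError("multi-label multi-head classification is not supported")
    if is_multilabel:
        # Assumption: labels are multi-hot vectors of size (bs, class_size).
        return bce_with_logits(logits, labels)
    if not is_multihead:
        # Assumption: labels have size (bs, 1), one column for the single head.
        return cross_entropy(logits, [row[0] for row in labels])
    weights = list(head_weights) or [1.0] * len(head_sizes)
    loss, start = 0.0, 0
    for h, (size, weight) in enumerate(zip(head_sizes, weights)):
        head_logits = [row[start:start + size] for row in logits]
        loss += weight * cross_entropy(head_logits, [row[h] for row in labels])
        start += size
    return loss

single = sketch_loss([[0.0, 0.0]], [[0]])
multi_head = sketch_loss([[0.0, 0.0, 0.0, 0.0, 0.0]], [[1, 2]],
                         is_multihead=True, head_sizes=[2, 3],
                         head_weights=[1.0, 0.5])
multi_label = sketch_loss([[0.0, 0.0]], [[1, 0]], is_multilabel=True)
print(single, multi_head, multi_label)
```

With all-zero logits every class is equally likely, so the single-head loss reduces to `log(2)` and the multi-head loss to `log(2) + 0.5 * log(3)`, which makes the effect of `head_weights` easy to check by hand.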

finetune

 finetune (lr, bs, wd, epochs, ddict, tokenizer, o_dir='./tmp_weights',
           save_checkpoint=False, model=None, model_init=None,
           data_collator=None, compute_metrics=None, grad_accum_steps=2,
           lr_scheduler_type='cosine', warmup_ratio=0.1, no_valid=False,
           val_bs=None, seed=None, report_to='none', trainer_class=None,
           len_train=None)

The main model training/finetuning function

	Type	Default	Details
lr			Learning rate
bs			Batch size
wd			Weight decay
epochs			Number of epochs
ddict			The HuggingFace datasetdict
tokenizer			HuggingFace tokenizer
o_dir	str	./tmp_weights	Directory to save weights
save_checkpoint	bool	False	Whether to save weights (checkpoints) to o_dir
model	NoneType	None	NLP model
model_init	NoneType	None	A function to initialize model
data_collator	NoneType	None	HuggingFace data collator
compute_metrics	NoneType	None	A function to compute metric, e.g. `compute_metrics`
grad_accum_steps	int	2	Number of gradient accumulation steps. The batch at each step is divided by this integer, and gradients are accumulated over this many steps.
lr_scheduler_type	str	cosine	The scheduler type to use. Options include: linear, cosine, cosine_with_restarts, polynomial, constant, constant_with_warmup
warmup_ratio	float	0.1	The warmup ratio (used by some lr schedulers)
no_valid	bool	False	Whether there is a validation set or not
val_bs	NoneType	None	Validation batch size
seed	NoneType	None	Random seed
report_to	str	none	The list of integrations to report the results and logs to. Supported platforms are “azure_ml”, “comet_ml”, “mlflow”, “neptune”, “tensorboard”, “clearml” and “wandb”. Use “all” to report to all integrations installed, “none” for no integrations.
trainer_class	NoneType	None	You can include the class name of your custom trainer here
len_train	NoneType	None	Estimated number of samples in the whole training set (for streaming dataset only)
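
The interaction between `bs`, `grad_accum_steps`, `epochs`, `warmup_ratio`, and the default `cosine` scheduler can be worked through with some arithmetic. The sketch below is an approximation under stated assumptions: integer division of the batch size, floor/ceil rounding that may differ between HuggingFace Trainer versions, and a half-cosine decay after a linear warmup. `training_plan` and `cosine_lr` are hypothetical helpers, not library functions.

```python
import math

def training_plan(len_train, bs, epochs, grad_accum_steps=2, warmup_ratio=0.1):
    """Estimate step counts; rounding rules are an assumption."""
    per_step_bs = bs // grad_accum_steps               # batch size of each forward/backward pass
    micro_batches = math.ceil(len_train / per_step_bs)
    updates_per_epoch = max(1, micro_batches // grad_accum_steps)
    total_updates = updates_per_epoch * epochs
    warmup_steps = math.ceil(total_updates * warmup_ratio)
    return {
        "per_step_bs": per_step_bs,
        "updates_per_epoch": updates_per_epoch,
        "total_updates": total_updates,
        "warmup_steps": warmup_steps,
    }

def cosine_lr(step, base_lr, warmup_steps, total_steps):
    """Linear warmup to base_lr, then half-cosine decay towards zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

plan = training_plan(len_train=1000, bs=16, epochs=3)
print(plan)
for step in (0, plan["warmup_steps"], plan["total_updates"]):
    print(step, cosine_lr(step, 1e-3, plan["warmup_steps"], plan["total_updates"]))
```

For a streaming dataset the number of samples is not known in advance, which is presumably why `len_train` has to be supplied as an estimate for this kind of calculation.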

ModelController

 ModelController (model, data_store=None, seed=None)

	Type	Default	Details
model			NLP model
data_store	NoneType	None	A TextDataController/TextDataControllerStreaming object
seed	NoneType	None	Random seed

ModelController.fit

 ModelController.fit (epochs, learning_rate, ddict=None,
                      metric_funcs=[accuracy_score], metric_types=[],
                      batch_size=16, val_batch_size=None,
                      weight_decay=0.01, lr_scheduler_type='cosine',
                      warmup_ratio=0.1, o_dir='./tmp_weights',
                      save_checkpoint=False, hf_report_to='none',
                      compute_metrics=compute_metrics,
                      grad_accum_steps=2, tokenizer=None,
                      label_names=None, head_sizes=None,
                      trainer_class=None, len_train=None)

	Type	Default	Details
epochs			Number of epochs
learning_rate			Learning rate
ddict	NoneType	None	DatasetDict to fit (will override data_store)
metric_funcs	list	[accuracy_score]	A list of metric functions (can be from Sklearn)
metric_types	list	[]	A list of metric types (`classification` or `regression`) that matches the metric function list
batch_size	int	16	Batch size
val_batch_size	NoneType	None	Validation batch size. Set to batch_size if None
weight_decay	float	0.01	Weight decay
lr_scheduler_type	str	cosine	The scheduler type to use. Options include: linear, cosine, cosine_with_restarts, polynomial, constant, constant_with_warmup
warmup_ratio	float	0.1	The warmup ratio (used by some lr schedulers)
o_dir	str	./tmp_weights	Directory to save weights
save_checkpoint	bool	False	Whether to save weights (checkpoints) to o_dir
hf_report_to	str	none	The list of HuggingFace-allowed integrations to report the results and logs to
compute_metrics	function	compute_metrics	A function to compute metric, e.g. `compute_metrics` which utilizes the given `metric_funcs`
grad_accum_steps	int	2	Number of steps over which gradients are accumulated
tokenizer	NoneType	None	Tokenizer (to override one in `data_store`)
label_names	NoneType	None	Names of the label (dependent variable) columns (to override one in `data_store`)
head_sizes	NoneType	None	Class size for each head (to override one in `model`)
trainer_class	NoneType	None	You can include the class name of your custom trainer here
len_train	NoneType	None	Number of samples in the whole training set (for streaming dataset only)
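
Since `metric_funcs` accepts Sklearn-style functions, a custom metric only needs to follow the same `(y_true, y_pred)` calling convention. The sketch below writes a macro-averaged F1 score in pure Python as an example; `macro_f1` is a hypothetical name, and how the library labels the resulting metric is not shown here.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores, following the (y_true, y_pred) convention."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Each metric function is paired with a metric type at the same index.
metric_funcs = [macro_f1]
metric_types = ["classification"]

score = macro_f1([0, 0, 1, 1, 2], [0, 1, 1, 1, 0])
print(score)
```

Keeping `metric_funcs` and `metric_types` the same length and in the same order is what lets each function be matched to the kind of head it should score.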

ModelController.predict_raw_text

 ModelController.predict_raw_text (content:Union[dict,list,str],
                                   is_multilabel=None,
                                   multilabel_threshold=0.5, topk=1,
                                   are_heads_separated=False)

ModelController.predict_raw_dset

 ModelController.predict_raw_dset (dset, batch_size=16,
                                   do_filtering=False, is_multilabel=None,
                                   multilabel_threshold=0.5, topk=1,
                                   are_heads_separated=False)

ModelController.predict_ddict

 ModelController.predict_ddict (ddict:Union[datasets.dataset_dict.DatasetDict,
                                datasets.arrow_dataset.Dataset]=None,
                                ds_type='test', batch_size=16,
                                is_multilabel=None,
                                multilabel_threshold=0.5, topk=1,
                                tokenizer=None, label_names=None,
                                class_names_predefined=None,
                                are_heads_separated=False)

	Type	Default	Details
ddict	DatasetDict | Dataset	None	A processed and tokenized DatasetDict/Dataset (will override one in `data_store`)
ds_type	str	test	The split of DatasetDict to predict
batch_size	int	16	Batch size for making prediction on GPU
is_multilabel	NoneType	None	Is this a multilabel classification?
multilabel_threshold	float	0.5	Threshold for multilabel classification
topk	int	1	Number of labels to return for each head
tokenizer	NoneType	None	Tokenizer (to override one in `data_store`)
label_names	NoneType	None	Names of the label (dependent variable) columns (to override one in `data_store`)
class_names_predefined	NoneType	None	List of names associated with the labels (same index order) (to override one in `data_store`)
are_heads_separated	bool	False	Are outputs (of model) separate heads?
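
The roles of `topk`, `multilabel_threshold`, and `class_names_predefined` can be sketched for a single head in pure Python. This is an illustration under assumptions, not the library's code: the helper names and the `(name, probability)` output format are hypothetical, and the softmax/sigmoid post-processing is assumed rather than confirmed by the source.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    # Numerically stable for both signs of x.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def topk_labels(logits, class_names, topk=1):
    """Return the `topk` most probable (name, probability) pairs for one head."""
    probs = softmax(logits)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(class_names[i], probs[i]) for i in order[:topk]]

def multilabel_labels(logits, class_names, threshold=0.5):
    """Return every class whose sigmoid probability reaches the threshold."""
    return [name for x, name in zip(logits, class_names) if sigmoid(x) >= threshold]

# Class names are listed in the same index order as the labels.
class_names = ["negative", "neutral", "positive"]
top2 = topk_labels([0.1, 2.0, 1.0], class_names, topk=2)
positives = multilabel_labels([0.0, -3.0, 1.2], class_names)
print(top2)
print(positives)
```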