utils.data.sft
Data handling specific to supervised fine-tuning (SFT).
Functions
| Name | Description |
|---|---|
| prepare_datasets | Prepare training and evaluation datasets based on configuration. |
prepare_datasets
utils.data.sft.prepare_datasets(
cfg,
tokenizer,
processor=None,
preprocess_iterable=False,
)
Prepare training and evaluation datasets based on configuration.
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| cfg | DictDefault | Dictionary mapping axolotl config keys to values. | required |
| tokenizer | PreTrainedTokenizer | Tokenizer to use for processing text. | required |
| processor | ProcessorMixin \| None | Optional processor for multimodal datasets. | None |
| preprocess_iterable | bool | Whether to use iterable preprocessing. | False |
Returns
| Name | Type | Description |
|---|---|---|
| | tuple[IterableDataset \| Dataset, Dataset \| None, int, list[Prompter \| None]] | Tuple of (train_dataset, eval_dataset, total_steps, prompters). |
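
The returned tuple has four positions, and the second one may be `None`, so callers usually unpack it and guard the evaluation split. The sketch below only illustrates that calling pattern. `prepare_datasets_stub` is a hypothetical stand-in written for this example so the snippet runs without axolotl installed; its body, the `num_epochs` key, and the step arithmetic are placeholders, not how the library computes anything.

```python
from typing import Any, Optional


def prepare_datasets_stub(
    cfg: dict,
    tokenizer: Any,
    processor: Optional[Any] = None,
    preprocess_iterable: bool = False,
):
    """Hypothetical stand-in that mirrors the documented signature and return shape."""
    # Placeholder data: plain lists stand in for Dataset objects.
    train_dataset = [{"input_ids": [1, 2, 3]}, {"input_ids": [4, 5]}]
    eval_dataset = None  # the documented return type allows None here
    # Placeholder arithmetic, NOT the library's step calculation.
    total_steps = len(train_dataset) * cfg.get("num_epochs", 1)
    prompters = [None]  # the documented type is list[Prompter | None]
    return train_dataset, eval_dataset, total_steps, prompters


# Unpack in the documented order: train, eval, total steps, prompters.
train_dataset, eval_dataset, total_steps, prompters = prepare_datasets_stub(
    cfg={"num_epochs": 3},
    tokenizer=None,
)

# Guard the optional evaluation split before using it.
if eval_dataset is None:
    print("no evaluation split returned")

print(f"total_steps={total_steps}, prompters={len(prompters)}")
```

With the real function, `train_dataset` may be an `IterableDataset`, which may not support `len()`, so code that needs a size should not assume one is available.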