Dataset Formats
Supported dataset formats.
Axolotl supports a variety of dataset formats. We recommend JSONL (one JSON object per line). The schema of each JSONL record depends on the task and the prompt template you wish to use. As an alternative to JSONL, you can use a Hugging Face dataset whose columns correspond to the JSONL fields.
The supported formats are listed below, organized by task:
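As a rough illustration of what "JSONL" and "columns for each JSONL field" mean in practice, the sketch below writes a few records to a JSONL file, reads them back, and pivots them into a column-oriented view. It uses only the Python standard library. The `text` field, the file name, and the helper functions `write_jsonl` and `read_jsonl` are assumptions made for this example; the fields you actually need depend on the task and prompt template, as described on the per-format pages.

```python
import json
import tempfile
from pathlib import Path


def write_jsonl(records, path):
    """Write one JSON object per line (the JSONL convention)."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read a JSONL file back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Hypothetical records: the "text" field is an illustrative assumption,
# not a schema required for every task.
records = [
    {"text": "The quick brown fox."},
    {"text": "Jumps over the lazy dog."},
]

path = Path(tempfile.gettempdir()) / "example_dataset.jsonl"
write_jsonl(records, path)
rows = read_jsonl(path)

# Column-oriented view of the same data: one column per JSONL field,
# which is the shape a tabular dataset would expose.
columns = {key: [row[key] for row in rows] for key in rows[0]}
```

Each line of the resulting file is an independent JSON object, so the file can be streamed or appended to without parsing it as a whole.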
| Title | Description |
|---|---|
| Pre-training | Data format for a pre-training completion task. |
| Instruction Tuning | Instruction tuning formats for supervised fine-tuning. |
| Conversation | Conversation format for supervised fine-tuning. |
| Stepwise Supervised Format | Format for datasets with stepwise completions and labels. |
| Template-Free | Construct prompts without a template. |
| Custom Pre-Tokenized Dataset | How to use a custom pre-tokenized dataset. |