---
title: "Quickstart"
format:
  html:
    toc: true
    toc-depth: 3
    number-sections: true
execute:
  enabled: false
---

This guide will walk you through your first model fine-tuning project with Axolotl.

## Quick Example {#sec-quick-example}

Let's start by fine-tuning a small language model using LoRA. This example uses a 1B parameter model to ensure it runs on most GPUs.

Assuming `axolotl` is installed (if not, see our [Installation Guide](installation.qmd)):

1. Download example configs:

```bash
axolotl fetch examples
```

2. Run the training:

```bash
axolotl train examples/llama-3/lora-1b.yml
```

That's it! Let's understand what just happened.

## Understanding the Process {#sec-understanding}

### The Configuration File {#sec-config}

The YAML configuration file controls everything about your training. Here's what (part of) our example config looks like:

```yaml
base_model: NousResearch/Llama-3.2-1B

load_in_8bit: true
adapter: lora

datasets:
  - path: teknium/GPT4-LLM-Cleaned
    type: alpaca
dataset_prepared_path: last_run_prepared
val_set_size: 0.1
output_dir: ./outputs/lora-out
```

::: {.callout-tip}
`load_in_8bit: true` and `adapter: lora` enable LoRA adapter fine-tuning.

- To perform full fine-tuning, remove these two lines.
- To perform QLoRA fine-tuning, replace them with `load_in_4bit: true` and `adapter: qlora`.
:::

See our [config options](config-reference.qmd) for more details.

### Training {#sec-training}

When you run `axolotl train`, Axolotl:

1. Downloads the base model
2. Applies LoRA/QLoRA adapter layers (if specified)
3. Loads and processes the dataset
4. Runs the training loop
5. Saves the trained model and/or LoRA weights

## Your First Custom Training {#sec-custom}

Let's modify the example for your own data:

1. Create a new config file `my_training.yml`:

```yaml
base_model: NousResearch/Nous-Hermes-llama-1b-v1

load_in_8bit: true
adapter: lora

# Training settings
micro_batch_size: 2
num_epochs: 3
learning_rate: 0.0003

# Your dataset
datasets:
  - path: my_data.jsonl # Your local data file
    type: alpaca # Or other format
```

This specific config LoRA fine-tunes a model on instruction-tuning data using the `alpaca` dataset format, where each record looks like:

```json
{
  "instruction": "Write a description of alpacas.",
  "input": "",
  "output": "Alpacas are domesticated South American camelids..."
}
```

See our [Dataset Formats](dataset-formats) guide for other supported formats and how to structure them.

2. Prepare your JSONL data in the specified format (in this case, `alpaca`):

```json
{"instruction": "Classify this text", "input": "I love this!", "output": "positive"}
{"instruction": "Classify this text", "input": "Not good at all", "output": "negative"}
```
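
If you'd rather generate `my_data.jsonl` from code, here is a minimal sketch (plain Python, standard library only; the records shown are placeholders) that writes alpaca-format records and fails fast on a missing key:

```python
# Minimal sketch: write alpaca-format records to my_data.jsonl.
# The records are placeholders; only the three keys from the
# example above are assumed.
import json

records = [
    {"instruction": "Classify this text", "input": "I love this!", "output": "positive"},
    {"instruction": "Classify this text", "input": "Not good at all", "output": "negative"},
]

with open("my_data.jsonl", "w", encoding="utf-8") as f:
    for rec in records:
        # Fail early if a record is missing a required alpaca key.
        assert {"instruction", "input", "output"} <= rec.keys()
        f.write(json.dumps(rec, ensure_ascii=False) + "\n")
```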

3. Run the training:

```bash
axolotl train my_training.yml
```

## Common Tasks {#sec-common-tasks}

::: {.callout-tip}
The same YAML file is used for training, inference, and merging.
:::

### Testing Your Model {#sec-testing}

After training, test your model:

```bash
axolotl inference my_training.yml --lora-model-dir="./outputs/lora-out"
```

More details can be found in [Inference](inference.qmd).

### Using a UI {#sec-ui}

Launch a Gradio interface:

```bash
axolotl inference my_training.yml --lora-model-dir="./outputs/lora-out" --gradio
```

### Preprocessing Data {#sec-preprocessing}

For large datasets, preprocess first:

```bash
axolotl preprocess my_training.yml
```

Make sure to set `dataset_prepared_path` in your config; this is where the prepared dataset will be saved.

More details can be found in [Dataset Preprocessing](dataset_preprocessing.qmd).
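
To sanity-check what the trainer will actually see, you can open the prepared data with the `datasets` library. A minimal sketch, assuming the default `dataset_prepared_path` from the example config and that Axolotl writes a Hugging Face dataset into a generated subdirectory there (the exact layout may vary by version):

```python
# Minimal sketch: peek at the preprocessed dataset. Assumes it was saved
# as a Hugging Face dataset in a generated subdirectory under
# dataset_prepared_path; the layout may vary across Axolotl versions.
from pathlib import Path
from datasets import load_from_disk

prepared_root = Path("last_run_prepared")
subdir = next(p for p in prepared_root.iterdir() if p.is_dir())

ds = load_from_disk(str(subdir))
print(ds)            # row count and column names
print(ds[0].keys())  # typically input_ids, attention_mask, labels
```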

### Merging LoRA weights {#sec-merging-lora}

To merge the LoRA weights back into the base model, run:

```bash
axolotl merge-lora my_training.yml --lora-model-dir="./outputs/lora-out"
```

The merged model will be saved in the `{output_dir}/merged` directory.

More details can be found in [Merging LoRA weights](inference.qmd#sec-merging).
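
Because the merged model is a standard Hugging Face checkpoint, it loads like any other model. A minimal sketch using `transformers` (the path assumes the `output_dir` from the example config; if the tokenizer wasn't saved alongside the merged weights, load it from the base model instead):

```python
# Minimal sketch: load and prompt the merged checkpoint with transformers.
# The path assumes output_dir: ./outputs/lora-out from the config above;
# fall back to the base model's tokenizer if none was saved here.
from transformers import AutoModelForCausalLM, AutoTokenizer

path = "./outputs/lora-out/merged"
tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(path)

inputs = tokenizer("Write a description of alpacas.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```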

## Next Steps {#sec-next-steps}

Now that you have the basics, explore these guides based on what you want to do:

**Choose your path:**

- [Choosing a Fine-Tuning Method](choosing_method.qmd) — SFT vs LoRA vs QLoRA vs GRPO vs DPO, with hardware recommendations

**Core guides:**

- [Dataset Loading](dataset_loading.qmd) — Loading datasets from various sources
- [Dataset Formats](dataset-formats) — Working with different data formats
- [Optimizations](optimizations.qmd) — Flash attention, gradient checkpointing, sample packing
- [Training Stability & Debugging](training_stability.qmd) — Monitoring metrics, fixing NaN, OOM debugging

**Advanced training methods:**

- [RLHF / Preference Learning](rlhf.qmd) — DPO, KTO, GRPO, EBFT
- [GRPO Training](grpo.qmd) — RL with custom rewards and vLLM generation
- [vLLM Serving](vllm_serving.qmd) — Setting up vLLM for GRPO

**Scaling up:**

- [Multi-GPU Training](multi-gpu.qmd) — DeepSpeed, FSDP, DDP
- [Multi-Node Training](multi-node.qmd) — Distributed training across machines