# Axolotl
Fine-tuning framework for LLMs. Config-driven: every training run is defined by a single YAML file.
## Tech Stack
Python, PyTorch, HuggingFace Transformers, TRL, PEFT (LoRA/QLoRA), DeepSpeed, FSDP, vLLM (for GRPO generation).
## Commands

```bash
axolotl train config.yaml               # Train (single or multi-GPU, auto-detected)
axolotl preprocess config.yaml          # Tokenize dataset and validate config
axolotl preprocess config.yaml --debug  # Inspect tokenized samples and label masking
axolotl inference config.yaml           # Interactive inference
axolotl merge-lora config.yaml          # Merge LoRA adapter into base model
axolotl vllm-serve config.yaml          # Start vLLM server for GRPO/EBFT training
axolotl fetch examples                  # Download example configs
```
## Training Methods

| Method | Config Key | When to Use |
|---|---|---|
| SFT | (default) | Input-output pairs, instruction tuning |
| DPO/IPO | `rl: dpo` / `rl: ipo` | Paired preference data (chosen vs rejected) |
| KTO | `rl: kto` | Unpaired binary preference labels |
| ORPO | `rl: orpo` | Single-stage alignment, no ref model |
| GRPO | `rl: grpo` | RL with verifiable reward functions (math, code) |
| EBFT | `rl: ebft` | Feature-matching rewards from internal representations |
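
For example, switching a run from SFT to DPO is mostly a matter of setting the `rl:` key and pointing at a preference dataset. A minimal sketch (the dataset path is a placeholder and the `type:` value is an assumption; see `docs/agents/preference_tuning.md` for the exact DPO prompt strategies):

```yaml
base_model: meta-llama/Llama-3.1-8B-Instruct
adapter: lora

rl: dpo                          # or ipo / kto / orpo, per the table above

datasets:
  - path: my_preference_dataset  # placeholder: DPO expects chosen/rejected pairs
    type: chat_template          # assumption: see prompt_strategies/dpo/ for the real type names

output_dir: ./outputs/dpo-out
```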
Agent-specific references:

- `docs/agents/sft.md` — supervised fine-tuning
- `docs/agents/preference_tuning.md` — DPO, IPO, KTO, ORPO, SimPO
- `docs/agents/grpo.md` — GRPO online RL with reward functions
- `docs/agents/reward_modelling.md` — outcome and process reward models
- `docs/agents/pretraining.md` — continual pretraining
## Config Pattern
All training is config-driven. A YAML file specifies model, adapter, dataset(s), and hyperparameters:
```yaml
base_model: meta-llama/Llama-3.1-8B-Instruct
adapter: lora   # or qlora, or omit for full fine-tune
datasets:
  - path: my_dataset
    type: chat_template  # prompt strategy (see docs/dataset-formats/)
output_dir: ./outputs/lora-out
```

Config schema: `src/axolotl/utils/schemas/config.py` (`AxolotlInputConfig`).
## Project Structure

```
src/axolotl/
  cli/                 # CLI entry points (train, preprocess, inference, merge_lora, vllm_serve)
  core/
    builders/          # TrainerBuilder classes (causal.py for SFT, rl.py for RLHF)
    trainers/          # Trainer classes, mixins (optimizer, scheduler, packing)
      dpo/             # DPO trainer and config
      grpo/            # GRPO trainer and sampler
  loaders/             # Model, tokenizer, adapter, processor loading
  prompt_strategies/   # Dataset format handlers (chat_template, alpaca, dpo/, kto/, orpo/)
  utils/schemas/       # Pydantic config schemas (config, model, training, peft, trl, fsdp)
  integrations/        # Plugins (liger, cut_cross_entropy, swanlab, nemo_gym)
  monkeypatch/         # Runtime patches for HF transformers
examples/              # Example YAML configs by model (llama-3/, qwen2/, mistral/, ebft/)
deepspeed_configs/     # DeepSpeed JSON configs (zero2, zero3)
docs/                  # Quarto documentation site
```
## Code Conventions

- Config-driven: features are toggled via YAML, not code changes
- Prompt strategies: `src/axolotl/prompt_strategies/` — each `type:` value maps to a function
- Plugin system: `plugins:` list in config loads integration modules (see the sketch after this list)
- Trainer mixins: `core/trainers/mixins/` for composable trainer behaviors
- Schemas: all config validation via Pydantic in `utils/schemas/`
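
As a rough illustration of the plugin convention, a config might enable the `liger` integration like this (the class path and the feature toggles are assumptions based on the directory names above; check `integrations/` and the config schema for the actual keys):

```yaml
# Hedged sketch: load an integration via the plugins: list, then switch on
# its features with plugin-specific top-level keys (key names are assumptions).
plugins:
  - axolotl.integrations.liger.LigerPlugin

liger_rope: true
liger_rms_norm: true
liger_glu_activation: true
```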
## Key Documentation
- Getting Started — quickstart tutorial
- Choosing a Method — SFT vs DPO vs GRPO decision guide
- Config Reference — all config options
- Dataset Formats — chat_template, alpaca, input_output, completion
- RLHF — DPO, KTO, ORPO, GRPO, EBFT configs and dataset formats
- GRPO Deep Dive — async training, custom rewards, scaling
- vLLM Serving — vLLM setup for GRPO/EBFT
- Multi-GPU — FSDP and DeepSpeed
- Training Stability — debugging loss, NaN, OOM
- Debugging — VSCode setup, Docker debugging