feat(docs): comprehensive improvement (#3564)
* docs: comprehensive documentation improvements for humans and agents

  New human docs:
  - grpo.qmd: GRPO deep dive (async, rewards, IS correction, scaling)
  - ebft.qmd: EBFT guide (structured/strided modes, feature extraction)
  - choosing_method.qmd: decision tree for SFT vs LoRA vs DPO vs GRPO
  - vllm_serving.qmd: vLLM setup for GRPO (server/colocate, LoRA sync)
  - training_stability.qmd: monitoring, NaN debugging, OOM, healthy metrics

  New agent docs:
  - AGENTS_SFT.md: agent reference for supervised fine-tuning
  - AGENTS_DPO.md: agent reference for preference learning (DPO/KTO/ORPO)

  Updated existing docs:
  - rlhf.qmd: cross-references to new GRPO/EBFT/choosing-method guides
  - getting-started.qmd: reorganized Next Steps with links to new guides
  - debugging.qmd: link to training stability guide
  - _quarto.yml: added new pages to sidebar navigation

  Removed:
  - bak.agents.md: stale backup that confused agents

* docs: trim duplicated generic config from AGENTS_DPO.md

  Remove boilerplate training params (optimizer, gradient_checkpointing, flash_attention, etc.) from each method template. These are not preference-learning-specific and are already covered in AGENTS_SFT.md. Config templates now show only method-specific fields, with a reference to AGENTS_SFT.md for the rest.

* docs: deduplicate across new doc pages

  - grpo.qmd: collapse vLLM setup section to brief config + link to vllm_serving.qmd; collapse IS correction to essentials + link; replace full monitoring tables with summary + link to training_stability.qmd
  - vllm_serving.qmd: remove duplicated async/IS config reference tables (already in grpo.qmd config reference); replace full example config with link to grpo.qmd quick start
  - ebft.qmd: trim generic training params in quick start config

* fix: train scripts

* feat: split files into cleaner parts

* fix: cleanup pretraining docs

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>
docs/agents/grpo.md (new file)
@@ -0,0 +1,71 @@

# GRPO — Agent Reference

Online RL with verifiable reward functions. For full config reference, async features, and scaling, see [grpo.qmd](../grpo.qmd). For vLLM setup, see [vllm_serving.qmd](../vllm_serving.qmd).

## Architecture

```
Terminal 1 (GPU 0)                    Terminal 2 (GPU 1)
┌──────────────────────┐              ┌──────────────────────────────────┐
│ vLLM Server          │     HTTP     │ Trainer                          │
│ Serves base model    │◄────────────►│ 1. Send prompts to vLLM          │
│ + LoRA adapter       │  /generate   │ 2. Score completions (rewards)   │
│                      │  /set_lora   │ 3. Compute advantages            │
│ Punica kernels for   │              │ 4. PPO-clip gradient update      │
│ LoRA inference       │              │ 5. Sync LoRA weights to vLLM     │
└──────────────────────┘              └──────────────────────────────────┘
```

## Components Required

1. A YAML config with `rl: grpo`
2. A reward module (Python file with reward functions)
3. A running vLLM server (`axolotl vllm-serve config.yaml`) — see the launch sketch below
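
A minimal two-terminal launch sketch. The `CUDA_VISIBLE_DEVICES` split is an assumption matching the architecture diagram above; adjust to your GPU topology:

```bash
# Terminal 1 — vLLM server on GPU 0 (device assignment is an assumption)
CUDA_VISIBLE_DEVICES=0 axolotl vllm-serve config.yaml

# Terminal 2 — trainer on GPU 1
CUDA_VISIBLE_DEVICES=1 axolotl train config.yaml
```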
## Reward Function Signature

```python
def my_reward(completions, **kwargs) -> list[float]:
    # completions[i][0]["content"] = text of i-th completion
    # **kwargs contains dataset columns not removed by transform
    return [score_for_each_completion]
```

Multiple rewards: `reward_funcs: [r1, r2]` with `reward_weights: [1.0, 0.5]`.
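
As a concrete illustration, a hypothetical exact-match reward. The `answer` column name is an assumption — any dataset column that survives the transform arrives as a list parallel to `completions`:

```python
def exact_match_reward(completions, answer=None, **kwargs) -> list[float]:
    # `answer` is a hypothetical dataset column; it arrives as a list
    # aligned with `completions`. Return one float per completion.
    scores = []
    for completion, gold in zip(completions, answer):
        text = completion[0]["content"].strip()
        scores.append(1.0 if text == gold.strip() else 0.0)
    return scores
```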
## Key Async Features

| Feature | Config | Purpose |
|---------|--------|---------|
| Async prefetch | `async_prefetch: true` | Overlap generation with training |
| LoRA sync | `vllm_lora_sync: true` | Fast adapter sync via filesystem |
| Streaming scoring | `streaming_partial_batch: true` | Score one group at a time |
| Zero-adv skip | `skip_zero_advantage_batches: true` | Skip batches with no learning signal |
| Replay buffer | `replay_buffer_size: 100` | Cache high-signal groups |
| IS correction | `vllm_importance_sampling_correction: true` | Fix off-policy distribution shift |
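
A sketch combining these flags; key names and values come from the table above, but their placement (top level vs. a nested block) should be verified against the config reference in grpo.qmd:

```yaml
# Combined async-feature sketch — verify key nesting against grpo.qmd.
rl: grpo
async_prefetch: true
vllm_lora_sync: true
streaming_partial_batch: true
skip_zero_advantage_batches: true
replay_buffer_size: 100
vllm_importance_sampling_correction: true
```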
## Health Checks

- `rewards/*/mean` > 0.15 within 20 steps (else: test the reward function standalone)
- `reward_std` > 0 on most steps (else: no learning signal)
- `entropy` 0.05-0.5 (< 0.01 = mode collapse)
- `grad_norm` 0.001-1.0 (> 10 = unstable, 0.0 = zero-advantage skip)

See [training_stability.qmd](../training_stability.qmd) for detailed diagnostics.
## File Map

```
src/axolotl/
  cli/train.py                 # Entry point
  cli/vllm_serve.py            # Entry point for vLLM server
  core/trainers/grpo/
    trainer.py                 # AxolotlGRPOTrainer
    sampler.py                 # Sampling utilities
  core/builders/rl.py          # HFRLTrainerBuilder — routes rl type → trainer
  scripts/vllm_serve_lora.py   # vLLM serve script with LoRA sync support
  utils/schemas/trl.py         # TRL config schema (all trl: options)

docs/grpo.qmd                  # Full user docs: async, rewards, scaling, config reference
docs/vllm_serving.qmd          # vLLM server modes, LoRA sync, weight sync
```
docs/agents/preference_tuning.md (new file)
@@ -0,0 +1,121 @@
# Preference Learning (RLHF) — Agent Reference

Reference for DPO, IPO, KTO, ORPO, and SimPO. For config templates and dataset format examples, see [rlhf.qmd](../rlhf.qmd). For GRPO, see [grpo.qmd](../grpo.qmd). For EBFT, see [ebft.qmd](../ebft.qmd).

## Method Overview

| Method | Data Requirement | Key Idea | Best For |
|--------|-----------------|----------|----------|
| **DPO** | Paired (chosen + rejected) | Implicit reward via preference pairs | General alignment, most common |
| **IPO** | Paired (chosen + rejected) | DPO with a different loss (avoids overfitting) | When DPO overfits |
| **KTO** | Unpaired (completion + binary label) | Kahneman-Tversky loss, no pairs needed | When you only have thumbs-up/down |
| **ORPO** | Paired (chosen + rejected) | Combined SFT + preference, no ref model | Single-stage alignment, saves VRAM |
| **SimPO** | Paired (chosen + rejected) | Length-normalized, no ref model | Simple setup, length-robust |

Default: start with DPO. All methods require `sample_packing: false`.
## Architecture

```
┌──────────────┐    ┌───────────────┐    ┌───────────────┐
│ Policy Model │    │ Reference     │    │ Preference    │
│ (trainable)  │    │ Model (frozen)│    │ Dataset       │
└──────┬───────┘    └──────┬────────┘    └──────┬────────┘
       └─────────┬─────────┘                    │
                 v                              │
    Forward pass on chosen + rejected <─────────┘
                 │
    Preference Loss (DPO/IPO/KTO/...)
                 │
         Backprop + Update

Exception: ORPO and SimPO do NOT use a reference model (~50% less VRAM).
```

No vLLM server needed (unlike GRPO). Offline RL with pre-collected preference data.
## Method Selection

1. Paired preference data (chosen + rejected)?
   - Default → `rl: dpo` (minimal sketch after the table below)
   - Overfitting → `rl: ipo`
   - VRAM-limited → `rl: orpo` (no ref model)
   - Length-sensitive → `rl: simpo` (no ref model)
2. Only binary labels (good/bad)? → `rl: kto`
3. Single-stage training (no separate SFT)? → `rl: orpo`

| | DPO | IPO | KTO | ORPO | SimPO |
|---|---|---|---|---|---|
| **Reference model** | Yes | Yes | Yes | No | No |
| **VRAM overhead** | ~2x model | ~2x model | ~2x model | ~1x model | ~1x model |
| **TRL trainer class** | DPOTrainer | DPOTrainer | KTOTrainer | ORPOTrainer | CPOTrainer |
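
A minimal DPO sketch, assuming a chat-format preference dataset. Model and dataset paths are placeholders; generic training params are covered in AGENTS_SFT.md:

```yaml
# Minimal DPO sketch — base_model and dataset path are placeholders.
base_model: NousResearch/Meta-Llama-3-8B   # placeholder
rl: dpo
datasets:
  - path: my_org/my_preference_pairs       # placeholder
    type: chat_template.default
    split: train
sample_packing: false                      # required for all preference methods
```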
## Prompt Strategy Resolution

The `type` field resolves to a Python function:

```
type: "chatml.intel"
  → axolotl.prompt_strategies.dpo.chatml.intel(cfg, **kwargs)
  → returns transform_fn(sample) → {"prompt", "chosen", "rejected"}

type: "chat_template.default"
  → axolotl.prompt_strategies.dpo.chat_template.default(cfg, dataset_idx, **kwargs)

type: {"field_prompt": "prompt", ...} (dict)
  → axolotl.prompt_strategies.dpo.user_defined.default(...)
```

Module base: `axolotl.prompt_strategies.{rl_method}` — replace `dpo` with `kto` or `orpo`.
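
For the dict form, a sketch of a user-defined field mapping. Only `field_prompt` appears in the resolution example above; `field_chosen` and `field_rejected` are assumptions — check `prompt_strategies/dpo/user_defined.py` for the accepted key set:

```yaml
datasets:
  - path: my_org/custom_pairs      # placeholder
    # dict-form `type` → user_defined.default; field_chosen/field_rejected
    # key names are assumed here — verify in dpo/user_defined.py
    type:
      field_prompt: question
      field_chosen: good_answer
      field_rejected: bad_answer
```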
## Healthy Training Indicators

| Metric | Healthy Range | Problem |
|--------|--------------|---------|
| `train/loss` | Decreasing, 0.3-0.7 | Flat or increasing = broken data or too-high LR |
| `rewards/chosen` | Increasing | Flat = model not learning preferences |
| `rewards/rejected` | Decreasing | Increasing = model prefers wrong responses |
| `rewards/margins` | Positive and increasing | Negative = prefers rejected over chosen |
| `rewards/accuracies` | > 0.5, trending toward 0.7+ | < 0.5 = worse than random |
| `logps/rejected` | Decreasing | Increasing = reward hacking |
| `grad_norm` | 0.01-10.0 | > 100 = exploding gradients |

Method-specific: for DPO/IPO watch `rewards/margins`; KTO loss is noisier; for ORPO monitor the SFT and odds-ratio loss components; for SimPO check length-normalized reward separation.
## Known Issues

| Issue | Fix |
|-------|-----|
| Sample packing crash | Set `sample_packing: false` (required for all preference methods) |
| KTO `KeyError: 'label'` | Ensure dataset has boolean `label` column |
| ORPO/KTO `KeyError` during tokenization | Add `remove_unused_columns: false` |
| ORPO template not applied | ORPO requires explicit `chat_template` setting |
| OOM with ref model (DPO/IPO/KTO) | Use LoRA/QLoRA, or switch to ORPO/SimPO (no ref model) |
| IPO + label smoothing | Do not set `dpo_label_smoothing` when `rl: ipo` |

Full troubleshooting: [training_stability.qmd](../training_stability.qmd)
## File Map

```
src/axolotl/
  core/trainers/dpo/                 # DPO trainer, args, strategy
  core/builders/rl.py                # HFRLTrainerBuilder — routes rl type → trainer class
  core/training_args.py              # AxolotlKTOConfig, AxolotlORPOConfig, AxolotlCPOConfig
  prompt_strategies/
    dpo/                             # DPO/IPO/SimPO dataset strategies
      chat_template.py               # chat_template.default, chat_template.argilla_chat
      chatml.py                      # chatml.default/intel/icr/argilla_chat/prompt_pairs/ultra
      llama3.py                      # llama3 variants (same subtypes as chatml)
      user_defined.py                # Custom field mapping
      passthrough.py                 # No transform
    kto/                             # KTO dataset strategies (chatml, llama3, user_defined)
    orpo/                            # ORPO dataset strategies (chat_template.argilla)
  utils/schemas/enums.py             # RLType enum (dpo, ipo, kto, orpo, simpo, grpo, gdpo, ebft)
  utils/schemas/config.py            # All rl/dpo/kto/orpo/simpo config fields

docs/rlhf.qmd                        # Full user docs: all dataset formats, config templates
docs/choosing_method.qmd             # SFT vs DPO vs GRPO decision guide
examples/qwen2/dpo.yaml              # DPO example
examples/llama-3/qlora-1b-kto.yaml   # KTO example
```
docs/agents/pretraining.md (new file)
@@ -0,0 +1,75 @@
# Pretraining / Continual Pretraining — Agent Reference

Train on raw text with no input masking. There are two approaches, depending on dataset size.

## When to Use

- Continual pretraining on domain-specific corpora
- Adapting a base model to a new language or domain before fine-tuning
- Pretraining-style data where the entire text is the training signal

## Choosing an Approach

| | Non-streaming (`type: completion`) | Streaming (`pretraining_dataset`) |
|---|---|---|
| **Dataset size** | Fits in memory | Too large to fit in memory |
| **Tokenization** | Pre-tokenized before training | On-demand during training |
| **Config key** | `datasets:` | `pretraining_dataset:` |
| **Long text handling** | Splits texts exceeding `sequence_len` | Concatenates into fixed-length sequences |
| **Benefit** | Can preprocess on CPU, transfer to GPU | Start training immediately, no preprocessing |
## Non-Streaming: `type: completion`

For smaller datasets that fit in memory. Pre-tokenizes the entire dataset.

```yaml
datasets:
  - path: my_corpus
    type: completion
    # field: text  # Column name (default: "text")
```
## Streaming: `pretraining_dataset`

For large corpora. Streams data on demand without loading everything into memory.

```yaml
pretraining_dataset:
  - path: HuggingFaceFW/fineweb-edu
    type: pretrain
    text_column: text
    split: train

max_steps: 1000                          # Required — axolotl can't infer dataset size
streaming_multipack_buffer_size: 10000   # Buffer for sample packing
pretrain_multipack_attn: true            # Prevent cross-attention between packed samples
```

`max_steps` is required for streaming — one step consumes `sequence_len * micro_batch_size * gradient_accumulation_steps * num_gpus` tokens.
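
As a worked example with illustrative values: `sequence_len: 2048`, `micro_batch_size: 2`, `gradient_accumulation_steps: 4`, and 8 GPUs give 2048 × 2 × 4 × 8 = 131,072 tokens per step, so `max_steps: 1000` trains on roughly 131M tokens.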
Full streaming docs: [streaming.qmd](../streaming.qmd)
## Dataset Format

```json
{"text": "The complete document text goes here."}
```
## Key Settings

- `sample_packing: true` + `pad_to_sequence_len: true` — pack documents into fixed-length sequences
- `flash_attention: true` — required for sample packing
- No adapter — pretraining is typically a full fine-tune
- `train_on_inputs: true` — default for completion (all tokens are trained on)

These flags combine as in the sketch below.
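
A sketch of the key settings together; the `sequence_len` value is a placeholder:

```yaml
# Key-settings sketch for completion-style pretraining.
sequence_len: 4096        # placeholder value
sample_packing: true
pad_to_sequence_len: true
flash_attention: true     # required for sample packing
train_on_inputs: true     # default for completion
# no `adapter:` — full fine-tune
```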
## File Map

```
src/axolotl/
  prompt_strategies/completion.py   # Non-streaming: completion prompt strategy (no masking)
  utils/data/sft.py                 # Non-streaming: dataset loading and processing
  utils/data/streaming.py           # Streaming: encode_streaming(), wrap_streaming_dataset()
  utils/schemas/config.py           # Config fields: pretraining_dataset, pretrain_multipack_attn, etc.

examples/streaming/pretrain.yaml    # Full streaming pretraining example config
```
docs/agents/reward_modelling.md (new file)
@@ -0,0 +1,48 @@
# Reward Modelling — Agent Reference

Train models to score responses for use as reward signals in RL. For full docs, see [reward_modelling.qmd](../reward_modelling.qmd).

## Types

### Outcome Reward Models (ORM)

Train a classifier to predict preference over entire interactions. Uses `AutoModelForSequenceClassification`.

```yaml
base_model: google/gemma-2-2b
model_type: AutoModelForSequenceClassification
num_labels: 1
reward_model: true
chat_template: gemma
datasets:
  - path: argilla/distilabel-intel-orca-dpo-pairs
    type: bradley_terry.chat_template
```

Dataset format: `{"system": "...", "input": "...", "chosen": "...", "rejected": "..."}`
### Process Reward Models (PRM)

Train a token classifier to score each reasoning step. Uses `AutoModelForTokenClassification`.

```yaml
base_model: Qwen/Qwen2.5-3B
model_type: AutoModelForTokenClassification
num_labels: 2
process_reward_model: true
datasets:
  - path: trl-lib/math_shepherd
    type: stepwise_supervised
```

Dataset format: see [stepwise_supervised.qmd](../dataset-formats/stepwise_supervised.qmd).
## File Map

```
src/axolotl/
  core/builders/causal.py                  # Handles reward_model flag in trainer builder
  prompt_strategies/bradley_terry/         # Bradley-Terry prompt strategies
  prompt_strategies/stepwise_supervised.py # PRM dataset strategy
  utils/schemas/config.py                  # reward_model, process_reward_model config fields
```
docs/agents/sft.md (new file)
@@ -0,0 +1,115 @@
# SFT — Agent Reference

Supervised fine-tuning pipeline reference. For config templates and dataset format examples, see [getting-started.qmd](../getting-started.qmd) and [dataset-formats/](../dataset-formats/).

## Architecture

```
YAML Config → axolotl train config.yaml

1. Load base model (+ quantization if QLoRA/8-bit)
2. Apply adapter layers (LoRA/QLoRA) if configured
3. Load + tokenize dataset(s)
   - Apply prompt template (chat_template / alpaca / custom)
   - Mask inputs (train_on_inputs: false)
   - Pack samples into sequences (sample_packing: true)
4. Training loop (HuggingFace Trainer)
   - forward → loss → backward → optimizer step → lr scheduler step
5. Save model / adapter weights + tokenizer

Multi-GPU: FSDP or DeepSpeed shards the model across GPUs automatically.
```
## Components Required

1. A YAML config — model, dataset(s), adapter settings, hyperparameters (minimal sketch below)
2. A dataset — HuggingFace Hub, local JSONL/JSON/Parquet, or S3/GCS path
3. (Optional) A custom prompt strategy — for non-standard dataset formats

No external server processes needed (unlike GRPO, which requires vLLM).
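
A minimal LoRA SFT sketch, assuming a chat-format dataset. Model and dataset paths are placeholders; hyperparameter values are drawn from the ranges below:

```yaml
# Minimal LoRA SFT sketch — paths are placeholders; see getting-started.qmd
# for complete templates.
base_model: NousResearch/Meta-Llama-3-8B   # placeholder
adapter: lora
lora_r: 32
lora_alpha: 64
datasets:
  - path: my_org/my_chat_dataset           # placeholder
    type: chat_template
sequence_len: 4096
sample_packing: true
micro_batch_size: 2
gradient_accumulation_steps: 4
learning_rate: 2e-4
num_epochs: 2
bf16: auto
flash_attention: true
output_dir: ./outputs/sft-lora
```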
## Dataset Format Decision Tree

```
Is your data in chat/message format?
├─ YES: OpenAI message format (role/content)?
│   ├─ YES ──────────────────────> type: chat_template (recommended)
│   └─ NO (custom field names) ──> type: chat_template + message_property_mappings
└─ NO: Instruction/response pairs?
    ├─ YES ──> type: alpaca (instruction, input, output)
    └─ NO: Raw text?
        ├─ YES, with segments ────> type: input_output (template-free masking)
        └─ YES, continuous ───────> type: completion (pretraining-style)
```

Full format specs: [dataset-formats/](../dataset-formats/)
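
For the recommended path, a sample JSONL record in OpenAI message format (one object per line; the content strings are placeholders):

```json
{"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}
```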
## Model Size to Adapter Choice

| Model Size | LoRA | QLoRA (4-bit) | Full Fine-Tune | VRAM (approx) |
|-----------|------|---------------|----------------|---------------|
| 1-3B | Preferred | Low-budget option | Single GPU OK | 8-16 GB (LoRA) |
| 7-8B | Preferred | Good balance | Needs multi-GPU | 16-24 GB (LoRA) |
| 13-14B | Preferred | Good balance | Multi-GPU required | 24-40 GB (LoRA) |
| 30-70B | LoRA or QLoRA | Preferred for single GPU | Multi-node | 40-80 GB (QLoRA) |
## Hyperparameter Ranges

| Parameter | LoRA | QLoRA | Full FT |
|-----------|------|-------|---------|
| `learning_rate` | 1e-4 to 3e-4 | 1e-4 to 3e-4 | 1e-5 to 5e-5 |
| `lora_r` | 16-64 | 16-64 | N/A |
| `lora_alpha` | 1-2x `lora_r` | 1-2x `lora_r` | N/A |
| `micro_batch_size` | 2-8 | 2-4 | 1-2 |
| `gradient_accumulation_steps` | 2-8 | 4-16 | 4-16 |
| `num_epochs` | 1-3 | 1-3 | 1-3 |
| `optimizer` | `adamw_8bit` | `adamw_bnb_8bit` | `adamw_torch_fused` |

Effective batch size = `micro_batch_size` × `gradient_accumulation_steps` × num_gpus (e.g., 4 × 4 × 2 GPUs = 32). Use a lower LR for larger models.
## Healthy Training Indicators

| Metric | Healthy | Problem |
|--------|---------|---------|
| `train_loss` | Decreasing, starting ~2-4 for chat models | Flat or increasing from step 1 — data or LR issue |
| `eval_loss` | Decreasing, tracks train_loss | Increasing while train_loss decreases — overfitting |
| `grad_norm` | 0.1-10, relatively stable | Spikes > 100 — instability; 0.0 — frozen weights |
| `learning_rate` | Follows scheduler curve | Flat or NaN — config issue |

Watch for: loss never decreasing (check `train_on_inputs`, the dataset, and LR), loss dropping to 0 quickly (overfitting), and eval_loss diverging (reduce epochs, add regularization). See [training_stability.qmd](../training_stability.qmd).
## Known Issues

| Issue | Fix |
|-------|-----|
| OOM during training | Reduce `micro_batch_size`, enable `gradient_checkpointing`, reduce `sequence_len` |
| `sample_packing` + SDPA + bf16 = 0.0 loss | Use `flash_attention: true` or disable `sample_packing` |
| Missing chat template error | Set `chat_template: chatml` explicitly |
| Label masking wrong | Run `axolotl preprocess config.yaml --debug` and inspect labels |
| Loss NaN | Use `bf16: auto`, lower LR, check data for empty samples |
| Tokenizer pad token / infinite loss | Set `special_tokens: pad_token: "<\|end_of_text\|>"` |
| FSDP save hangs | Use `fsdp_state_dict_type: FULL_STATE_DICT` |
| DeepSpeed CheckpointError | Set `use_reentrant: true` in `gradient_checkpointing_kwargs` |

Full troubleshooting: [training_stability.qmd](../training_stability.qmd), [debugging.qmd](../debugging.qmd)
## File Map

```
src/axolotl/
  cli/train.py                # Entry point for `axolotl train`
  cli/preprocess.py           # Entry point for `axolotl preprocess`
  core/builders/causal.py     # HFCausalTrainerBuilder — wires config → SFT trainer
  core/trainers/base.py       # AxolotlTrainer — base trainer class
  core/trainers/mixins/       # Packing, optimizer, scheduler, checkpoints
  prompt_strategies/          # Format handlers: chat_template, alpaca, completion, input_output
  utils/schemas/config.py     # AxolotlInputConfig — main config schema
  utils/schemas/datasets.py   # SFTDataset, DatasetConfig
  utils/schemas/peft.py       # LoraConfig — LoRA parameters
  integrations/liger/         # Liger kernel plugin

examples/llama-3/             # LoRA, QLoRA, full FT example configs
docs/getting-started.qmd      # Quickstart with config templates
docs/optimizations.qmd        # Flash attention, gradient checkpointing, sample packing
docs/multi-gpu.qmd            # FSDP and DeepSpeed setup
```