feat(docs): comprehensive improvement (#3564)
* docs: comprehensive documentation improvements for humans and agents

New human docs:
- grpo.qmd: GRPO deep dive (async, rewards, IS correction, scaling)
- ebft.qmd: EBFT guide (structured/strided modes, feature extraction)
- choosing_method.qmd: decision tree for SFT vs LoRA vs DPO vs GRPO
- vllm_serving.qmd: vLLM setup for GRPO (server/colocate, LoRA sync)
- training_stability.qmd: monitoring, NaN debugging, OOM, healthy metrics

New agent docs:
- AGENTS_SFT.md: agent reference for supervised fine-tuning
- AGENTS_DPO.md: agent reference for preference learning (DPO/KTO/ORPO)

Updated existing docs:
- rlhf.qmd: cross-references to new GRPO/EBFT/choosing-method guides
- getting-started.qmd: reorganized Next Steps with links to new guides
- debugging.qmd: link to training stability guide
- _quarto.yml: added new pages to sidebar navigation

Removed:
- bak.agents.md: stale backup that confused agents

* docs: trim duplicated generic config from AGENTS_DPO.md

Remove boilerplate training params (optimizer, gradient_checkpointing, flash_attention, etc.) from each method template. These are not preference-learning-specific and are already covered in AGENTS_SFT.md. Config templates now show only method-specific fields with a reference to AGENTS_SFT.md for the rest.

* docs: deduplicate across new doc pages

- grpo.qmd: collapse vLLM setup section to brief config + link to vllm_serving.qmd; collapse IS correction to essentials + link; replace full monitoring tables with summary + link to training_stability.qmd
- vllm_serving.qmd: remove duplicated async/IS config reference tables (already in grpo.qmd config reference); replace full example config with link to grpo.qmd quick start
- ebft.qmd: trim generic training params in quick start config

* fix: train scripts

* feat: split files into cleaner parts

* fix: cleanup pretraining docs

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>
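For a sense of what the trimmed AGENTS_DPO.md templates look like after the second commit, here is a minimal sketch of a method-specific-only DPO config. It assumes axolotl's `rl: dpo` switch and a preference dataset type such as `chatml.intel`; the `rl_beta` key name varies by axolotl version, so treat it as an assumption rather than a quote from the template:

```yaml
# Hypothetical method-specific DPO template; generic training params
# (optimizer, gradient_checkpointing, flash_attention, ...) are omitted
# on purpose -- AGENTS_SFT.md covers those.
rl: dpo                          # enable DPO preference training
rl_beta: 0.1                     # assumed key; older versions used dpo_beta
datasets:
  - path: Intel/orca_dpo_pairs   # example prompt/chosen/rejected dataset
    type: chatml.intel           # maps the dataset's fields for DPO
```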
@@ -170,17 +170,26 @@ More details can be found in [Merging LoRA weights](inference.qmd#sec-merging).
 ## Next Steps {#sec-next-steps}
 
-Now that you have the basics, you might want to:
+Now that you have the basics, explore these guides based on what you want to do:
 
-- Try different model architectures
-- Experiment with hyperparameters
-- Use more advanced training methods
-- Scale up to larger models
+**Choose your path:**
 
-Check our other guides for details on these topics:
+- [Choosing a Fine-Tuning Method](choosing_method.qmd) — SFT vs LoRA vs QLoRA vs GRPO vs DPO, with hardware recommendations
 
-- [Configuration Guide](config-reference.qmd) - Full configuration options
-- [Dataset Loading](dataset_loading.qmd) - Loading datasets from various sources
-- [Dataset Formats](dataset-formats) - Working with different data formats
-- [Multi-GPU Training](multi-gpu.qmd)
-- [Multi-Node Training](multi-node.qmd)
+**Core guides:**
+
+- [Dataset Loading](dataset_loading.qmd) — Loading datasets from various sources
+- [Dataset Formats](dataset-formats) — Working with different data formats
+- [Optimizations](optimizations.qmd) — Flash attention, gradient checkpointing, sample packing
+- [Training Stability & Debugging](training_stability.qmd) — Monitoring metrics, fixing NaN, OOM debugging
+
+**Advanced training methods:**
+
+- [RLHF / Preference Learning](rlhf.qmd) — DPO, KTO, GRPO, EBFT
+- [GRPO Training](grpo.qmd) — RL with custom rewards and vLLM generation
+- [vLLM Serving](vllm_serving.qmd) — Setting up vLLM for GRPO
+
+**Scaling up:**
+
+- [Multi-GPU Training](multi-gpu.qmd) — DeepSpeed, FSDP, DDP
+- [Multi-Node Training](multi-node.qmd) — Distributed training across machines
 
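The GRPO and vLLM Serving guides added above center on configs along these lines. A minimal sketch, assuming axolotl's `rl: grpo` mode with a TRL-style `trl:` block; the exact keys (`use_vllm`, `vllm_server_host`, `vllm_server_port`, `reward_funcs`) and the reward import path are assumptions here, and grpo.qmd / vllm_serving.qmd remain the authoritative reference:

```yaml
# Minimal GRPO + vLLM sketch; key names are assumptions, not quoted
# from the new docs.
rl: grpo                        # group-relative policy optimization via TRL
trl:
  use_vllm: true                # generate rollouts through a vLLM server
  vllm_server_host: 127.0.0.1   # assumed: host running the vLLM serve process
  vllm_server_port: 8000
  reward_funcs:
    - rewards.my_reward_func    # assumed: import path to a custom reward fn
```

In server mode the vLLM process runs separately from the trainer, which syncs updated LoRA/model weights to it during training; that server/colocate split and LoRA sync is what vllm_serving.qmd documents.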