feat(docs): comprehensive improvement (#3564)

* docs: comprehensive documentation improvements for humans and agents

New human docs:
- grpo.qmd: GRPO deep dive (async, rewards, IS correction, scaling)
- ebft.qmd: EBFT guide (structured/strided modes, feature extraction)
- choosing_method.qmd: decision tree for SFT vs LoRA vs DPO vs GRPO
- vllm_serving.qmd: vLLM setup for GRPO (server/colocate, LoRA sync)
- training_stability.qmd: monitoring, NaN debugging, OOM, healthy metrics

New agent docs:
- AGENTS_SFT.md: agent reference for supervised fine-tuning
- AGENTS_DPO.md: agent reference for preference learning (DPO/KTO/ORPO)

Updated existing docs:
- rlhf.qmd: cross-references to new GRPO/EBFT/choosing-method guides
- getting-started.qmd: reorganized Next Steps with links to new guides
- debugging.qmd: link to training stability guide
- _quarto.yml: added new pages to sidebar navigation

Removed:
- bak.agents.md: stale backup that confused agents

* docs: trim duplicated generic config from AGENTS_DPO.md

Remove boilerplate training params (optimizer, gradient_checkpointing,
flash_attention, etc.) from each method template. These are not
preference-learning-specific and are already covered in AGENTS_SFT.md.
Config templates now show only method-specific fields with a reference
to AGENTS_SFT.md for the rest.

* docs: deduplicate across new doc pages

- grpo.qmd: collapse vLLM setup section to brief config + link to
  vllm_serving.qmd; collapse IS correction to essentials + link;
  replace full monitoring tables with summary + link to
  training_stability.qmd
- vllm_serving.qmd: remove duplicated async/IS config reference tables
  (already in grpo.qmd config reference); replace full example config
  with link to grpo.qmd quick start
- ebft.qmd: trim generic training params in quick start config

* fix: train scripts

* feat: split files into cleaner parts

* fix: cleanup pretraining docs

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>
2026-04-02 19:01:26 +07:00
parent 50e9573f24
commit 16e32232fb
17 changed files with 2680 additions and 105 deletions
--- a/_quarto.yml
+++ b/_quarto.yml
@@ -238,6 +238,7 @@ website:
        - section: "Getting Started"
          contents:
            - docs/getting-started.qmd
+            - docs/choosing_method.qmd
            - docs/installation.qmd
            - docs/inference.qmd
            - section: "Model Guides"
@@ -302,6 +303,9 @@ website:
          contents:
            - docs/multimodal.qmd
            - docs/rlhf.qmd
+            - docs/grpo.qmd
+            - docs/ebft.qmd
+            - docs/vllm_serving.qmd
            - docs/reward_modelling.qmd
            - docs/lr_groups.qmd
            - docs/lora_optims.qmd
@@ -334,6 +338,7 @@ website:
        - section: "Troubleshooting"
          contents:
            - docs/faq.qmd
+            - docs/training_stability.qmd
            - docs/debugging.qmd
            - docs/nccl.qmd