Feat(doc): Reorganize documentation, fix broken syntax, update notes (#2348)

* feat(doc): organize docs, add to menu bar, fix broken formatting

* feat: add link to custom integrations

* feat: update readme for integrations to include citations and repo link

* chore: update lm_eval info

* chore: use fullname

* Update docs/cli.qmd per suggestion

Co-authored-by: Dan Saunders <danjsaund@gmail.com>

* feat: add sweep doc

* feat: add kd doc

* fix: remove toc

* fix: update deprecation

* feat: add more info about chat_template issues

* fix: heading level

* fix: shell->bash code block

* fix: ray link

* fix(doc): heading level, header links, formatting

* feat: add grpo docs

* feat: add style changes

* fix: wrong cli arg for lm-eval

* fix: remove old run method

* feat: load custom integration doc dynamically

* fix: remove old cli way

* fix: toc

* fix: minor formatting

---------

Co-authored-by: Dan Saunders <danjsaund@gmail.com>
NanoCode012 authored on 2025-02-25 16:09:37 +07:00, committed by GitHub
parent 1110a37e21, commit 2efe1b4c09
32 changed files with 940 additions and 443 deletions

# Cut Cross Entropy
Cut Cross Entropy reduces VRAM usage by optimizing the cross-entropy operation during loss calculation.
See https://github.com/apple/ml-cross-entropy
## Usage
```yaml
plugins:
- "axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin"
cut_cross_entropy: true
```
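Under the hood this uses the kernels from the `cut-cross-entropy` package. As a rough standalone sketch of the idea (assuming that package from the repo above; shapes, dtypes, and sizes here are illustrative), the fused op computes the loss directly from hidden states and the unembedding matrix, without ever materializing the full logits tensor:
```python
import torch
from cut_cross_entropy import linear_cross_entropy

# Hidden states and the lm_head weight; the fused kernel avoids building
# the full [batch, seq, vocab] logits tensor in memory.
embeddings = torch.randn(4, 128, 2048, device="cuda", dtype=torch.bfloat16)
classifier = torch.randn(32000, 2048, device="cuda", dtype=torch.bfloat16)
labels = torch.randint(0, 32000, (4, 128), device="cuda")

# shift=True applies the usual causal-LM one-token label shift.
loss = linear_cross_entropy(embeddings, classifier, labels, shift=True)
```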
## Citation
```bib
@article{wijmans2024cut,
author = {Erik Wijmans and
Brody Huval and
Alexander Hertzberg and
Vladlen Koltun and
Philipp Kr\"ahenb\"uhl},
title = {Cut Your Losses in Large-Vocabulary Language Models},
journal = {arXiv},
year = {2024},
url = {https://arxiv.org/abs/2411.09009},
}
```

# Grokfast
See https://github.com/ironjr/grokfast
## Usage
```yaml
plugins:
- "axolotl.integrations.grokfast.GrokfastPlugin"
grokfast_alpha: 2.0
grokfast_lamb: 0.98
```
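For intuition, `grokfast_alpha` and `grokfast_lamb` correspond to the paper's EMA gradient filter: an exponential moving average of each parameter's gradient is amplified and added back before the optimizer step. A minimal sketch of that filter, not the plugin's exact implementation:
```python
import torch

def gradfilter_ema(model: torch.nn.Module, ema: dict, alpha: float, lamb: float) -> dict:
    """One Grokfast step: amplify the slow (EMA) component of the gradients.

    Call between loss.backward() and optimizer.step(); pass the returned
    dict back in on the next step (start with an empty dict).
    """
    for name, p in model.named_parameters():
        if p.grad is None:
            continue
        g = p.grad.detach()
        ema[name] = g.clone() if name not in ema else alpha * ema[name] + (1.0 - alpha) * g
        p.grad.add_(ema[name], alpha=lamb)  # grad <- grad + lamb * ema
    return ema
```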
## Citation
```bib
@article{lee2024grokfast,
title={{Grokfast}: Accelerated Grokking by Amplifying Slow Gradients},
author={Lee, Jaerin and Kang, Bong Gyun and Kim, Kihoon and Lee, Kyoung Mu},
journal={arXiv preprint arXiv:2405.20233},
year={2024}
}
```

# Knowledge Distillation
## Usage
```yaml
plugins:
- "axolotl.integrations.kd.KDPlugin"
kd_trainer: True
kd_ce_alpha: 0.1
kd_alpha: 0.9
kd_temperature: 1.0
torch_compile: True # torch>=2.5.1, recommended to reduce vram
datasets:
- path: ...
  type: "axolotl.integrations.kd.chat_template"
  field_messages: "messages_combined"
  logprobs_field: "llm_text_generation_vllm_logprobs" # field holding the teacher logprobs (KD only)
```
An example dataset can be found at [`axolotl-ai-co/evolkit-logprobs-pipeline-75k-v2-sample`](https://huggingface.co/datasets/axolotl-ai-co/evolkit-logprobs-pipeline-75k-v2-sample).
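The three `kd_*` options weight a blended objective: a hard-label cross-entropy term on the ground-truth tokens plus a distillation term against the teacher's logprobs. A rough sketch of how such a loss is typically combined (not the plugin's exact code; here `teacher_logprobs` is assumed to be a full-vocabulary, temperature-scaled log-distribution, whereas the dataset above stores per-token top-k logprobs):
```python
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logprobs, labels,
            ce_alpha=0.1, kd_alpha=0.9, temperature=1.0):
    vocab = student_logits.size(-1)
    # Hard-label term: ordinary cross entropy on the ground-truth tokens.
    ce = F.cross_entropy(student_logits.view(-1, vocab), labels.view(-1))
    # Soft-label term: KL divergence from the teacher's distribution.
    student_logprobs = F.log_softmax(student_logits / temperature, dim=-1)
    kd = F.kl_div(student_logprobs, teacher_logprobs,
                  log_target=True, reduction="batchmean")
    kd = kd * temperature**2  # conventional rescaling when T != 1
    return ce_alpha * ce + kd_alpha * kd
```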

# Liger Kernel Integration
Liger Kernel provides efficient Triton kernels for LLM training, offering:
- 20% increase in multi-GPU training throughput
- 60% reduction in memory usage
- Compatibility with both FSDP and DeepSpeed
See https://github.com/linkedin/Liger-Kernel
## Usage
```yaml
plugins:
- axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_glu_activation: true
liger_layer_norm: true
liger_fused_linear_cross_entropy: true
```
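For reference, the same kernels can also be applied outside axolotl through the library's patching helpers. A sketch assuming the `apply_liger_kernel_to_llama` helper from the Liger Kernel repo (flag names vary per architecture; check its README):
```python
from liger_kernel.transformers import apply_liger_kernel_to_llama
from transformers import AutoModelForCausalLM

# Patch the HF Llama modeling code with Triton kernels before the model
# is instantiated, mirroring the plugin flags above.
apply_liger_kernel_to_llama(
    rope=True,
    rms_norm=True,
    swiglu=True,
    fused_linear_cross_entropy=True,
)
model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3.1-8B")
```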
## Citation
```bib
@article{hsu2024ligerkernelefficienttriton,
title={Liger Kernel: Efficient Triton Kernels for LLM Training},
author={Pin-Lun Hsu and Yun Dai and Vignesh Kothapalli and Qingquan Song and Shao Tang and Siyu Zhu and Steven Shimizu and Shivam Sahni and Haowen Ning and Yanning Chen},
year={2024},
eprint={2410.10989},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2410.10989},
journal={arXiv preprint arXiv:2410.10989},
}
```

# LM Eval Harness
Run evaluations on your model using the popular lm-evaluation-harness library.
See https://github.com/EleutherAI/lm-evaluation-harness
## Usage
```yaml
plugins:
- "axolotl.integrations.lm_eval.LMEvalPlugin"
lm_eval_tasks:
- gsm8k
- hellaswag
- arc_easy
lm_eval_batch_size: # Batch size for evaluation
output_dir: # Directory to save evaluation results
```
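The plugin runs the harness for you after training finishes. For comparison, an equivalent standalone evaluation via the library's Python API (as of lm-eval v0.4.x; the model name is only an example) looks roughly like:
```python
import lm_eval

# Evaluate a Hugging Face model on the same tasks listed in the config above.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=meta-llama/Meta-Llama-3.1-8B",
    tasks=["gsm8k", "hellaswag", "arc_easy"],
    batch_size=8,
)
print(results["results"])
```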
## Citation
```bib
@misc{eval-harness,
author = {Gao, Leo and Tow, Jonathan and Abbasi, Baber and Biderman, Stella and Black, Sid and DiPofi, Anthony and Foster, Charles and Golding, Laurence and Hsu, Jeffrey and Le Noac'h, Alain and Li, Haonan and McDonell, Kyle and Muennighoff, Niklas and Ociepa, Chris and Phang, Jason and Reynolds, Laria and Schoelkopf, Hailey and Skowron, Aviya and Sutawika, Lintang and Tang, Eric and Thite, Anish and Wang, Ben and Wang, Kevin and Zou, Andy},
title = {A framework for few-shot language model evaluation},
month = 07,
year = 2024,
publisher = {Zenodo},
version = {v0.4.3},
doi = {10.5281/zenodo.12608602},
url = {https://zenodo.org/records/12608602}
}
```

# Spectrum: Targeted Training on Signal to Noise Ratio
by Eric Hartford, Lucas Atkins, Fernando Fernandes, David Golchinfar
This plugin freezes the bottom fraction of a model's modules, ranked by signal-to-noise ratio (SNR).
See https://github.com/cognitivecomputations/spectrum
## Overview
Spectrum is a tool for scanning and evaluating the Signal-to-Noise Ratio (SNR) of layers in large language models.
By identifying and training only the top n% of layers with the highest SNR, you can improve training efficiency.
## Usage
```yaml
plugins:
- "axolotl.integrations.spectrum.SpectrumPlugin"
spectrum_top_fraction: 0.5
# Optional if using a pre-scanned model as your base_model. Useful if using a model mirror
spectrum_model_name: meta-llama/Meta-Llama-3.1-8B
```
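Conceptually, the plugin ranks modules by the SNR values from a Spectrum scan and freezes everything outside the top fraction. A minimal sketch of that freezing step (illustrative only; `snr_by_module` is assumed to map module names to scanned SNR values):
```python
def freeze_by_snr(model, snr_by_module, top_fraction=0.5):
    """Keep only the top `top_fraction` of modules, ranked by SNR, trainable."""
    ranked = sorted(snr_by_module, key=snr_by_module.get, reverse=True)
    keep = set(ranked[: int(len(ranked) * top_fraction)])
    for name, module in model.named_modules():
        if name in snr_by_module and name not in keep:
            for p in module.parameters():
                p.requires_grad = False  # frozen: excluded from the optimizer
```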
## Citation
```bib
@misc{hartford2024spectrumtargetedtrainingsignal,
title={Spectrum: Targeted Training on Signal to Noise Ratio},
author={Eric Hartford and Lucas Atkins and Fernando Fernandes Neto and David Golchinfar},
year={2024},
eprint={2406.06623},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2406.06623},
}
```