Dion optimizer support (#3014)
* Add support for Dion optimizer
* dion training kwargs
* fix var names
* no dion 8bit for now
* use updated axolotl-contribs-mit for dion optimizer
* add smoke test for dion optimizer
* add docs
* fix typo during edits
* fix test to not remove load in 8bit
@@ -1,5 +1,5 @@
 ---
-title: "N-D Parallelism"
+title: "N-D Parallelism (Beta)"
 ---
 
 Axolotl enables training models at scale by composing different parallelism techniques. This is essential when:
docs/optimizers.qmd (new file, 18 lines)
@@ -0,0 +1,18 @@
+---
+title: Optimizers
+description: Configuring optimizers
+---
+
+### Dion Optimizer
+
+Microsoft's Dion (DIstributed OrthoNormalization) optimizer is a scalable and communication-efficient
+orthonormalizing optimizer that uses low-rank approximations to reduce gradient communication.
+
+Usage:
+
+```yaml
+optimizer: dion
+dion_lr: 0.01
+dion_momentum: 0.95
+lr: 0.00001 # learning rate for embeddings and parameters that fall back to AdamW
+```
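The low-rank approximation the doc mentions can be illustrated with a single power-iteration step that orthonormalizes a rank-r factor of the momentum matrix. This is a hedged toy sketch only: the function name and shapes are hypothetical, and the real Dion optimizer additionally handles error feedback, per-parameter-group learning rates, and distributed sharding.

```python
import numpy as np

def low_rank_orthonormal_step(momentum, q_prev):
    """One power-iteration step: approximate the momentum matrix with a
    rank-r orthonormalized factor pair. Toy illustration only -- not the
    actual Dion implementation."""
    p = momentum @ q_prev            # (d_out, r) left factor
    p, _ = np.linalg.qr(p)           # orthonormalize columns of p
    q = momentum.T @ p               # (d_in, r) refreshed right factor
    return p, q                      # low-rank update direction ~ p @ q.T

# Hypothetical shapes: a (64, 32) momentum matrix approximated at rank 4
rng = np.random.default_rng(0)
m = rng.standard_normal((64, 32))
q0 = rng.standard_normal((32, 4))
p, q = low_rank_orthonormal_step(m, q0)
```

Because only the small factors (rather than the full momentum matrix) need to be communicated between workers, gradient communication shrinks roughly in proportion to the chosen rank.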