Dion optimizer support (#3014)

* Add support for Dion optimizer

* dion training kwargs

* fix var names

* no dion 8bit for now

* use updated axolotl-contribs-mit for dion optimizer

* add smoke test for dion optimizer

* add docs

* fix typo during edits

* fix test to not remove load in 8bit
Author: Wing Lian
Date: 2025-08-04 16:33:30 -04:00 (committed by GitHub)
Parent: 33d094721c
Commit: ab49d16e34
10 changed files with 145 additions and 2 deletions


@@ -1,5 +1,5 @@
 ---
-title: "N-D Parallelism"
+title: "N-D Parallelism (Beta)"
 ---
 
 Axolotl enables training models at scale by composing different parallelism techniques. This is essential when:

docs/optimizers.qmd (new file, +18)

@@ -0,0 +1,18 @@
+---
+title: Optimizers
+description: Configuring optimizers
+---
+
+### Dion Optimizer
+
+Microsoft's Dion (DIstributed OrthoNormalization) is a scalable, communication-efficient
+orthonormalizing optimizer that uses low-rank approximations to reduce gradient communication.
+
+Usage:
+
+```yaml
+optimizer: dion
+dion_lr: 0.01
+dion_momentum: 0.95
+lr: 0.00001 # learning rate for embeddings and parameters that fall back to AdamW
+```
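
As a fuller illustration, the sketch below drops those Dion settings into a minimal end-to-end Axolotl config. Only `optimizer`, `dion_lr`, `dion_momentum`, and `lr` come from the snippet above; the base model, dataset, and batch settings are ordinary Axolotl options chosen for the example, not anything this commit prescribes.

```yaml
# Illustrative Axolotl config using the Dion optimizer.
# Everything outside the optimizer block is an example choice.
base_model: HuggingFaceTB/SmolLM2-135M  # any causal LM works here
datasets:
  - path: tatsu-lab/alpaca
    type: alpaca
sequence_len: 2048
micro_batch_size: 2
gradient_accumulation_steps: 4
num_epochs: 1
output_dir: ./outputs/dion-example

optimizer: dion
dion_lr: 0.01        # step size for parameters the Dion update handles
dion_momentum: 0.95  # momentum for the Dion update
lr: 0.00001          # AdamW fallback for embeddings and other parameters
```

Note that one of the commits above disables the 8-bit path for now ("no dion 8bit for now"), so combining `load_in_8bit: true` with `optimizer: dion` is presumably unsupported in this revision.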