Dion optimizer support (#3014)
* Add support for Dion optimizer
* dion training kwargs
* fix var names
* no dion 8bit for now
* use updated axolotl-contribs-mit for dion optimizer
* add smoke test for dion optimizer
* add docs
* fix typo during edits
* fix test to not remove load in 8bit
@@ -1,5 +1,5 @@
 ---
-title: "N-D Parallelism"
+title: "N-D Parallelism (Beta)"
 ---
 
 Axolotl enables training models at scale by composing different parallelism techniques. This is essential when:
docs/optimizers.qmd (new file, 18 lines)
@@ -0,0 +1,18 @@
+---
+title: Optimizers
+description: Configuring optimizers
+---
+
+### Dion Optimizer
+
+Microsoft's Dion (DIstributed OrthoNormalization) optimizer is a scalable and communication-efficient
+orthonormalizing optimizer that uses low-rank approximations to reduce gradient communication.
+
+Usage:
+
+```yaml
+optimizer: dion
+dion_lr: 0.01
+dion_momentum: 0.95
+lr: 0.00001 # learning rate for embeddings and parameters that fall back to AdamW
+```
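The low-rank approximation the doc mentions can be illustrated with a single power-iteration step that orthonormalizes a rank-r factor of the momentum matrix. This is a hedged toy sketch only: the function name and shapes are hypothetical, and the real Dion optimizer additionally handles error feedback, per-parameter-group learning rates, and distributed sharding.

```python
import numpy as np

def low_rank_orthonormal_step(momentum, q_prev):
    """One power-iteration step: approximate the momentum matrix with a
    rank-r orthonormalized factor pair. Toy illustration only -- not the
    actual Dion implementation."""
    p = momentum @ q_prev            # (d_out, r) left factor
    p, _ = np.linalg.qr(p)           # orthonormalize columns of p
    q = momentum.T @ p               # (d_in, r) refreshed right factor
    return p, q                      # low-rank update direction ~ p @ q.T

# Hypothetical shapes: a (64, 32) momentum matrix approximated at rank 4
rng = np.random.default_rng(0)
m = rng.standard_normal((64, 32))
q0 = rng.standard_normal((32, 4))
p, q = low_rank_orthonormal_step(m, q0)
```

Because only the small factors (rather than the full momentum matrix) need to be communicated between workers, gradient communication shrinks roughly in proportion to the chosen rank.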