---
title: Optimizers
description: Configuring optimizers
---

### Dion Optimizer

Microsoft's Dion (DIstributed OrthoNormalization) optimizer is a scalable and communication-efficient
orthonormalizing optimizer that uses low-rank approximations to reduce gradient communication.

Usage:

```yaml
optimizer: dion
dion_lr: 0.01
dion_momentum: 0.95
lr: 0.00001 # learning rate for embeddings and parameters that fall back to AdamW
```
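As a rough illustration of why a low-rank approximation reduces gradient communication, the sketch below compares the number of values in a dense gradient matrix with the number in a pair of rank-`r` factors. It is only a counting argument, not Dion's actual communication schedule, and the layer shape, rank, and function names are hypothetical values chosen for the example.

```python
def full_gradient_values(rows: int, cols: int) -> int:
    """Number of values in a dense rows x cols gradient matrix."""
    return rows * cols


def low_rank_values(rows: int, cols: int, rank: int) -> int:
    """Number of values in two factors of shape (rows x rank) and (cols x rank)."""
    return rank * (rows + cols)


# Hypothetical layer shape and rank, chosen only for illustration.
rows, cols, rank = 4096, 4096, 64
dense = full_gradient_values(rows, cols)
factored = low_rank_values(rows, cols, rank)

print(f"dense gradient: {dense} values")
print(f"rank-{rank} factors: {factored} values")
print(f"reduction: {dense / factored:.0f}x")
```

For a square layer the saving grows as the rank shrinks relative to the layer width, which is the trade-off a low-rank optimizer makes between communication volume and approximation quality.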