Optimizers

Configuring optimizers

Dion Optimizer

Dion (DIstributed OrthoNormalization), from Microsoft, is a scalable, communication-efficient orthonormalizing optimizer. It uses low-rank approximations to reduce the amount of gradient data that must be communicated between devices during distributed training.
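To illustrate the low-rank idea, here is a minimal single-device sketch in pure Python. It is loosely modeled on a power-iteration approach: the update matrix B is multiplied by a thin right factor Q, the result is orthonormalized to give P, and a new right factor is recovered from B^T P. The function names are hypothetical, and the sketch deliberately omits Dion's momentum, error feedback, update scaling, and the distributed sharding that makes it communication-efficient. It is not the library's implementation.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]  # row-major: a list of rows


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def orthonormalize_columns(a: Matrix, eps: float = 1e-12) -> Matrix:
    """Modified Gram-Schmidt: return a matrix whose columns are orthonormal."""
    basis: Matrix = []
    for col in transpose(a):
        v = list(col)
        for q in basis:
            dot = sum(x * y for x, y in zip(v, q))
            v = [x - dot * y for x, y in zip(v, q)]
        norm = math.sqrt(sum(x * x for x in v))
        basis.append([x / max(norm, eps) for x in v])
    return transpose(basis)


def normalize_columns(a: Matrix, eps: float = 1e-12) -> Matrix:
    """Scale each column to unit length (no orthogonalization)."""
    cols = []
    for col in transpose(a):
        norm = math.sqrt(sum(x * x for x in col))
        cols.append([x / max(norm, eps) for x in col])
    return transpose(cols)


def low_rank_orthonormal_update(b: Matrix, q: Matrix) -> Tuple[Matrix, Matrix, Matrix]:
    """One power-iteration step on the (m x n) matrix b with an (n x r) factor q.

    Returns (p, q_new, update), where update = p @ q_new^T is a rank-r
    approximation of b's direction with an orthonormal left factor p.
    """
    p = orthonormalize_columns(matmul(b, q))   # (m x r), orthonormal columns
    r = matmul(transpose(b), p)                # (n x r)
    q_new = normalize_columns(r)               # reused as q on the next step
    update = matmul(p, transpose(q_new))       # (m x n), rank at most r
    return p, q_new, update


if __name__ == "__main__":
    grad = [[1.0, 2.0, 0.5],
            [0.0, 1.0, 3.0],
            [2.0, 0.0, 1.0],
            [1.0, 1.0, 1.0]]      # a 4 x 3 example matrix
    q0 = [[1.0], [0.0], [0.0]]    # rank-1 starting factor (3 x 1)
    p, q1, update = low_rank_orthonormal_update(grad, q0)
    print(update)                # 4 x 3 rank-1 update direction
```

In a distributed setting, exchanging thin factors like P (m × r) and Q (n × r) instead of a full m × n matrix is where the communication savings of a low-rank approach come from.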

Usage:

optimizer: dion
dion_lr: 0.01
dion_momentum: 0.95
lr: 0.00001  # learning rate for embeddings and other parameters that fall back to AdamW
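The `lr` comment implies that parameters are split into two groups: one optimized by Dion with `dion_lr` and `dion_momentum`, and one that falls back to AdamW with `lr`. The sketch below shows one plausible way to do that split. The routing rule is an assumption, not taken from the source: 2D weight matrices go to Dion, while embeddings and non-matrix parameters such as biases and norm scales fall back to AdamW. `ParamGroup`, `build_param_groups`, and the parameter names are hypothetical; check the actual implementation for its exact rule.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ParamGroup:
    optimizer: str                       # "dion" or "adamw"
    lr: float
    momentum: Optional[float] = None     # only set for the Dion group here
    names: List[str] = field(default_factory=list)


def build_param_groups(
    shapes: Dict[str, Tuple[int, ...]], config: Dict[str, float]
) -> List[ParamGroup]:
    """Split parameters into a Dion group and an AdamW fallback group.

    Hypothetical rule: 2D weight matrices that are not embeddings go to Dion;
    embeddings and non-matrix parameters (biases, norm scales) use AdamW.
    """
    dion = ParamGroup("dion", config["dion_lr"], config["dion_momentum"])
    adamw = ParamGroup("adamw", config["lr"])
    for name, shape in shapes.items():
        if len(shape) == 2 and "embed" not in name:
            dion.names.append(name)
        else:
            adamw.names.append(name)
    return [dion, adamw]


if __name__ == "__main__":
    # Values mirror the example config; shapes are illustrative only.
    config = {"dion_lr": 0.01, "dion_momentum": 0.95, "lr": 0.00001}
    shapes = {
        "model.embed_tokens.weight": (32000, 4096),
        "model.layers.0.self_attn.q_proj.weight": (4096, 4096),
        "model.layers.0.mlp.up_proj.weight": (11008, 4096),
        "model.layers.0.input_layernorm.weight": (4096,),
    }
    for group in build_param_groups(shapes, config):
        print(group.optimizer, group.lr, group.names)
```

Under this assumed rule, the attention and MLP projection matrices would be trained by Dion at `dion_lr: 0.01`, while the embedding and layer-norm weights would use AdamW at `lr: 0.00001`.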