Optimizers
+Dion Optimizer
+Microsoft’s Dion (DIstributed OrthoNormalization) optimizer is a scalable, communication-efficient orthonormalizing optimizer that uses low-rank approximations to reduce gradient communication.
+Usage:
+optimizer: dion
+dion_lr: 0.01
+dion_momentum: 0.95
+lr: 0.00001 # learning rate for embeddings and parameters that fall back to AdamW
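
To give a rough intuition for why a low-rank approximation can reduce communication, here is a minimal NumPy sketch. It is not Dion's actual implementation: the function name `low_rank_factors`, the matrix shapes, and the single power-iteration step are illustrative assumptions. It only shows that a matrix of rank `r` can be represented by two thin factors, one with orthonormal columns, that together contain far fewer numbers than the full matrix.

```python
import numpy as np


def low_rank_factors(m, rank, seed=0):
    """Hypothetical helper: one power-iteration step with QR orthonormalization.

    Returns p (rows x rank, orthonormal columns) and r (cols x rank)
    such that p @ r.T is the projection of m onto the span of p.
    """
    rng = np.random.default_rng(seed)
    q = rng.standard_normal((m.shape[1], rank))  # random starting basis
    p, _ = np.linalg.qr(m @ q)                   # orthonormalize the column sketch
    r = m.T @ p                                  # coefficients of m in that basis
    return p, r


rows, cols, rank = 256, 128, 8
rng = np.random.default_rng(1)
# Build a matrix that is exactly rank 8 so the factorization can be exact.
m = rng.standard_normal((rows, rank)) @ rng.standard_normal((rank, cols))

p, r = low_rank_factors(m, rank)
approx = p @ r.T

full_size = m.size               # numbers to send for the full matrix
factor_size = p.size + r.size    # numbers to send for the two thin factors
print(full_size, factor_size)
```

In this toy setup the two factors hold 3072 values versus 32768 for the full matrix. Real gradients are not exactly low rank, so the approximation would in practice be lossy and the achievable saving depends on the rank chosen.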