add support for 4bit optimizers

Wing Lian
2024-03-19 22:57:40 -04:00
parent dd449c5cd8
commit a236f5eab5
4 changed files with 110 additions and 1 deletions

docs/optimizers.md Normal file

@@ -0,0 +1,29 @@
# Optimizers
Optimizers are a core component of LLM training. An optimizer updates the model's weights (parameters) using the gradients computed during backpropagation.
Its goal is to minimize the loss function.
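To make the update rule concrete, here is a minimal sketch of plain gradient descent on a one-parameter toy loss. The loss function, learning rate, and step count are illustrative assumptions, not values from this project.

```python
def loss(w: float) -> float:
    # Toy quadratic loss with its minimum at w = 3.0 (illustrative only).
    return (w - 3.0) ** 2


def grad(w: float) -> float:
    # Analytic gradient of the toy loss above.
    return 2.0 * (w - 3.0)


def gradient_descent(w: float, lr: float = 0.1, steps: int = 50) -> float:
    # Repeatedly move the weight against the gradient to reduce the loss.
    for _ in range(steps):
        w = w - lr * grad(w)
    return w


w_final = gradient_descent(0.0)
```

Real optimizers such as AdamW apply the same idea to every parameter tensor, with extra per-parameter state to adapt the step size.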
### Adam/AdamW Optimizers
```yaml
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-8
weight_decay: 0.0
```
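The sketch below shows where each of these settings enters a single AdamW update for one scalar parameter. It follows the standard decoupled-weight-decay formulation and is not taken from this project's code; the function name `adamw_step` and the learning rate `lr` are assumptions added for illustration.

```python
import math


def adamw_step(param, grad, m, v, step,
               lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, weight_decay=0.0):
    """One AdamW update for a scalar parameter (hypothetical helper).

    beta1, beta2, eps, and weight_decay correspond to adam_beta1,
    adam_beta2, adam_epsilon, and weight_decay in the config above.
    `step` is the 1-based update count.
    """
    # Decoupled weight decay: shrink the weight directly, not via the gradient.
    param = param * (1.0 - lr * weight_decay)
    # Exponential moving averages of the gradient and the squared gradient.
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad * grad
    # Bias correction compensates for the zero initialization of m and v.
    m_hat = m / (1.0 - beta1 ** step)
    v_hat = v / (1.0 - beta2 ** step)
    # eps keeps the division stable when v_hat is close to zero.
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# Usage: a few steps on the toy loss (w - 3)^2, starting from w = 0.
w, m_state, v_state = 0.0, 0.0, 0.0
for t in range(1, 6):
    g = 2.0 * (w - 3.0)
    w, m_state, v_state = adamw_step(w, g, m_state, v_state, step=t)
```

With `weight_decay: 0.0`, as in the config above, the decay step is a no-op and the update reduces to plain Adam.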
### GaLore Optimizer
https://huggingface.co/papers/2403.03507
```yaml
optimizer: galore_adamw  # or galore_adamw_8bit, galore_adafactor
optim_args:
  rank: 128
  update_proj_gap: 200
  scale: 0.25
  proj_type: std
optim_target_modules:
  - mlp
  - attn
```
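As a rough guide to what `rank` and `update_proj_gap` control, the sketch below estimates optimizer-state size for one weight matrix and lists the steps at which the projection would be recomputed. It assumes the accounting given in the GaLore paper for an `m x n` matrix with `m <= n`, and a refresh at every step that is a multiple of `update_proj_gap`; the helper names are hypothetical and the actual implementation may differ.

```python
def adam_state_elements(m: int, n: int) -> int:
    # Adam keeps two moment tensors, each the same shape as the m x n weight.
    return 2 * m * n


def galore_state_elements(m: int, n: int, rank: int) -> int:
    # Assumed accounting for m <= n: one m x rank projection matrix
    # plus two moment tensors of shape rank x n.
    return m * rank + 2 * rank * n


def projection_refresh_steps(total_steps: int, update_proj_gap: int) -> list:
    # Assumption: the projection is recomputed whenever
    # step % update_proj_gap == 0, starting at step 0.
    return [s for s in range(total_steps) if s % update_proj_gap == 0]


full = adam_state_elements(4096, 4096)
low_rank = galore_state_elements(4096, 4096, rank=128)
refreshes = projection_refresh_steps(1000, update_proj_gap=200)
```

Under these assumptions, a lower `rank` shrinks the optimizer state, and a larger `update_proj_gap` recomputes the projection less often.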