add support for 4bit optimizers
29
docs/optimizers.md
Normal file
@@ -0,0 +1,29 @@
# Optimizers

Optimizers update the model's weights (parameters) using the gradients computed during backpropagation, with the goal of minimizing the loss function. This makes them a core component of LLM training.

### Adam/AdamW Optimizers

The following options set the Adam/AdamW hyperparameters:

```yaml
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-8
weight_decay: 0.0
```
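
To make the four options above concrete, here is a minimal sketch of a single AdamW update for one scalar parameter, written in plain Python. The function name `adamw_step` and the learning rate `lr` are assumptions made for illustration and are not part of the config; actual training relies on the optimizer implementation of the underlying framework.

```python
import math


def adamw_step(param, grad, m, v, step, lr=1e-3,
               adam_beta1=0.9, adam_beta2=0.999,
               adam_epsilon=1e-8, weight_decay=0.0):
    """One AdamW update for a scalar parameter (illustrative sketch).

    `step` starts at 1. Returns the updated (param, m, v).
    """
    # Decoupled weight decay: shrink the weight directly, independent of the gradient.
    param = param * (1.0 - lr * weight_decay)
    # Exponential moving averages of the gradient and the squared gradient.
    m = adam_beta1 * m + (1.0 - adam_beta1) * grad
    v = adam_beta2 * v + (1.0 - adam_beta2) * grad * grad
    # Bias correction compensates for m and v starting at zero.
    m_hat = m / (1.0 - adam_beta1 ** step)
    v_hat = v / (1.0 - adam_beta2 ** step)
    # adam_epsilon keeps the division stable when v_hat is close to zero.
    param = param - lr * m_hat / (math.sqrt(v_hat) + adam_epsilon)
    return param, m, v


# Usage: minimize f(x) = x^2 (gradient 2x) starting from x = 1.0.
x, m, v = 1.0, 0.0, 0.0
for t in range(1, 101):
    x, m, v = adamw_step(x, 2.0 * x, m, v, t, lr=0.01)
```

With `weight_decay: 0.0` the decay line leaves the parameter unchanged, so the update reduces to plain Adam.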

### GaLore Optimizer

Paper: https://huggingface.co/papers/2403.03507

```yaml
optimizer: galore_adamw  # or galore_adamw_8bit, galore_adafactor
optim_args:
  rank: 128
  update_proj_gap: 200
  scale: 0.25
  proj_type: std
optim_target_modules:
  - mlp
  - attn
```
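
As a rough illustration of what `rank`, `update_proj_gap`, and `scale` control, the sketch below implements a simplified GaLore-style step in plain Python. It makes several assumptions that are not from the GaLore implementation: all function names are hypothetical, subspace iteration stands in for the SVD, plain SGD stands in for the inner AdamW/Adafactor optimizer, and only a left projection is shown (`proj_type` is not modeled).

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def orthonormalize(vectors):
    """Gram-Schmidt; assumes the vectors are linearly independent."""
    basis = []
    for vec in vectors:
        w = list(vec)
        for q in basis:
            dot = sum(x * y for x, y in zip(w, q))
            w = [x - dot * y for x, y in zip(w, q)]
        norm = math.sqrt(sum(x * x for x in w))
        basis.append([x / norm for x in w])
    return basis


def top_left_subspace(grad, rank, iters=50, seed=0):
    """Approximate the top-`rank` left singular vectors of `grad`.

    Subspace iteration on grad @ grad.T is used as a dependency-free
    stand-in for an SVD. Returns an m x rank matrix with orthonormal columns.
    """
    rng = random.Random(seed)
    gram = matmul(grad, transpose(grad))
    basis = orthonormalize([[rng.gauss(0.0, 1.0) for _ in grad] for _ in range(rank)])
    for _ in range(iters):
        basis = orthonormalize(
            [[sum(g * x for g, x in zip(row, vec)) for row in gram] for vec in basis]
        )
    return transpose(basis)


def galore_step(weight, grad, state, lr=0.01, rank=2, update_proj_gap=200, scale=0.25):
    """One simplified GaLore-style update (plain SGD as the inner optimizer)."""
    if state["step"] % update_proj_gap == 0:
        # The projection is refreshed only every `update_proj_gap` steps.
        state["proj"] = top_left_subspace(grad, rank)
    state["step"] += 1
    proj = state["proj"]
    # rank x n: the inner optimizer (and its state) would operate on this smaller matrix.
    low_rank_grad = matmul(transpose(proj), grad)
    # Project back to the full m x n shape; `scale` multiplies the resulting update.
    full_update = matmul(proj, low_rank_grad)
    return [
        [w - lr * scale * u for w, u in zip(w_row, u_row)]
        for w_row, u_row in zip(weight, full_update)
    ]
```

The memory saving comes from the inner optimizer keeping its state for the `rank x n` matrix rather than the full `m x n` gradient.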