# Optimizers

Optimizers are a core component of LLM training. They are responsible for updating the model's weights (parameters) based on the gradients computed during backpropagation, with the goal of minimizing the loss function.

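An optimizer is selected with the `optimizer` key of the training config. As a minimal sketch (the value `adamw_torch` and the `learning_rate` key are illustrative assumptions, not taken from this page; the schema follows the GaLore example below):

```yaml
# Illustrative sketch: `adamw_torch` and `learning_rate` are assumed, not from this page.
optimizer: adamw_torch
learning_rate: 2e-5
```
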
### Adam/AdamW Optimizers
```yaml
adam_beta1: 0.9     # exponential decay rate for the first-moment (mean) estimate
adam_beta2: 0.999   # exponential decay rate for the second-moment (variance) estimate
adam_epsilon: 1e-8  # small constant added to the denominator for numerical stability
weight_decay: 0.0   # decoupled weight decay (the "W" in AdamW)
```
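
For reference, these keys correspond to the standard AdamW update rule (the textbook formulation, not project-specific code), with β₁ = `adam_beta1`, β₂ = `adam_beta2`, ε = `adam_epsilon`, and λ = `weight_decay`:

```math
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1-\beta_1)\, g_t \\
v_t &= \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2 \\
\theta_t &= \theta_{t-1} - \eta \left( \frac{m_t / (1-\beta_1^t)}{\sqrt{v_t / (1-\beta_2^t)} + \epsilon} + \lambda\, \theta_{t-1} \right)
\end{aligned}
```

where $g_t$ is the current gradient and $\eta$ is the learning rate.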
### GaLore Optimizer
https://huggingface.co/papers/2403.03507
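
GaLore (Gradient Low-Rank Projection) is a memory-efficient training strategy: it projects gradients into a low-rank subspace, which shrinks the optimizer state while still allowing full-parameter learning.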
```yaml
# Choose one of the GaLore variants:
optimizer: galore_adamw | galore_adamw_8bit | galore_adafactor
optim_args:
  rank: 128             # rank of the low-rank gradient projection
  update_proj_gap: 200  # steps between updates of the projection matrix
  scale: 0.25           # scale factor applied to the low-rank update
  proj_type: std        # projection type
optim_target_modules:
  - mlp
  - attn
```
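
A note on `optim_target_modules`: assuming this maps to the Hugging Face `transformers` GaLore integration (which these optimizer names come from), the entries are matched against module names as strings or regex patterns, and only matching linear layers receive the low-rank GaLore updates; the remaining parameters are trained with the regular optimizer.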