---
title: Learning Rate Groups
description: "Setting different learning rates by module name"
---

## Background

Inspired by LoRA+, Axolotl allows practitioners to specify separate learning rates for individual modules or groups of
modules in a model.

## Example
```yaml
lr_groups:
  - name: o_proj
    modules:
      - self_attn.o_proj.weight
    lr: 1e-6
  - name: q_proj
    modules:
      - model.layers.2.self_attn.q_proj.weight
    lr: 1e-5

learning_rate: 2e-5
```

In this example, we have a default learning rate of 2e-5 across the entire model, but a separate learning rate
of 1e-6 for all the self attention `o_proj` modules across all layers, and a learning rate of 1e-5 for the 3rd layer's
self attention `q_proj` module.
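
Under the hood, each `lr_groups` entry corresponds to its own optimizer parameter group. The sketch below is a rough conceptual illustration of that mapping, not Axolotl's actual implementation (see the optimizer mixin linked in the note below); the `ToyBlock` model and the substring match on parameter names are assumptions made for illustration:

```python
# Hypothetical illustration of lr_groups -> PyTorch optimizer param groups.
import torch
import torch.nn as nn

class ToyBlock(nn.Module):
    """Stand-in for a transformer layer; yields names like self_attn.o_proj.weight."""
    def __init__(self):
        super().__init__()
        self.self_attn = nn.ModuleDict(
            {"o_proj": nn.Linear(8, 8), "q_proj": nn.Linear(8, 8)}
        )

model = ToyBlock()
lr_groups = [{"name": "o_proj", "modules": ["self_attn.o_proj.weight"], "lr": 1e-6}]
default_lr = 2e-5  # the top-level learning_rate

param_groups, matched = [], set()
for group in lr_groups:
    # Assumed matching rule: a parameter joins the group if its name
    # contains any of the listed module patterns.
    params = [
        p for name, p in model.named_parameters()
        if any(pattern in name for pattern in group["modules"])
    ]
    matched.update(id(p) for p in params)
    param_groups.append({"params": params, "lr": group["lr"]})

# Everything not claimed by a group falls back to the default learning rate.
param_groups.append(
    {"params": [p for _, p in model.named_parameters() if id(p) not in matched],
     "lr": default_lr}
)

optimizer = torch.optim.AdamW(param_groups)
for pg in optimizer.param_groups:
    print(pg["lr"], len(pg["params"]))  # 1e-06 with 1 param, 2e-05 with the rest
```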

::: {.callout-note}

We currently only support varying `lr` per group. If you're interested in adding support for other optimizer arguments (e.g. `weight_decay`), we welcome PRs. See https://github.com/axolotl-ai-cloud/axolotl/blob/613bcf90e58f3ab81d3827e7fc572319908db9fb/src/axolotl/core/trainers/mixins/optimizer.py#L17

:::