Learning Rate Groups
+Background
+Inspired by LoRA+, Axolotl allows practitioners to specify separate learning rates for each module or groups of modules in a model.
+Example
+lr_groups:
+ - name: o_proj
+ modules:
+ - self_attn.o_proj.weight
+ lr: 1e-6
+ - name: q_proj
+ modules:
+ - model.layers.2.self_attn.q_proj.weight
+ lr: 1e-5
+
+learning_rate: 2e-5In this example, we have a default learning rate of 2e-5 across the entire model, but we have a separate learning rate of 1e-6 for all the self attention o_proj modules across all layers, and a learning are of 1e-5 to the 3rd layer’s self attention q_proj module.