Overview
This is an example of a llama-2 configuration for 7b and 13b. The yaml file contains the configuration for the 7b variant, but you can just as well use the same settings for 13b.
The 7b variant fits on any 24GB VRAM GPU and will take up about 17 GB of VRAM during training when using qlora, and 20 GB when using lora. On an RTX 4090 it trains 3 epochs of the default dataset in about 15 minutes.
The 13b variant will fit if you change these settings to the following values:

gradient_accumulation_steps: 2
micro_batch_size: 1
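As a sketch, the 13b overrides above would replace the corresponding keys in the example yaml; the surrounding keys shown here are illustrative context and may differ from the actual example file:

```yaml
# examples/llama-2/qlora.yml (excerpt, illustrative)
base_model: meta-llama/Llama-2-13b-hf  # switch from the 7b checkpoint

# 13b overrides from the text above: smaller per-device batches,
# with gradient accumulation keeping the effective batch size up
gradient_accumulation_steps: 2
micro_batch_size: 1
```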
accelerate launch -m axolotl.cli.train examples/llama-2/qlora.yml
or
accelerate launch -m axolotl.cli.train examples/llama-2/lora.yml
To launch a full finetuning with 16-bit precision:
accelerate launch -m axolotl.cli.train examples/llama-2/fft_optimized.yml