README.md

Overview

This is an example llama-2 configuration for the 7b and 13b models. The YAML files contain the configuration for the 7b variant, but you can just as well use the same settings for 13b.

The 7b variant fits on any GPU with 24 GB of VRAM. Training takes up about 17 GB of VRAM with qlora and about 20 GB with lora. On an RTX 4090 it trains 3 epochs of the default dataset in about 15 minutes.

The 13b variant will fit if you change these settings to these values:

    gradient_accumulation_steps: 2
    micro_batch_size: 1
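The 13b values above can also be applied to a copy of the config text rather than edited by hand. The following is a minimal sketch and not part of axolotl: `override_settings` is a hypothetical helper that rewrites simple top-level `key: value` lines, and the sample input uses placeholder numbers (the actual 7b defaults in the example YAML may differ).

```python
import re

# Values the README gives for fitting the 13b variant.
OVERRIDES_13B = {"gradient_accumulation_steps": 2, "micro_batch_size": 1}


def override_settings(config_text: str, overrides: dict) -> str:
    """Return config_text with top-level `key: value` lines replaced.

    Hypothetical helper: it works line by line, so it only handles
    simple scalar settings at the top level of the YAML file.
    """
    remaining = dict(overrides)
    out = []
    for line in config_text.splitlines():
        match = re.match(r"^([A-Za-z_]\w*):", line)
        if match and match.group(1) in remaining:
            key = match.group(1)
            out.append(f"{key}: {remaining.pop(key)}")
        else:
            out.append(line)
    # Append any setting that was not present in the original text.
    out.extend(f"{key}: {value}" for key, value in remaining.items())
    return "\n".join(out) + "\n"


# Placeholder input; the real values in the example YAML may differ.
sample = "gradient_accumulation_steps: 4\nmicro_batch_size: 2\nnum_epochs: 3\n"
patched = override_settings(sample, OVERRIDES_13B)
print(patched)
```

Writing `patched` to a new file (for example a hypothetical `qlora-13b.yml`) keeps the shipped 7b config untouched. A line-based rewrite is used here only to avoid a YAML library dependency; it does not handle nested keys.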

To launch training with qlora:

accelerate launch -m axolotl.cli.train examples/llama-2/qlora.yml

Or, with lora:

accelerate launch -m axolotl.cli.train examples/llama-2/lora.yml

To launch a full finetuning with 16-bit precision:

accelerate launch -m axolotl.cli.train examples/llama-2/fft_optimized.yml
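The launch commands differ only in the config file they point to, which can be captured in a small wrapper. This is a sketch under that assumption: `launch_command` and the variant names in `CONFIGS` are hypothetical conveniences, while the command itself and the file paths are the ones given in this README.

```python
import shlex

CONFIG_DIR = "examples/llama-2"

# Hypothetical variant names mapped to the config files named in this README.
CONFIGS = {
    "qlora": "qlora.yml",
    "lora": "lora.yml",
    "full": "fft_optimized.yml",
}


def launch_command(variant: str) -> list:
    """Build the argument list for one training variant."""
    config_path = f"{CONFIG_DIR}/{CONFIGS[variant]}"
    return ["accelerate", "launch", "-m", "axolotl.cli.train", config_path]


# Print each command as a shell-ready string.
for name in CONFIGS:
    print(shlex.join(launch_command(name)))
```

To actually start a run, the list returned by `launch_command` can be passed to `subprocess.run(cmd, check=True)` from the repository root.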