Files

Wing Lian 9b6ee83a73 FDSP + QLoRA (#1378 )

* wip qlora + fsdp fixes

* more fixes

* make sure to load the lora 🤦

* only setup quantized meta on non-zero rank:

* only run setup_quantized_peft_meta_for_training for qlora+fsdp

* more fixes for qlora+fsdp

* chore: lint

* add example yml

* support mistral too

* fix for model_type and add mixtral support too

* set cpu_offload: false to reduce vram, constrain new accelerator logic to qlora + fsdp

* refactor for duplicate code

2024-03-08 14:31:01 -05:00

fft_optimized.yml

fix(examples): remove is_*_derived as it's parsed automatically (#1297 )

2024-02-22 00:52:46 +09:00

gptq-lora.yml

fix(examples): remove is_*_derived as it's parsed automatically (#1297 )

2024-02-22 00:52:46 +09:00

loftq.yml

fix(examples): remove is_*_derived as it's parsed automatically (#1297 )

2024-02-22 00:52:46 +09:00

lora.yml

fix(examples): remove is_*_derived as it's parsed automatically (#1297 )

2024-02-22 00:52:46 +09:00

qlora-fsdp.yml

FDSP + QLoRA (#1378 )

2024-03-08 14:31:01 -05:00

qlora.yml

fix(examples): remove is_*_derived as it's parsed automatically (#1297 )

2024-02-22 00:52:46 +09:00

README.md

Implement fused modules (#747 )

2023-10-21 16:08:25 -04:00

relora.yml

fix(examples): remove is_*_derived as it's parsed automatically (#1297 )

2024-02-22 00:52:46 +09:00

README.md

Overview

This is an example llama-2 configuration for the 7b and 13b models. The YAML file contains the configuration for the 7b variant, but you can just as well use the same settings for 13b.

The 7b variant fits on any GPU with 24 GB of VRAM. Training takes up about 17 GB of VRAM with qlora and about 20 GB with lora. On an RTX 4090 it trains 3 epochs of the default dataset in about 15 minutes.

The 13b variant will fit if you change these settings to the following values:

gradient_accumulation_steps: 2
micro_batch_size: 1
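To see what those two settings mean together, here is a minimal sketch of the usual effective batch size arithmetic: the number of sequences that contribute to one optimizer step is the per-device micro batch times the number of accumulation steps times the number of devices. The helper name `effective_batch_size` and the `num_gpus` parameter are hypothetical and exist only for illustration; they are not part of axolotl.

```python
def effective_batch_size(
    micro_batch_size: int,
    gradient_accumulation_steps: int,
    num_gpus: int = 1,
) -> int:
    """Sequences contributing to a single optimizer step.

    Hypothetical helper for illustration only, not an axolotl function.
    """
    return micro_batch_size * gradient_accumulation_steps * num_gpus


# Values suggested for the 13b variant, assuming a single GPU.
batch_13b = effective_batch_size(micro_batch_size=1, gradient_accumulation_steps=2)
print(batch_13b)
```

Lowering micro_batch_size reduces how many sequences are held in memory per forward and backward pass, while gradient_accumulation_steps keeps the effective batch from shrinking to a single sequence.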

To launch training with qlora:

accelerate launch -m axolotl.cli.train examples/llama-2/qlora.yml

or with lora:

accelerate launch -m axolotl.cli.train examples/llama-2/lora.yml

To launch a full finetuning with 16-bit precision:

accelerate launch -m axolotl.cli.train examples/llama-2/fft_optimized.yml