README.md

Overview

This is an example of a llama-2 configuration for 7b and 13b. The yaml file contains the configuration for the 7b variant, but you can just as well use the same settings for 13b.

The 7b variant fits on any 24 GB VRAM GPU and takes up about 17 GB of VRAM during training with qlora and about 20 GB with lora. On an RTX 4090 it trains 3 epochs of the default dataset in about 15 minutes.

The 13b variant will fit if you change these settings to these values:

    gradient_accumulation_steps: 2
    micro_batch_size: 1
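If you would rather not edit the yaml by hand, the two 13b overrides can be applied with a small script. The following is a minimal sketch using only the Python standard library. The helper `override_settings` and the sample text `SAMPLE_7B` are hypothetical and exist only for illustration; the 7b values in the sample are placeholders, not the values shipped in the example configs.

```python
import re


def override_settings(yaml_text: str, overrides: dict) -> str:
    """Return yaml_text with top-level `key: value` lines replaced per overrides.

    Only unindented keys are matched, so nested settings are left alone.
    A replaced line loses any trailing comment it had.
    """
    remaining = dict(overrides)
    lines = []
    for line in yaml_text.splitlines():
        match = re.match(r"^([A-Za-z_][\w-]*):", line)
        if match and match.group(1) in remaining:
            key = match.group(1)
            line = f"{key}: {remaining.pop(key)}"
        lines.append(line)
    # Keys that were not present in the input are appended at the end.
    for key, value in remaining.items():
        lines.append(f"{key}: {value}")
    return "\n".join(lines) + "\n"


# Hypothetical excerpt of a 7b config; the first two values are placeholders.
SAMPLE_7B = """\
micro_batch_size: 2
gradient_accumulation_steps: 4
num_epochs: 3
"""

# The two values this README gives for the 13b variant.
OVERRIDES_13B = {"gradient_accumulation_steps": 2, "micro_batch_size": 1}

if __name__ == "__main__":
    print(override_settings(SAMPLE_7B, OVERRIDES_13B), end="")
```

To use it on a real config, read the contents of a file such as examples/llama-2/qlora.yml, pass the text through `override_settings`, and write the result to a new yaml file that you hand to the launch command.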

To launch training with qlora:

accelerate launch -m axolotl.cli.train examples/llama-2/qlora.yml

or with lora:

accelerate launch -m axolotl.cli.train examples/llama-2/lora.yml

To launch a full finetuning with 16-bit precision:

accelerate launch -m axolotl.cli.train examples/llama-2/fft_optimized.yml