StableLM 2

This repository contains examples for training and processing using StableLM-2. It also includes a section to help you estimate the GPU requirements for your specific use case.

Estimating GPU Requirements

type	deepspeed	batch size	context length	GPU VRAM (GB)
full finetune	N/A	1	4096	~21.5
full finetune	zero2	1	4096	~20
lora	N/A	1	4096	~16.6

The above are estimates and might differ slightly depending on the setup, for example whether or not you pack your sequences (the figures above assume packing to a length of 4096).

This blog post from Hamel Husain was a great resource for estimating these numbers: https://hamel.dev/notes/llm/03_estimating_vram.html
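As a rough cross-check on the full finetune figures, the fixed memory cost can be sketched with back-of-the-envelope arithmetic. The snippet below is only an illustration under stated assumptions, not the method used to produce the table: it assumes about 1.6 billion parameters (taken from the model name), bf16 weights and gradients (2 bytes each per parameter), and two fp32 Adam moments (8 bytes per parameter). Activations, CUDA context, and fragmentation are excluded, so measured usage will be higher.

```shell
#!/usr/bin/env bash
# Back-of-the-envelope VRAM lower bound for a full finetune.
# All values below are assumptions for illustration, not measured numbers.
params_b=1.6      # parameter count in billions (assumed from the model name)
bytes_weights=2   # bf16 weights
bytes_grads=2     # bf16 gradients
bytes_adam=8      # two fp32 Adam moments, 4 bytes each

# billions of parameters * bytes per parameter = gigabytes (1 GB = 1e9 bytes)
estimate_gb=$(awk -v p="$params_b" -v w="$bytes_weights" \
                  -v g="$bytes_grads" -v o="$bytes_adam" \
                  'BEGIN { printf "%.1f", p * (w + g + o) }')

echo "weights + gradients + optimizer state: ~${estimate_gb} GB (activations excluded)"
```

Changing `bytes_adam` or `bytes_grads` shows how an 8-bit optimizer or a parameter-efficient method such as LoRA would shift the fixed cost; the remaining gap to measured usage comes mostly from activations, which scale with batch size and context length.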

Training

We have example scripts here for both full finetuning and LoRA using the popular Alpaca dataset:

# preprocess the dataset
CUDA_VISIBLE_DEVICES="" python -m axolotl.cli.preprocess examples/stablelm-2/1.6b/lora.yml

Single GPU Training:

python -m axolotl.cli.train examples/stablelm-2/1.6b/fft.yml --deepspeed deepspeed_configs/zero2.json
# OR
python -m axolotl.cli.train examples/stablelm-2/1.6b/lora.yml

Multinode GPU Training with accelerate:

# make sure you've configured accelerate properly
accelerate launch -m axolotl.cli.train examples/stablelm-2/1.6b/fft.yml --deepspeed deepspeed_configs/zero2.json
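
Before launching, it can help to confirm that accelerate is installed and has a saved configuration. The check below is a minimal sketch and not part of axolotl: the function name `check_accelerate` is hypothetical, and the config path is an assumption based on accelerate's default cache location, which may differ if you pass your own `--config_file`.

```shell
#!/usr/bin/env bash
# Pre-flight check before `accelerate launch` (illustrative sketch only).
# Assumption: accelerate saves its config under the Hugging Face cache
# directory unless a custom --config_file is used.
config_file="${HF_HOME:-$HOME/.cache/huggingface}/accelerate/default_config.yaml"

# Hypothetical helper: returns 0 if accelerate looks ready, 1 otherwise.
check_accelerate() {
  if ! command -v accelerate >/dev/null 2>&1; then
    echo "accelerate is not installed or not on PATH"
    return 1
  fi
  if [ ! -f "$config_file" ]; then
    echo "no accelerate config at $config_file; run: accelerate config"
    return 1
  fi
  echo "found accelerate config at $config_file"
  return 0
}

status=0
check_accelerate || status=$?
```

If the check reports a missing config, `accelerate config` walks through the setup interactively, and `accelerate env` prints the settings that will be used.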