Files

Wing Lian af8d257aa2 make pad_to_sequence_len default to the same value as sample_packing (#2941) [skip ci]

* make pad_to_sequence_len default to the same value as sample_packing

* remove duplicate validation

* fix test

* update description meta

Co-authored-by: NanoCode012 <nano@axolotl.ai>

2025-07-21 11:40:56 -04:00

qlora_deepspeed.yaml

checkpoint model on first step callback (#2906)

2025-07-15 15:00:48 -04:00

qlora_fsdp_large.yaml

make pad_to_sequence_len default to the same value as sample_packing (#2941) [skip ci]

2025-07-21 11:40:56 -04:00

qlora.yaml

checkpoint model on first step callback (#2906)

2025-07-15 15:00:48 -04:00

README.md

rename jamba example (#1846) [skip ci]

2024-08-22 09:22:55 -04:00

README.md

Jamba

✅ QLoRA w/ DeepSpeed ZeRO-2 needs at least 2x GPUs and
- 35GiB VRAM per GPU w/ minimal context length
- 56GiB VRAM per GPU (w/ multipack enabled)
✅ QLoRA w/ DeepSpeed ZeRO-3 needs at least 2x GPUs and 67GiB VRAM (unexpectedly high)
✅ QLoRA single-GPU, ~51GiB VRAM
✅ multipack
✅ FSDP
❓ 8-bit LoRA
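
For orientation, here is a rough sketch of how the ZeRO-2 + multipack case above might look in a config. It is a fragment, not a copy of `qlora_deepspeed.yaml`: the model id, sequence length, and DeepSpeed config path are assumptions, so check them against the actual example files before use.

```yaml
# Illustrative fragment only; values marked "assumed" are not taken from this README.
base_model: ai21labs/Jamba-v0.1          # assumed model id
load_in_4bit: true                       # 4-bit base weights for QLoRA
adapter: qlora

sequence_len: 4096                       # assumed; the VRAM figures above depend on context length
sample_packing: true                     # multipack; corresponds to the 56GiB-per-GPU case
# pad_to_sequence_len is left unset here: per #2941 it defaults to the
# same value as sample_packing.

deepspeed: deepspeed_configs/zero2.json  # assumed path to a ZeRO-2 config
```

Setting `sample_packing: false` would correspond to the lower, minimal-context figure, assuming the rest of the config is unchanged.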