axolotl/examples/jamba
Wing Lian af8d257aa2 make pad_to_sequence_len default to the same value as sample_packing (#2941) [skip ci]
* make pad_to_sequence_len default to the same value as sample_packing

* remove duplicate validation

* fix test

* update description meta

Co-authored-by: NanoCode012 <nano@axolotl.ai>

2025-07-21 11:40:56 -04:00

Jamba

  • QLoRA with DeepSpeed ZeRO-2 needs at least 2 GPUs and
    • 35GiB VRAM per GPU with minimal context length
    • 56GiB VRAM per GPU with multipack enabled
  • QLoRA with DeepSpeed ZeRO-3 needs at least 2 GPUs and 67GiB VRAM (wtf?)
  • QLoRA on a single GPU needs ~51GiB VRAM
  • multipack
  • FSDP
  • 8-bit LoRA
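
The VRAM figures in the list above can be turned into a quick feasibility check before launching a run. The following is a minimal sketch, not part of axolotl: the setup labels, `Requirement`, `fits`, and `feasible_setups` are hypothetical names introduced for illustration. The list does not say whether the ZeRO-3 figure of 67GiB is per GPU, so the sketch assumes it is. The last three list items carry no figures and are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    min_gpus: int            # minimum number of GPUs reported in the list
    vram_gib_per_gpu: float  # VRAM each of those GPUs needs, in GiB


# Figures copied from the list above. The dictionary keys are hypothetical
# labels chosen for this sketch, not axolotl config names.
REPORTED = {
    "qlora_zero2_minimal_context": Requirement(2, 35),
    "qlora_zero2_multipack": Requirement(2, 56),
    # ASSUMPTION: the list does not say whether 67GiB is per GPU;
    # this sketch treats it as a per-GPU requirement.
    "qlora_zero3": Requirement(2, 67),
    # Listed as approximate (~51GiB).
    "qlora_single_gpu": Requirement(1, 51),
}


def fits(setup, gpu_vram_gib):
    """Return True if enough GPUs meet the reported per-GPU VRAM figure."""
    req = REPORTED[setup]
    big_enough = [v for v in gpu_vram_gib if v >= req.vram_gib_per_gpu]
    return len(big_enough) >= req.min_gpus


def feasible_setups(gpu_vram_gib):
    """List every reported setup that the given GPUs could run."""
    return [name for name in REPORTED if fits(name, gpu_vram_gib)]


if __name__ == "__main__":
    # Two GPUs with 48GiB each: only the minimal-context ZeRO-2 setup fits.
    print(feasible_setups([48, 48]))
```

The per-GPU sizes are passed in by hand here. Usable memory on a real device is somewhat lower than its nominal capacity, so treat a borderline result as a likely out-of-memory error rather than a guarantee.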