Jamba

  • QLoRA w/ DeepSpeed ZeRO-2 needs at least 2 GPUs and
    • 35GiB VRAM per GPU w/ minimal context length
    • 56GiB VRAM per GPU (w/ multipack enabled)
  • QLoRA w/ DeepSpeed ZeRO-3 needs at least 2 GPUs and 67GiB VRAM (surprisingly, more than ZeRO-2)
  • QLoRA single-GPU, ~51GiB VRAM
  • multipack
  • FSDP
  • 8-bit LoRA
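
As a rough illustration of how to read the VRAM figures above, here is a minimal Python sketch that checks which of the listed QLoRA setups a machine could plausibly run. The names `Setup`, `SETUPS`, and `feasible_setups` are hypothetical helpers, not part of axolotl. The numbers are copied from the list; the sketch assumes the ZeRO-3 figure is per GPU (the list does not say) and treats the approximate single-GPU figure as a hard threshold.

```python
from typing import List, NamedTuple


class Setup(NamedTuple):
    name: str
    min_gpus: int
    vram_gib: float  # VRAM figure quoted in the list above


# Figures copied from the list above.
# Assumption: the ZeRO-3 number (67GiB) is per GPU; the list does not specify.
SETUPS = [
    Setup("QLoRA + DeepSpeed ZeRO-2 (minimal context)", 2, 35),
    Setup("QLoRA + DeepSpeed ZeRO-2 (multipack)", 2, 56),
    Setup("QLoRA + DeepSpeed ZeRO-3", 2, 67),
    Setup("QLoRA single-GPU", 1, 51),  # "~51GiB" in the list, so approximate
]


def feasible_setups(num_gpus: int, vram_per_gpu_gib: float) -> List[str]:
    """Return the names of the setups whose listed requirements are met."""
    return [
        s.name
        for s in SETUPS
        if num_gpus >= s.min_gpus and vram_per_gpu_gib >= s.vram_gib
    ]


if __name__ == "__main__":
    # Example: two GPUs with 48GiB each.
    for name in feasible_setups(2, 48):
        print(name)
```

The figures are observed lower bounds from the list, so treat a match as "worth trying" rather than a guarantee; actual usage depends on sequence length, batch size, and other config values.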