fix some of the edge cases for Jamba (#1452)
* fix some of the edge cases for Jamba
* update requirements for jamba
@@ -1,5 +1,10 @@
 # Jamba
 
-qlora w/ deepspeed needs at least 2x GPUs and 35GiB VRAM per GPU
-
-qlora single-gpu - training will start, but loss is off by an order of magnitude
+- ✅ qlora w/ deepspeed Zero-2 needs at least 2x GPUs and
+  - 35GiB VRAM per GPU w minimal context length
+  - 56GiB VRAM per GPU (w multipack enabled)
+- ✅ qlora w/ deepspeed Zero-3 needs at least 2x GPUs and 67GiB VRAM (wtf?)
+- ✅ qlora single-gpu, ~51GiB VRAM
+- ✅ multipack
+- ❓ FSDP
+- ❓ 8-bit LoRA
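
The requirements listed in the new README text can be read as a small lookup table. Below is a minimal Python sketch of that reading. It is not part of this commit or of the project's code: the names `Setup`, `SETUPS`, and `feasible_setups` are hypothetical, the Zero-3 figure is assumed to be per GPU (the README does not say), and the single-GPU figure is approximate in the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Setup:
    name: str
    min_gpus: int
    min_vram_gib: float  # assumed to be per GPU


# Figures copied from the README lines added in the diff.
SETUPS = [
    Setup("qlora + deepspeed Zero-2, minimal context", 2, 35),
    Setup("qlora + deepspeed Zero-2, multipack", 2, 56),
    # Assumption: the README gives 67GiB without saying "per GPU".
    Setup("qlora + deepspeed Zero-3", 2, 67),
    # The README gives this figure as approximate (~51GiB).
    Setup("qlora single-gpu", 1, 51),
]


def feasible_setups(num_gpus: int, vram_gib_per_gpu: float) -> list[str]:
    """Return the names of setups whose listed minimums are met."""
    return [
        s.name
        for s in SETUPS
        if num_gpus >= s.min_gpus and vram_gib_per_gpu >= s.min_vram_gib
    ]
```

For example, `feasible_setups(2, 40)` returns only the Zero-2 minimal-context entry, since 40GiB per GPU is below every other listed minimum.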