fix some of the edge cases for Jamba (#1452)

* fix some of the edge cases for Jamba

* update requirements for jamba
Wing Lian
2024-03-29 02:38:02 -04:00
committed by GitHub
parent e634118f90
commit 05b398a072
8 changed files with 92 additions and 17 deletions

@@ -1,5 +1,10 @@
# Jamba
-qlora w/ deepspeed needs at least 2x GPUs and 35GiB VRAM per GPU
-qlora single-gpu - training will start, but loss is off by an order of magnitude
+- qlora w/ deepspeed Zero-2 needs at least 2x GPUs and
+  - 35GiB VRAM per GPU w minimal context length
+  - 56GiB VRAM per GPU (w multipack enabled)
+- ✅ qlora w/ deepspeed Zero-3 needs at least 2x GPUs and 67GiB VRAM (wtf?)
+- ✅ qlora single-gpu, ~51GiB VRAM
+- ✅ multipack
+- ❓ FSDP
+- ❓ 8-bit LoRA
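
The VRAM figures in the updated README notes can be turned into a quick pre-flight check. This is only a minimal sketch: the setup labels and the `meets_requirement` helper are hypothetical names, not part of the repository. It also assumes the Zero-3 figure is per GPU, which the notes do not state, and it treats the approximate single-GPU figure as a hard minimum.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    min_gpus: int
    min_vram_gib: int  # per-GPU VRAM, in GiB


# Numbers copied from the README notes; the keys are hypothetical labels.
REQUIREMENTS = {
    "qlora-zero2": Requirement(2, 35),            # minimal context length
    "qlora-zero2-multipack": Requirement(2, 56),  # multipack enabled
    "qlora-zero3": Requirement(2, 67),            # ASSUMPTION: per GPU (not stated)
    "qlora-single-gpu": Requirement(1, 51),       # README says "~51GiB"
}


def meets_requirement(setup: str, num_gpus: int, vram_gib_per_gpu: float) -> bool:
    """Return True if the hardware meets the noted minimum for `setup`."""
    req = REQUIREMENTS[setup]
    return num_gpus >= req.min_gpus and vram_gib_per_gpu >= req.min_vram_gib
```

Under these assumptions, two 48GiB GPUs would pass the Zero-2 check at minimal context length (`meets_requirement("qlora-zero2", 2, 48)`) but fail it once multipack is enabled.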