# Jamba

- QLoRA with DeepSpeed: needs at least 2 GPUs and 35 GiB of VRAM per GPU.
- QLoRA on a single GPU: training will start, but the loss is off by an order of magnitude.
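
The DeepSpeed requirement above can be turned into a quick pre-flight check before launching a run. The snippet below is only a sketch: the function and constant names are hypothetical, and it assumes you already have the total VRAM of each GPU in GiB (for example, from `torch.cuda.get_device_properties(i).total_memory` divided by `2**30`). Total VRAM is only an upper bound on what training can use, so passing this check does not guarantee the run will fit.

```python
from typing import Sequence

# Figures taken from the notes above; the names are hypothetical.
MIN_GPUS = 2
MIN_VRAM_GIB_PER_GPU = 35.0


def meets_qlora_deepspeed_requirement(vram_gib_per_gpu: Sequence[float]) -> bool:
    """Return True if at least MIN_GPUS GPUs each have MIN_VRAM_GIB_PER_GPU or more."""
    # Count only the GPUs that are large enough on their own.
    large_enough = [v for v in vram_gib_per_gpu if v >= MIN_VRAM_GIB_PER_GPU]
    return len(large_enough) >= MIN_GPUS


if __name__ == "__main__":
    # Example: two 40 GiB cards pass, a single 80 GiB card does not.
    print(meets_qlora_deepspeed_requirement([40.0, 40.0]))
    print(meets_qlora_deepspeed_requirement([80.0]))
```

A single large GPU fails the check on purpose, since the single-GPU QLoRA path is the one reported above as producing an incorrect loss.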