Jamba (#1451)
* fixes for larger models
* add qlora example for deepspeed
* add readme for jamba
examples/jamba/README.md
@@ -0,0 +1,5 @@
# Jamba

qlora w/ deepspeed needs at least 2x GPUs and 35GiB VRAM per GPU

qlora single-gpu - training will start, but loss is off by an order of magnitude
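The README states that QLoRA with DeepSpeed needs at least 2 GPUs with 35 GiB of VRAM each. Below is a minimal pre-flight check sketch that encodes that rule, assuming PyTorch is available to query the devices. The helper names (`meets_qlora_deepspeed_requirement`, `detect_gpu_memory`) are hypothetical and are not part of the Jamba example. The check compares only total (not free) device memory against the README's figures, so passing it does not guarantee that a given sequence length or batch size will fit.

```python
# Hypothetical pre-flight check for the Jamba QLoRA + DeepSpeed example.
# Thresholds are taken from the README note; helper names are illustrative only.

GIB = 1024 ** 3
MIN_GPUS = 2          # "needs at least 2x GPUs"
MIN_VRAM_GIB = 35     # "35GiB VRAM per GPU"


def meets_qlora_deepspeed_requirement(gpu_memory_bytes: list[int]) -> bool:
    """Return True if there are at least MIN_GPUS devices and each one
    reports at least MIN_VRAM_GIB of total memory."""
    if len(gpu_memory_bytes) < MIN_GPUS:
        return False
    return all(mem >= MIN_VRAM_GIB * GIB for mem in gpu_memory_bytes)


def detect_gpu_memory() -> list[int]:
    """Return total memory (bytes) of each visible CUDA device.

    Returns an empty list if PyTorch is not installed or CUDA is unavailable.
    """
    try:
        import torch  # imported lazily so the pure check works without torch
    except ImportError:
        return []
    if not torch.cuda.is_available():
        return []
    return [
        torch.cuda.get_device_properties(i).total_memory
        for i in range(torch.cuda.device_count())
    ]


if __name__ == "__main__":
    memory = detect_gpu_memory()
    print(f"{len(memory)} GPU(s):", [f"{m / GIB:.1f} GiB" for m in memory])
    print("meets QLoRA + DeepSpeed requirement:",
          meets_qlora_deepspeed_requirement(memory))
```

A single 80 GiB card fails this check by design, because the README's DeepSpeed note asks for at least two devices; the single-GPU path is the one the README says currently starts training with an incorrect loss.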