* fixes for larger models

* add qlora example for deepspeed

* add readme for jamba
Wing Lian
2024-03-28 21:03:22 -04:00
committed by GitHub
parent 4155e9988f
commit 02af0820f7
5 changed files with 76 additions and 1 deletions

examples/jamba/README.md Normal file

@@ -0,0 +1,5 @@
# Jamba
qlora w/ deepspeed needs at least 2x GPUs and 35GiB VRAM per GPU
qlora single-gpu - training will start, but loss is off by an order of magnitude