Jamba (#1451)

* fixes for larger models
* add qlora example for deepspeed
* add readme for jamba
2024-03-28 21:03:22 -04:00
parent 4155e9988f
commit 02af0820f7
5 changed files with 76 additions and 1 deletions
--- a/examples/jamba/README.md
+++ b/examples/jamba/README.md
@@ -0,0 +1,5 @@
+# Jamba
+
+QLoRA with DeepSpeed needs at least 2 GPUs, each with at least 35GiB of VRAM
+
+QLoRA on a single GPU: training will start, but the loss is off by an order of magnitude
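
As a rough illustration of the multi-GPU requirement stated in the README above, a preflight check might look like the sketch below. The function name and constants are hypothetical and not part of this commit. The check compares total VRAM per device against the 35GiB figure, which only approximates what training actually needs.

```python
GIB = 1024 ** 3

# Thresholds taken from the README note; the names are hypothetical.
MIN_GPUS = 2
MIN_VRAM_GIB = 35


def meets_qlora_deepspeed_requirements(vram_bytes_per_gpu):
    """Return True if there are at least MIN_GPUS devices and every
    device reports MIN_VRAM_GIB GiB of total memory or more."""
    if len(vram_bytes_per_gpu) < MIN_GPUS:
        return False
    return all(v >= MIN_VRAM_GIB * GIB for v in vram_bytes_per_gpu)


if __name__ == "__main__":
    # Example inputs: two 40GiB cards pass, a single 80GiB card does not.
    print(meets_qlora_deepspeed_requirements([40 * GIB, 40 * GIB]))
    print(meets_qlora_deepspeed_requirements([80 * GIB]))
```

With PyTorch installed, the input list could be built from `torch.cuda.get_device_properties(i).total_memory` for each `i` in `range(torch.cuda.device_count())`.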