qlora-fsdp ram efficient loading with hf trainer (#1791)

* fix 405b with lower cpu ram requirements
* make sure to use double quant and only skip output embeddings
* set model attributes
* more fixes for sharded fsdp loading
* update the base model in example to use pre-quantized nf4-bf16 weights
* upstream fixes for qlora+fsdp
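
For readers unfamiliar with the quantization settings named in the commit message, here is a minimal sketch of how double quantization and skipping the output embeddings might be expressed as keyword arguments for `transformers.BitsAndBytesConfig`. The values, the `lm_head` module name (typical for Llama-style models), and the `build_quant_config` helper are assumptions for illustration; they are not taken from the code changed in this commit.

```python
# Hypothetical sketch: keyword arguments for transformers.BitsAndBytesConfig.
# All values are illustrative assumptions, not copied from this commit.
bnb_kwargs = {
    "load_in_4bit": True,
    "bnb_4bit_quant_type": "nf4",          # NF4 4-bit quantization
    "bnb_4bit_use_double_quant": True,     # also quantize the quantization constants
    "bnb_4bit_compute_dtype": "bfloat16",  # dtype used for matmuls
    "bnb_4bit_quant_storage": "bfloat16",  # storage dtype; FSDP needs a uniform dtype to shard
    "llm_int8_skip_modules": ["lm_head"],  # leave only the output embeddings unquantized (assumed name)
}


def build_quant_config(kwargs):
    """Hypothetical helper: build the config lazily so that importing
    this module does not require transformers to be installed."""
    from transformers import BitsAndBytesConfig

    return BitsAndBytesConfig(**kwargs)
```

The resulting object would typically be passed as `quantization_config=` to `AutoModelForCausalLM.from_pretrained`; whether these exact values suit a given model is something to verify against that model's architecture.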
2024-07-30 19:21:38 -04:00
parent dbf8fb549e
commit 3ebf22464b
10 changed files with 52 additions and 14 deletions
--- a/requirements.txt
+++ b/requirements.txt
@@ -1,9 +1,9 @@
 --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/
 packaging==23.2
 peft==0.11.1
-transformers==4.43.3
+transformers @ git+https://github.com/huggingface/transformers.git@026a173a64372e9602a16523b8fae9de4b0ff428
 tokenizers==0.19.1
-bitsandbytes==0.43.1
+bitsandbytes==0.43.3
 accelerate==0.32.0
 deepspeed==0.14.4
 pydantic==2.6.3