qlora-fsdp ram efficient loading with hf trainer (#1791)

* fix 405b with lower cpu ram requirements

* make sure to use double quant and only skip output embeddings

* set model attributes

* more fixes for sharded fsdp loading

* update the base model in example to use pre-quantized nf4-bf16 weights

* upstream fixes for qlora+fsdp
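The bullets above touch three memory levers: 4-bit NF4 weights, double quantization, and keeping CPU RAM low when a 405B model is loaded under FSDP. A back-of-envelope sketch of the weight memory follows. It assumes a nominal 405e9 parameters, all of them quantized (in practice some modules, such as the output embeddings, are skipped), NF4 blocks of 64 with an fp32 absmax, and double quantization storing absmax values in 8 bits with one fp32 constant per 256 blocks, as described in the QLoRA paper. The "rank 0 only" case models the general idea behind Accelerate's `cpu_ram_efficient_loading` (only the first process materializes weights) and is not taken from this commit's code. The helper names are hypothetical.

```python
# Back-of-envelope weight-memory estimate for a nominal 405B-parameter model.
# Assumptions (not from the commit): every parameter is quantized, NF4 uses
# blocks of 64 with an fp32 absmax, and double quantization stores the absmax
# in 8 bits plus one fp32 second-level constant per 256 blocks.

PARAMS = 405e9  # nominal parameter count; the real model differs slightly


def bits_per_param(double_quant: bool, block: int = 64, dq_block: int = 256) -> float:
    """Storage bits per parameter: 4-bit NF4 code plus quantization constants."""
    if double_quant:
        # 8-bit absmax per block + one fp32 constant shared by dq_block blocks
        overhead = 8 / block + 32 / (block * dq_block)
    else:
        # one fp32 absmax per block
        overhead = 32 / block
    return 4 + overhead


def weight_gb(bits: float, params: float = PARAMS) -> float:
    """Convert bits/param into decimal gigabytes for the whole model."""
    return params * bits / 8 / 1e9


def host_ram_gb(model_gb: float, world_size: int, rank0_only: bool) -> float:
    """Host RAM for weights on one node: every rank loads a copy, or only rank 0."""
    return model_gb if rank0_only else model_gb * world_size


bf16 = weight_gb(16)
nf4 = weight_gb(bits_per_param(double_quant=False))
nf4_dq = weight_gb(bits_per_param(double_quant=True))

print(f"bf16:     {bf16:.1f} GB")
print(f"nf4:      {nf4:.1f} GB")
print(f"nf4 + DQ: {nf4_dq:.1f} GB")
print(f"8 ranks, every rank loads: {host_ram_gb(nf4_dq, 8, rank0_only=False):.1f} GB")
print(f"8 ranks, rank 0 only:      {host_ram_gb(nf4_dq, 8, rank0_only=True):.1f} GB")
```

Under these assumptions double quantization cuts the per-parameter overhead from 0.5 to about 0.127 bits, and loading on a single rank keeps host RAM at one copy of the weights instead of one per process.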
Wing Lian
2024-07-30 19:21:38 -04:00
committed by GitHub
parent dbf8fb549e
commit 3ebf22464b
10 changed files with 52 additions and 14 deletions


@@ -1,9 +1,9 @@
 --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/
 packaging==23.2
 peft==0.11.1
-transformers==4.43.3
+transformers @ git+https://github.com/huggingface/transformers.git@026a173a64372e9602a16523b8fae9de4b0ff428
 tokenizers==0.19.1
-bitsandbytes==0.43.1
+bitsandbytes==0.43.3
 accelerate==0.32.0
 deepspeed==0.14.4
 pydantic==2.6.3
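This hunk mixes two pin styles: exact `==` versions and a PEP 508 direct reference (`name @ url`) that points pip at a specific git revision of `transformers`. A minimal stdlib sketch that tells the two apart is below. `Pin` and `parse_pin` are hypothetical names, the regexes only cover the line shapes seen in this file, and real tooling should prefer `packaging.requirements.Requirement` (`packaging` is already pinned here).

```python
import re
from typing import NamedTuple, Optional


class Pin(NamedTuple):
    name: str
    version: Optional[str]  # exact "==" version, if any
    url: Optional[str]      # direct-reference URL, if any
    rev: Optional[str]      # "@<rev>" suffix of a git+ URL, if any


# Only the simple shapes used in this requirements file (hypothetical helper).
_EXACT = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)==(\S+)$")
_DIRECT = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)\s*@\s*(\S+)$")


def parse_pin(line: str) -> Optional[Pin]:
    """Classify one requirements line; None for blanks, comments and pip options."""
    line = line.strip()
    if not line or line.startswith(("#", "-")):
        return None
    m = _EXACT.match(line)
    if m:
        return Pin(m.group(1), m.group(2), None, None)
    m = _DIRECT.match(line)
    if m:
        url = m.group(2)
        rev = None
        if url.startswith("git+"):
            # pip VCS syntax: the revision follows the last "@", before any "#" fragment
            _, sep, tail = url.split("#", 1)[0].rpartition("@")
            # a "/" in the tail means the "@" belonged to user info (git@host/...)
            if sep and "/" not in tail:
                rev = tail
        return Pin(m.group(1), None, url, rev)
    return None


for line in [
    "--extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/",
    "transformers @ git+https://github.com/huggingface/transformers.git@026a173a64372e9602a16523b8fae9de4b0ff428",
    "bitsandbytes==0.43.3",
]:
    print(parse_pin(line))
```

Pinning to a full commit SHA rather than a branch name keeps installs reproducible, at the cost of building `transformers` from source instead of installing a released wheel.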