qlora-fsdp ram efficient loading with hf trainer (#1791)
* fix 405b with lower cpu ram requirements
* make sure to use double quant and only skip output embeddings
* set model attributes
* more fixes for sharded fsdp loading
* update the base model in example to use pre-quantized nf4-bf16 weights
* upstream fixes for qlora+fsdp
@@ -1,9 +1,9 @@
 --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/
 packaging==23.2
 peft==0.11.1
-transformers==4.43.3
+transformers @ git+https://github.com/huggingface/transformers.git@026a173a64372e9602a16523b8fae9de4b0ff428
 tokenizers==0.19.1
-bitsandbytes==0.43.1
+bitsandbytes==0.43.3
 accelerate==0.32.0
 deepspeed==0.14.4
 pydantic==2.6.3