DBRX Model Support (#1462)

* wip for dbrx finetuning

* add fastcore for parallel loading of sharded weights

* fix dtype for load, use PartialState instead of accelerator to init process group, remove redundant wandb callback

* update to use v2 of the converted model

* more fixes for dbrx loras

* make sure to enable fsdp activation checkpointing

* fix support for 8bit loras too for dbrx

* apply z3 leaf moe fix for DBRX with deepspeed

* don't raise value error since child module searches could fail and be ok

* revert a previous change to fix fsdp

* update mistral/mistral qlora+fsdp yamls

* fix qlora+fsdp quant storage type

* more edge cases for qlora-fsdp

* fixes for fsdp+qlora w optimizer in 8bit

* add bigstral z3 config and make sure to use full_state_dict for fsdp

Author: Wing Lian
Date: 2024-04-12 09:02:36 -04:00
Committed by: GitHub
Parent: 5ed29393e3
Commit: 132eb740f0

19 changed files with 859 additions and 29 deletions

requirements.txt

@@ -41,3 +41,4 @@ gcsfs
 trl @ git+https://github.com/huggingface/trl.git@0ee349dcd43b0f4b3169449f16751c38ac4a609f
 zstandard==0.22.0
+fastcore
