DBRX Model Support (#1462)
* wip for dbrx finetuning * add fastcore for parallel loading of sharded weights * fix dtype for load, use PartialState instead of accelerator to init process group, remove redundant wandb callback * update to use v2 of the converted model * more fixes for dbrx loras * make sure to enable fsdp activation checkpointing * fix support for 8bit loras too for dbrx * apply z3 leaf moe fix for DBRX with deepspeed * don't raise value error since child module searches could fail and be ok * revert a previous change to fix fsdp * update mistral/mistral qlora+fsdp yamls * fix qlora+fsdp quant storage type * more edge cases for qlora-fsdp * fixes for fsdp+qlora w optimizer in 8bit * add bigstral z3 config and make sure to use full_state_dict for fsdp
This commit is contained in:
@@ -1,4 +1,6 @@
|
||||
{
|
||||
"zero_force_ds_cpu_optimizer": false,
|
||||
"zero_allow_untested_optimizer": true,
|
||||
"zero_optimization": {
|
||||
"stage": 3,
|
||||
"offload_param": {
|
||||
|
||||
Reference in New Issue
Block a user