axolotl

Author	SHA1	Message	Date
NanoCode012	631268a0ca	revert renaming of deepspeed stage3 args that use auto (#2964) [skip ci] * Revert "fix deprecate deepspeed stage3_gather_16bit_weights_on_model_save arg…" This reverts commit `e207762928`. * don't revert the values that don't use 'auto' --------- Co-authored-by: Wing Lian <wing@axolotl.ai>	2025-07-22 09:59:47 -04:00
Wing Lian	e207762928	fix deprecate deepspeed stage3_gather_16bit_weights_on_model_save arg (#2956) [skip ci] * fix deprecate deepspeed stage3_gather_16bit_weights_on_model_save arg * replace the rest of the migrated deepspeed params	2025-07-21 11:41:31 -04:00
Wing Lian	d3c45d27b5	fix zero3 (#1994)	2024-10-28 07:32:49 -04:00
Wing Lian	132eb740f0	DBRX Model Support (#1462) * wip for dbrx finetuning * add fastcore for parallel loading of sharded weights * fix dtype for load, use PartialState instead of accelerator to init process group, remove redundant wandb callback * update to use v2 of the converted model * more fixes for dbrx loras * make sure to enable fsdp activation checkpointing * fix support for 8bit loras too for dbrx * apply z3 leaf moe fix for DBRX with deepspeed * don't raise value error since child module searches could fail and be ok * revert a previous change to fix fsdp * update mistral/mistral qlora+fsdp yamls * fix qlora+fsdp quant storage type * more edge cases for qlora-fsdp * fixes for fsdp+qlora w optimizer in 8bit * add bigstral z3 config and make sure to use full_state_dict for fsdp	2024-04-12 09:02:36 -04:00
NanoCode012	946b497c3f	feat: add deepspeed 3 with cpuoffload (#1466) * feat: add deepspeed 3 with cpuoffload * make bf16 explicit, add param only offload variant --------- Co-authored-by: Wing Lian <wing.lian@gmail.com>	2024-04-01 21:42:52 +09:00

5 Commits