Wing Lian | e207762928 | fix deprecate deepspeed stage3_gather_16bit_weights_on_model_save arg (#2956) [skip ci] | 2025-07-21 11:41:31 -04:00
  * fix deprecate deepspeed stage3_gather_16bit_weights_on_model_save arg
  * replace the rest of the migrated deepspeed params
Seungduk Kim | b0ee9ec734 | Set gradient_clipping to auto in DeepSpeed configs (#1382) [skip ci] | 2024-03-10 20:50:12 -04:00
Wing Lian | e923e62d24 | more checks and fixes for deepspeed and fsdp (#1208) [skip ci] | 2024-01-25 20:01:45 -05:00
Wing Lian | 54d2ac155b | Mixtral fixes 20240124 (#1192) [skip ci] | 2024-01-24 14:59:57 -05:00
  * mixtral nccl fixes
  * make sure to patch for z3