Wing Lian
81893c775c
Accelerate 1.8.1 and BNB 0.46.0 update (#2815)
* update accelerate to v1.8.0
* update bnb also
* fix multigpu ci timeout
* fix test set size
* use latest accelerate 1.8.1
* disable default dtype
2025-06-28 15:29:19 -04:00
Wing Lian
a85efffbef
bump transformers==4.52.4 (#2800) [skip ci]
* bump transformers==4.52.4
* don't use hf offline for qwen tokenizer
* increase timeout
* don't use methodtype
* increase timeout
* better assertion logging
* upgrade deepspeed version too
2025-06-18 15:46:14 -04:00
Dan Saunders
1d91d905c9
remove deprecated wandb env var (#2751)
* remove deprecated wandb env var
* remove os.environ wandb setting; unused loggers
2025-06-03 14:04:15 -07:00
salman
65c5481120
Rank 0-only logging (#2608)
Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-05-28 14:57:30 +01:00
Wing Lian
328d598114
gemma3 packing fixes (#2449)
* make gemma3 work with packing
* multi-gpu e2e for ci
* update gemma3 model namespace to use mirror
* add gradient checkpointing to multigpu e2e ci
* update gemma3 examples for use_reentrant and fix ddp find unused params
* fix tests for gemma3
* fix import for test utils
* set correct train loss for gemma3 e2e
2025-03-31 17:15:23 -04:00