Abubakar Abid
f2155eaf79
feat: add trackio as experiment tracking integration (#3253)
* feat: add trackio as experiment tracking integration
- Add TrackioConfig to integrations schema with project_name, run_name, and space_id
- Create trackio_.py module for environment setup
- Add is_trackio_available() utility function
- Integrate trackio with report_to in trainer builder
- Add trackio callback for experiment tracking
- Add trackio config keys to gpt-oss example YAMLs
- Trackio runs locally by default, syncs to HF Space if space_id provided
* changes
* changes
* changes
* changes
* changes
* changes
* changes
* Update requirements.txt
* don't allow pydantic 2.12 for now
---------
Co-authored-by: Abubakar Abid <aaabid93@gmail.com>
Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-12-23 08:49:07 -05:00
Wing Lian
0fa752e58b
upgrade flash-attn to 2.8.3 for gpt-oss attn sink support (#3082)
2025-08-21 15:04:10 -04:00
Wing Lian
ecbe8b2b61
[GPT-OSS] improve FSDP shard merging and documentation for GPT-OSS (#3073)
* improve fsdp shard merging
* improve logging
* update information on merging and inferencing GPT-OSS
* cleanup readme
* automate cleanup of FSDP prefix
* import GRPO only if necessary
* only modify config.json on rank0
* merge final checkpoint at end of training
* prevent circular import
* Fix saving for sharded state dict
* devx, move merged to output dir
* move import back to top
* Fix stuck merge
* fix conditionals from pr feedback and add test
2025-08-15 21:25:01 -04:00
Wing Lian
50f2b94d50
add 120b and deepspeed zero3 examples (#3035) [skip ci]
* add 120b and deepspeed zero3 examples
* add a bit of flavor and cleanup gpt oss readme
* fix: remove expert vram usage
* fix: remove redundant EOS token from eot_tokens
* feat: add 120B to docs
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai>
2025-08-08 08:04:56 -04:00