* improve FSDP shard merging
* improve logging
* update information on merging and inference for GPT-OSS
* cleanup readme
* automate cleanup of FSDP prefix
* import GRPO only if necessary
* only modify config.json on rank0
* merge final checkpoint at end of training
* prevent circular import
* Fix saving for sharded state dict
* devx, move merged to output dir
* move import back to top
* Fix stuck merge
* fix conditionals from pr feedback and add test
* add kernels for gpt-oss models
* add support for gpt-oss
* typo incorrect package
* fix: layout for configs and added wandb/epochs
* add gpt-oss example with offload and set moe leaf for z3
* add support for Mxfp4Config from yaml
* update yaml to use official model
* fix lora and don't allow triton to go above 3.3.1
* fix lr and tweak vram use
* fix range for triton since pinned wasn't compatible with torch 2.6.0
* update cce with gpt-oss patches
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai>