* improve fsdp shard merging
* improve logging
* update information on merging and inferencing GPT-OSS
* cleanup readme
* automate cleanup of FSDP prefix
* import GRPO only if necessary
* only modify config.json on rank0
* merge final checkpoint at end of training
* prevent circular import
* fix saving for sharded state dict
* devx, move merged to output dir
* move import back to top
* fix stuck merge
* fix conditionals from pr feedback and add test
* fix for parallelism config from trainer
* fix handling of parallelism_config with accelerate
* add todo for removal
* update to latest axolotl-contribs-mit for optimizer fix too
* synchronize training after checkpoint save
* dir spelling
* use latest accelerate main
* fix to not use partial state parallelism_config
* more fixes
* use most recent accelerate fix
* fix cpu_ram_efficient_loading to meta devices from rank 0 to prevent CPU RAM oom
* improve handling of broadcasting fsdp2 state dict
* support for openai chat template with thinking key as the reasoning trace
* address PR feedback
* refactor to remove dependency on PartialState for parallelism config
* bump accelerate, gptoss fixes
* limit meta fixes to fsdp2 for now
* fixes for gpt oss
* fixup examples, don't use cpu-ram-efficient-loading for now
* remove problematic barrier
* patch parallelism config
* reorder comparison
* device mesh fixes
* make pure CP work
* lint
* add kernels for gpt oss models
* add support for gpt-oss
* typo incorrect package
* fix: layout for configs and added wandb/epochs
* add gptoss example with offload and set moe leaf for z3
* add support for Mxfp4Config from yaml
* update yaml to use official model
* fix lora and don't allow triton to go above 3.3.1
* fix lr and tweak vram use
* fix range for triton since the pinned version wasn't compatible with torch 2.6.0
* update cce with gpt oss patches
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai>