add kernels for gpt oss models (#3020)

* add kernels for gpt oss models * add support for gpt-oss * typo incorrect package * fix: layout for configs and added wandb/epochs * add gptoss example w offload and set moe leaf for z3 * add support for Mxfp4Config from yaml * update yaml to use official model * fix lora and don't allow triton to go above 3.3.1 * fix lr and tweak vram use * fix range for triton since pinned wasn't compatible with toch 2.6.0 * update cce with gpt oss patches --------- Co-authored-by: NanoCode012 <nano@axolotl.ai>
2025-08-06 09:47:55 -04:00
parent 97e86c6d47
commit ba3dba3e4f
15 changed files with 257 additions and 11 deletions
--- a/examples/gpt-oss/README.md
+++ b/examples/gpt-oss/README.md
@@ -0,0 +1,9 @@
+# OpenAI's GPT-OSS
+
+GPT-OSS is a 20 billion parameter MoE model trained by OpenAI, released in August 2025.
+
+- 20B Full Parameter SFT can be trained on 8x48GB GPUs (peak reserved memory @ ~36GiB/GPU) - [YAML](./gpt-oss-20b-fft-fsdp2.yaml)
+- 20B LoRA SFT (all linear layers, and experts in last two layers) can be trained a single GPU (peak reserved memory @ ~47GiB)
+  - removing the experts from `lora_target_parameters` will allow the model to fit around ~44GiB of VRAM
+  - [YAML](./gpt-oss-20b-sft-lora-singlegpu.yaml)
+- 20B Full Parameter SFT with FSDP2 offloading can be trained on 2x24GB GPUs (peak reserved memory @ ~21GiB/GPU) - [YAML](./gpt-oss-20b-fft-fsdp2-offload.yaml)