Finetune OpenAI's GPT-OSS with Axolotl
GPT-OSS is a family of open-weight MoE models trained by OpenAI, released in August 2025. It comes in two variants: 20B and 120B.
This guide shows how to fine-tune these models with Axolotl on multi-turn conversations with proper masking.
Getting started
- Install Axolotl following the installation guide.
Here is an example of how to install from pip:
# Ensure you have PyTorch installed (PyTorch 2.6.0 minimum)
pip3 install packaging==23.2 setuptools==75.8.0 wheel ninja
pip3 install --no-build-isolation 'axolotl[flash-attn]>=0.12.0'
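To double-check that the PyTorch requirement is met, you can print the installed version (a quick sanity check, assuming python3 points at the environment you installed into):
python3 -c "import torch; print(torch.__version__)"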
- Choose one of the following configs for training the 20B model (for 120B, see the next section).
# LoRA SFT linear layers (1x48GB @ ~44GiB)
axolotl train examples/gpt-oss/gpt-oss-20b-sft-lora-singlegpu.yaml
# FFT SFT with offloading (2x24GB @ ~21GiB/GPU)
axolotl train examples/gpt-oss/gpt-oss-20b-fft-fsdp2-offload.yaml
# FFT SFT (8x48GB @ ~36GiB/GPU or 4x80GB @ ~46GiB/GPU)
axolotl train examples/gpt-oss/gpt-oss-20b-fft-fsdp2.yaml
Note: Memory usage is taken from device_mem_reserved(gib) in the training logs.
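If you want to watch GPU memory yourself while a run is in progress, you can poll nvidia-smi from another terminal (a simple sketch, assuming NVIDIA GPUs with nvidia-smi on your PATH):
# Print per-GPU memory usage every 5 seconds
nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv -l 5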
Training 120B
On 8xH100s, make sure you have ~3TB of free disk space. Each checkpoint clocks in at ~720GB, so between the base model, the final model output, and keeping at least 2 checkpoints, you will likely need at least 3TB free.
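Before starting the run, it is worth confirming the filesystem backing your output directory actually has that much headroom (run this from your working directory, or point it at wherever your output_dir lives):
# Show free space on the filesystem backing the current directory
df -h .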
# FFT SFT with offloading (8x80GB @ ~49GiB/GPU)
axolotl train examples/gpt-oss/gpt-oss-120b-fft-fsdp2-offload.yaml
ERRATA: Transformers saves the model architecture prefixed with FSDP, which needs to be manually renamed in config.json.
See https://github.com/huggingface/transformers/pull/40207 for the status of this issue.
sed -i 's/FSDPGptOssForCausalLM/GptOssForCausalLM/g' ./outputs/gpt-oss-out/config.json
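To confirm the rename took effect, you can inspect the architectures field in config.json (using the same output path as above):
grep -n -A 2 '"architectures"' ./outputs/gpt-oss-out/config.json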
When using SHARDED_STATE_DICT with FSDP, the final checkpoint should automatically be merged into your
configured output_dir. If that step fails (for example, due to a disk-space error), you can run the merge
manually. The command below automatically determines the last checkpoint directory and merges the sharded
weights into {output_dir}/merged.
axolotl merge-sharded-fsdp-weights examples/gpt-oss/gpt-oss-120b-fft-fsdp2-offload.yaml
mv ./outputs/gpt-oss-out/merged/* ./outputs/gpt-oss-out/
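After the move, the merged weights should sit directly in the output directory; a quick listing can confirm the safetensors shards and config.json are in place (same paths as above):
ls -lh ./outputs/gpt-oss-out/config.json ./outputs/gpt-oss-out/*.safetensors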
Inferencing your fine-tuned model
GPT-OSS support in vLLM does not exist in a stable release yet. See https://x.com/MaziyarPanahi/status/1955741905515323425 for more information about using a special vllm-openai Docker image for inference with vLLM.
SGLang has 0-day support on main; see https://github.com/sgl-project/sglang/issues/8833 for information on installing SGLang from source. Once you've installed SGLang, run the following command to launch an SGLang server:
python3 -m sglang.launch_server --model ./outputs/gpt-oss-out/ --served-model-name axolotl/gpt-oss-120b --host 0.0.0.0 --port 8888 --tp 8
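Once the server is up, it exposes an OpenAI-compatible API, so you can smoke-test the fine-tuned model with a simple chat completion request (a minimal sketch; the model name must match the --served-model-name passed above):
curl http://localhost:8888/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "axolotl/gpt-oss-120b",
    "messages": [{"role": "user", "content": "Summarize what you were fine-tuned to do in one sentence."}]
  }'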
Tool use
GPT-OSS has comprehensive tool-use understanding. Axolotl supports tool-calling datasets for supervised fine-tuning.
Here is an example dataset config:
datasets:
  - path: Nanobit/text-tools-2k-test
    type: chat_template
See Nanobit/text-tools-2k-test for the sample dataset.
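For reference, each training example in a tool-calling dataset is a single conversation in the OpenAI Messages format, with the available tools declared alongside the messages. A hypothetical JSONL row (field names and values are illustrative; check the sample dataset and the docs for the exact schema) might look like:
{"messages": [{"role": "user", "content": "What's the weather in Paris?"},
              {"role": "assistant", "tool_calls": [{"type": "function", "function": {"name": "get_weather", "arguments": "{\"city\": \"Paris\"}"}}]},
              {"role": "tool", "name": "get_weather", "content": "18C, partly cloudy"},
              {"role": "assistant", "content": "It is about 18C and partly cloudy in Paris right now."}],
 "tools": [{"type": "function", "function": {"name": "get_weather", "description": "Get current weather for a city", "parameters": {"type": "object", "properties": {"city": {"type": "string"}}, "required": ["city"]}}}]}
(In the actual file, each example would be a single line.)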
Refer to our docs for more info.
Tips
- Read more on how to load your own dataset in the docs.
- The dataset format follows the OpenAI Messages format as seen here.