
Finetune OpenAI's GPT-OSS with Axolotl

GPT-OSS is a family of open-weight MoE models released by OpenAI in August 2025. There are two variants: 20B and 120B.

This guide shows how to fine-tune these models with Axolotl on multi-turn conversations with proper masking.

Getting started

  1. Install Axolotl following the installation guide.

    Here is an example of how to install from pip:

# Ensure you have PyTorch installed (PyTorch 2.6.0 minimum)
pip3 install packaging==23.2 setuptools==75.8.0 wheel ninja
pip3 install --no-build-isolation 'axolotl[flash-attn]>=0.12.0'
  2. Choose one of the following configs for training the 20B model (for the 120B model, see the Training 120B section below).
# LoRA SFT linear layers (1x48GB @ ~44GiB)
axolotl train examples/gpt-oss/gpt-oss-20b-sft-lora-singlegpu.yaml

# FFT SFT with offloading (2x24GB @ ~21GiB/GPU)
axolotl train examples/gpt-oss/gpt-oss-20b-fft-fsdp2-offload.yaml

# FFT SFT (8x48GB @ ~36GiB/GPU or 4x80GB @ ~46GiB/GPU)
axolotl train examples/gpt-oss/gpt-oss-20b-fft-fsdp2.yaml

Note: Memory usage is taken from device_mem_reserved(gib) in the training logs.
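
For orientation, here is a trimmed sketch of roughly what the single-GPU LoRA config contains. The values below are illustrative placeholders; the authoritative settings live in examples/gpt-oss/gpt-oss-20b-sft-lora-singlegpu.yaml.

# Illustrative sketch only -- defer to the shipped YAML for real values
base_model: openai/gpt-oss-20b
adapter: lora                      # LoRA on linear layers, per the heading above
lora_target_linear: true
lora_r: 32                         # placeholder
lora_alpha: 64                     # placeholder
datasets:
  - path: your-org/your-dataset    # placeholder dataset
    type: chat_template
sequence_len: 4096                 # placeholder
micro_batch_size: 1
gradient_accumulation_steps: 4
learning_rate: 2e-4                # placeholder
bf16: true
flash_attention: true
output_dir: ./outputs/gpt-oss-out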

Training 120B

On 8x H100s, make sure you have ~3TB of free disk space. Each checkpoint clocks in at ~720GB; together with the base model and the final model output, you will need roughly 3TB to keep at least two checkpoints around.
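
Before launching, it can help to confirm the volume backing your output directory actually has that much headroom (the path below is illustrative; point df at your configured output_dir):

# Check free space on the filesystem holding your outputs (illustrative path)
df -h ./outputs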

# FFT SFT with offloading (8x80GB @ ~49GiB/GPU)
axolotl train examples/gpt-oss/gpt-oss-120b-fft-fsdp2-offload.yaml

To simplify fine-tuning across 2 nodes × 8 H100 (80GB) GPUs, we've partnered with Baseten to showcase multi-node training of the 120B model using Baseten Truss. You can read more about this recipe on Baseten's blog. The recipe can be found on their GitHub.

ERRATA: Transformers saves the model architecture prefixed with FSDP, which needs to be manually renamed in config.json. See https://github.com/huggingface/transformers/pull/40207 for the status of this issue.

sed -i 's/FSDPGptOssForCausalLM/GptOssForCausalLM/g' ./outputs/gpt-oss-out/config.json
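
To confirm the rename took effect, you can print the architectures field back out (path assumes the output_dir used above):

# Should print ['GptOssForCausalLM'] after the sed fix
python3 -c "import json; print(json.load(open('./outputs/gpt-oss-out/config.json'))['architectures'])"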

When using SHARDED_STATE_DICT with FSDP, the final checkpoint should automatically merge the sharded weights to your configured output_dir. However, if that step fails due to a disk space error, you can take an additional step to merge the sharded weights. This step will automatically determine the last checkpoint directory and merge the sharded weights to {output_dir}/merged.

axolotl merge-sharded-fsdp-weights examples/gpt-oss/gpt-oss-120b-fft-fsdp2-offload.yaml
mv ./outputs/gpt-oss-out/merged/* ./outputs/gpt-oss-out/
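
As a quick sanity check that the merge landed, verify the full-weight safetensors shards now sit alongside config.json (paths follow the example above):

# The model directory should now contain the full safetensors shards
ls -lh ./outputs/gpt-oss-out/*.safetensors ./outputs/gpt-oss-out/config.json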

Inferencing your fine-tuned model

vLLM

GPT-OSS support in vLLM does not exist in a stable release yet. See https://x.com/MaziyarPanahi/status/1955741905515323425 for more information about using a special vllm-openai docker image for inferencing with vLLM.

Optionally, vLLM can be installed from nightly:

pip install --no-build-isolation --pre -U vllm --extra-index-url https://wheels.vllm.ai/nightly

and the vLLM server can be started with the following command (modify --tensor-parallel-size 8 to match your environment):

vllm serve ./outputs/gpt-oss-out/ --served-model-name axolotl/gpt-oss-20b --host 0.0.0.0 --port 8888  --tensor-parallel-size 8
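
Once the server is up, you can smoke-test it via vLLM's OpenAI-compatible endpoint (port and served model name match the command above; the prompt is arbitrary):

# Simple chat completion request against the local server
curl http://localhost:8888/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "axolotl/gpt-oss-20b", "messages": [{"role": "user", "content": "Summarize what an MoE model is in one sentence."}]}'

The same request also works against the SGLang server below, since both expose an OpenAI-compatible API; just swap in the served model name.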

SGLang

SGLang has day-0 support on main; see https://github.com/sgl-project/sglang/issues/8833 for information on installing SGLang from source. Once you've installed SGLang, run the following command to launch an SGLang server:

python3 -m sglang.launch_server --model ./outputs/gpt-oss-out/ --served-model-name axolotl/gpt-oss-120b --host 0.0.0.0 --port 8888 --tp 8

Tool use

GPT-OSS has comprehensive tool-use understanding. Axolotl supports tool-calling datasets for supervised fine-tuning (SFT).

Here is an example dataset config:

datasets:
  - path: Nanobit/text-tools-2k-test
    type: chat_template

See Nanobit/text-tools-2k-test for the sample dataset.
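
For orientation, a single row in such a dataset roughly follows the OpenAI messages-plus-tools shape sketched below. The field names are illustrative of that format; defer to the linked sample dataset for the exact schema Axolotl expects.

{"messages": [{"role": "user", "content": "What's the weather in Paris?"},
              {"role": "assistant", "tool_calls": [{"type": "function", "function": {"name": "get_weather", "arguments": "{\"city\": \"Paris\"}"}}]},
              {"role": "tool", "content": "18C, partly cloudy"},
              {"role": "assistant", "content": "It is about 18C and partly cloudy in Paris."}],
 "tools": [{"type": "function", "function": {"name": "get_weather", "parameters": {"type": "object", "properties": {"city": {"type": "string"}}}}}]}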

Refer to our docs for more info.

Thinking and chat_template masking conflict

OpenAI's Harmony template hides thinking in all non-final turns, which conflicts with Axolotl's chat_template masking.

If your dataset has thinking content mid-turn, there are two paths we recommend:

  • Train only on the last turn. This can be accomplished via the chat_template "train on last" option; see the docs.

  • Adjust your dataset to only have thinking content in the last turn.

Tips

  • Read more on how to load your own dataset in the docs.
  • The dataset format follows the OpenAI Messages format as seen here.

Optimization Guides