# Finetune OpenAI's GPT-OSS with Axolotl
GPT-OSS is a family of open-weight MoE models trained by OpenAI, released in August 2025. It comes in two variants: 20B and 120B.

This guide shows how to fine-tune these models with Axolotl on multi-turn conversations with proper masking.
## Getting started
- Install Axolotl following the installation guide. For example, to install from pip:
```bash
# Ensure you have PyTorch installed (PyTorch 2.6.0 minimum)
pip3 install packaging==23.2 setuptools==75.8.0 wheel ninja
pip3 install --no-build-isolation 'axolotl[flash-attn]>=0.12.0'
```
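To sanity-check the install, you can confirm the PyTorch version and that the Axolotl CLI resolves (a quick check, assuming a standard environment):

```bash
# PyTorch 2.6.0 or newer should be reported
python3 -c "import torch; print(torch.__version__)"

# The Axolotl CLI should print its usage text
axolotl --help
```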
- Choose one of the following configs to train the 20B model (for 120B, see the section below).
```bash
# LoRA SFT linear layers (1x48GB @ ~44GiB)
axolotl train examples/gpt-oss/gpt-oss-20b-sft-lora-singlegpu.yaml

# FFT SFT with offloading (2x24GB @ ~21GiB/GPU)
axolotl train examples/gpt-oss/gpt-oss-20b-fft-fsdp2-offload.yaml

# FFT SFT (8x48GB @ ~36GiB/GPU or 4x80GB @ ~46GiB/GPU)
axolotl train examples/gpt-oss/gpt-oss-20b-fft-fsdp2.yaml
```
Note: Memory usage figures are taken from `device_mem_reserved(gib)` in the training logs.
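If you want to adapt the 20B LoRA config to your own run, it comes down to a handful of standard Axolotl settings. The snippet below is a minimal illustrative sketch with placeholder hyperparameters and dataset path, not the exact contents of the example file:

```yaml
base_model: openai/gpt-oss-20b

# LoRA adapter on all linear layers (illustrative values; see the example YAML for the tuned ones)
adapter: lora
lora_r: 16
lora_alpha: 32
lora_target_linear: true

datasets:
  - path: your-org/your-dataset   # placeholder; any chat-format dataset
    type: chat_template

sequence_len: 4096
micro_batch_size: 1
gradient_accumulation_steps: 4
learning_rate: 2e-5
flash_attention: true
```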
## Training 120B

On 8x H100s:
```bash
# FFT SFT with offloading (8x80GB @ ~49GiB/GPU)
axolotl train examples/gpt-oss/gpt-oss-120b-fft-fsdp2-offload.yaml
```
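The offload variant relies on FSDP2 with parameter CPU offload to fit the 120B model on 80GB GPUs. As a rough sketch of what the relevant section of such a config looks like (the key names follow Axolotl's FSDP2 convention, and the decoder-layer class name is an assumption; check the example file for the authoritative values):

```yaml
fsdp_version: 2
fsdp_config:
  offload_params: true            # offload sharded parameters to CPU between uses
  state_dict_type: FULL_STATE_DICT
  auto_wrap_policy: TRANSFORMER_BASED_WRAP
  transformer_layer_cls_to_wrap: GptOssDecoderLayer  # assumed class name for GPT-OSS blocks
  reshard_after_forward: true
```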
## Tool use
GPT-OSS has comprehensive tool-use understanding, and Axolotl supports tool-calling datasets for supervised fine-tuning (SFT).
Here is an example dataset config:
```yaml
datasets:
  - path: Nanobit/text-tools-2k-test
    type: chat_template
```
See Nanobit/text-tools-2k-test for the sample dataset.
Refer to our docs for more info.
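For reference, one record in a tool-calling dataset of this kind can look roughly like the following (field names follow the OpenAI Messages format; the actual sample dataset may differ in its details):

```json
{
  "messages": [
    {"role": "user", "content": "What's the weather in Tokyo?"},
    {"role": "assistant", "tool_calls": [
      {"type": "function", "function": {"name": "get_weather", "arguments": "{\"city\": \"Tokyo\"}"}}
    ]},
    {"role": "tool", "content": "{\"temp_c\": 21}"},
    {"role": "assistant", "content": "It is currently 21°C in Tokyo."}
  ],
  "tools": [
    {"type": "function", "function": {
      "name": "get_weather",
      "description": "Get the current weather for a city",
      "parameters": {"type": "object", "properties": {"city": {"type": "string"}}, "required": ["city"]}
    }}
  ]
}
```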
## Tips
- Read more on how to load your own dataset in the docs; a minimal config sketch follows this list.
- The dataset format follows the OpenAI Messages format.
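As a concrete starting point for loading your own data, a local JSONL file of such conversations can be pointed at with a config along these lines (the file path is a placeholder):

```yaml
datasets:
  - path: ./data/conversations.jsonl   # hypothetical local file, one JSON object per line
    type: chat_template
    field_messages: messages           # key in each record holding the message list
```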