# Finetune OpenAI's GPT-OSS with Axolotl
GPT-OSS is a family of open-weight MoE models trained by OpenAI, released in August 2025. It comes in two variants: 20B and 120B.

This guide shows how to fine-tune these models with Axolotl on multi-turn conversations with proper masking.
## Getting started
- Install Axolotl following the installation guide. For example, to install from pip:
```bash
# Ensure you have PyTorch installed (PyTorch 2.6.0 minimum)
pip3 install packaging==23.2 setuptools==75.8.0 wheel ninja
pip3 install --no-build-isolation 'axolotl[flash-attn]>=0.12.0'
```
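To sanity-check the install, you can confirm the PyTorch version and that the Axolotl CLI resolves (a quick check, assuming a standard environment):

```bash
# PyTorch 2.6.0 or newer should be reported
python3 -c "import torch; print(torch.__version__)"

# The Axolotl CLI should print its usage text
axolotl --help
```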
- Choose one of the following configs to train the 20B model (for 120B, see the section below).
```bash
# LoRA SFT linear layers (1x48GB @ ~44GiB)
axolotl train examples/gpt-oss/gpt-oss-20b-sft-lora-singlegpu.yaml

# FFT SFT with offloading (2x24GB @ ~21GiB/GPU)
axolotl train examples/gpt-oss/gpt-oss-20b-fft-fsdp2-offload.yaml

# FFT SFT (8x48GB @ ~36GiB/GPU or 4x80GB @ ~46GiB/GPU)
axolotl train examples/gpt-oss/gpt-oss-20b-fft-fsdp2.yaml
```
Note: Memory usage figures are taken from `device_mem_reserved(gib)` in the training logs.
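If you want to adapt the 20B LoRA config to your own run, it comes down to a handful of standard Axolotl settings. The snippet below is a minimal illustrative sketch with placeholder hyperparameters and dataset path, not the exact contents of the example file:

```yaml
base_model: openai/gpt-oss-20b

# LoRA adapter on all linear layers (illustrative values; see the example YAML for the tuned ones)
adapter: lora
lora_r: 16
lora_alpha: 32
lora_target_linear: true

datasets:
  - path: your-org/your-dataset   # placeholder; any chat-format dataset
    type: chat_template

sequence_len: 4096
micro_batch_size: 1
gradient_accumulation_steps: 4
learning_rate: 2e-5
flash_attention: true
```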
## Training 120B

On 8x H100s:
```bash
# FFT SFT with offloading (8x80GB @ ~49GiB/GPU)
axolotl train examples/gpt-oss/gpt-oss-120b-fft-fsdp2-offload.yaml
```
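The offload variant relies on FSDP2 with parameter CPU offload to fit the 120B model on 80GB GPUs. As a rough sketch of what the relevant section of such a config looks like (the key names follow Axolotl's FSDP2 convention, and the decoder-layer class name is an assumption; check the example file for the authoritative values):

```yaml
fsdp_version: 2
fsdp_config:
  offload_params: true            # offload sharded parameters to CPU between uses
  state_dict_type: FULL_STATE_DICT
  auto_wrap_policy: TRANSFORMER_BASED_WRAP
  transformer_layer_cls_to_wrap: GptOssDecoderLayer  # assumed class name for GPT-OSS blocks
  reshard_after_forward: true
```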
## Tool use
GPT-OSS has comprehensive tool-use understanding, and Axolotl supports tool-calling datasets for supervised fine-tuning (SFT).
Here is an example dataset config:
```yaml
datasets:
  - path: Nanobit/text-tools-2k-test
    type: chat_template
```
See Nanobit/text-tools-2k-test for the sample dataset.
Refer to our docs for more info.
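For reference, one record in a tool-calling dataset of this kind can look roughly like the following (field names follow the OpenAI Messages format; the actual sample dataset may differ in its details):

```json
{
  "messages": [
    {"role": "user", "content": "What's the weather in Tokyo?"},
    {"role": "assistant", "tool_calls": [
      {"type": "function", "function": {"name": "get_weather", "arguments": "{\"city\": \"Tokyo\"}"}}
    ]},
    {"role": "tool", "content": "{\"temp_c\": 21}"},
    {"role": "assistant", "content": "It is currently 21°C in Tokyo."}
  ],
  "tools": [
    {"type": "function", "function": {
      "name": "get_weather",
      "description": "Get the current weather for a city",
      "parameters": {"type": "object", "properties": {"city": {"type": "string"}}, "required": ["city"]}
    }}
  ]
}
```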
## Tips
- Read more on how to load your own dataset in the docs; a minimal config sketch follows this list.
- The dataset format follows the OpenAI Messages format.
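As a concrete starting point for loading your own data, a local JSONL file of such conversations can be pointed at with a config along these lines (the file path is a placeholder):

```yaml
datasets:
  - path: ./data/conversations.jsonl   # hypothetical local file, one JSON object per line
    type: chat_template
    field_messages: messages           # key in each record holding the message list
```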