From 05cedbfb1e8a125adcfaa0a03d1b9b2a3fa97e80 Mon Sep 17 00:00:00 2001
From: Wing Lian
Date: Tue, 19 Aug 2025 13:30:37 -0400
Subject: [PATCH] add baseten info for gpt-oss recipe (#3078)

* add bsaeten info for gpt-oss recipe

* incorporate PR review
---
 examples/gpt-oss/README.md | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/examples/gpt-oss/README.md b/examples/gpt-oss/README.md
index 9db5e9887..98f3ea892 100644
--- a/examples/gpt-oss/README.md
+++ b/examples/gpt-oss/README.md
@@ -41,6 +41,12 @@ model, and final model output, you may need at least 3TB of free disk space to k
 axolotl train examples/gpt-oss/gpt-oss-120b-fft-fsdp2-offload.yaml
 ```
 
+To simplify fine-tuning across 2 nodes × 8x H100 (80GB) GPUs, we've partnered with [Baseten](https://baseten.co) to showcase multi-node
+training of the 120B model using Baseten Truss. You can read more about this recipe on
+[Baseten's blog](https://www.baseten.co/blog/how-to-fine-tune-gpt-oss-120b-with-baseten-and-axolotl/). The recipe can
+be found on their
+[GitHub](https://github.com/basetenlabs/ml-cookbook/tree/main/examples/oss-gpt-120b-axolotl/training).
+
 ERRATA: Transformers saves the model Architecture prefixed with `FSDP` which needs to be manually renamed in `config.json`.
 See https://github.com/huggingface/transformers/pull/40207 for the status of this issue.
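
The manual rename described in the ERRATA could be scripted. The following is a minimal sketch, not part of axolotl or Transformers: the helper name `strip_fsdp_prefix` is hypothetical, and it assumes the affected class names sit under the standard `architectures` key of `config.json` and begin with the literal string `FSDP`.

```python
import json
from pathlib import Path


def strip_fsdp_prefix(config_path, prefix="FSDP"):
    """Drop a leading `prefix` from each entry of `architectures` in a config.json.

    Hypothetical helper: returns the (possibly updated) list of architecture names.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))

    # Assumption: the saved class names live under the usual `architectures` key.
    original = config.get("architectures") or []
    fixed = [
        name[len(prefix):] if name.startswith(prefix) else name
        for name in original
    ]

    # Rewrite the file only when an entry actually changed.
    if fixed != original:
        config["architectures"] = fixed
        path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return fixed
```

Calling `strip_fsdp_prefix("path/to/output_dir/config.json")` (the path is a placeholder for your own output directory) edits the file in place and leaves entries without the prefix untouched, so running it twice is harmless.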