diff --git a/README.md b/README.md
index c86ab8f4a..1517fb874 100644
--- a/README.md
+++ b/README.md
@@ -29,6 +29,7 @@
 ## 🎉 Latest Updates
 
+- 2025/11: Axolotl now includes support for [Olmo3](https://github.com/axolotl-ai-cloud/axolotl/blob/main/examples/olmo3).
 - 2025/10: New model support has been added in Axolotl for: [Qwen3 Next](https://github.com/axolotl-ai-cloud/axolotl/blob/main/examples/qwen3-next), [Qwen2.5-vl, Qwen3-vl](https://github.com/axolotl-ai-cloud/axolotl/tree/main/examples/qwen2_5-vl), [Qwen3, Qwen3MoE](https://github.com/axolotl-ai-cloud/axolotl/tree/main/examples/qwen3), [Granite 4](https://github.com/axolotl-ai-cloud/axolotl/tree/main/examples/granite4), [HunYuan](https://github.com/axolotl-ai-cloud/axolotl/tree/main/examples/hunyuan), [Magistral 2509](https://github.com/axolotl-ai-cloud/axolotl/tree/main/examples/magistral#vision), [Apertus](https://github.com/axolotl-ai-cloud/axolotl/tree/main/examples/apertus), and [Seed-OSS](https://github.com/axolotl-ai-cloud/axolotl/tree/main/examples/seed-oss).
 - 2025/09: Axolotl now has text diffusion training. Read more [here](https://github.com/axolotl-ai-cloud/axolotl/tree/main/src/axolotl/integrations/diffusion).
 - 2025/08: QAT has been updated to include NVFP4 support. See [PR](https://github.com/axolotl-ai-cloud/axolotl/pull/3107).
diff --git a/docs/multi-gpu.qmd b/docs/multi-gpu.qmd
index 57a941b04..1b58a108c 100644
--- a/docs/multi-gpu.qmd
+++ b/docs/multi-gpu.qmd
@@ -4,7 +4,7 @@ format:
   html:
     toc: true
     toc-depth: 3
-    number-sections: true
+    # number-sections: true
     code-tools: true
 execute:
   enabled: false
@@ -14,12 +14,18 @@ This guide covers advanced training configurations for multi-GPU setups using Ax
 
 ## Overview {#sec-overview}
 
-Axolotl supports several methods for multi-GPU training:
+When training on multiple GPUs, Axolotl supports three sharding/parallelism strategies. Additionally, you can layer specific optimization features on top of your chosen strategy.
-- DeepSpeed (recommended)
-- FSDP (Fully Sharded Data Parallel)
-- Sequence parallelism
-- FSDP + QLoRA
+You generally cannot combine these strategies; they are mutually exclusive.
+
+1. **DeepSpeed**: A powerful optimization library that supports ZeRO stages 1-3.
+2. **FSDP (Fully Sharded Data Parallel)**: PyTorch's native sharding implementation (recommended).
+3. **DDP (Distributed Data Parallel)**: PyTorch's native data parallelism implementation (the default if neither of the above is selected).
+
+These features can often be combined with the strategies above:
+
+* **Sequence Parallelism**: Splits long sequences across GPUs (compatible with DDP, DeepSpeed, and FSDP).
+* **FSDP + QLoRA**: Combines 4-bit quantization with FSDP (requires FSDP).
 
 ## DeepSpeed {#sec-deepspeed}
 
@@ -65,12 +71,18 @@
 Start from Stage 1 -> Stage 2 -> Stage 3.
 
 ## Fully Sharded Data Parallel (FSDP) {#sec-fsdp}
 
+FSDP allows you to shard model parameters, gradients, and optimizer states across data-parallel workers.
+
 ::: {.callout-note}
 FSDP2 is recommended for new users. FSDP1 is deprecated and will be removed in an upcoming release of Axolotl.
 :::
 
+### FSDP + QLoRA {#sec-fsdp-qlora}
+
+For combining FSDP with QLoRA, see our [dedicated guide](fsdp_qlora.qmd).
+
 ### Migrating from FSDP1 to FSDP2 {#sec-migrate-fsdp1-fsdp2}
 
 To migrate your config from FSDP1 to FSDP2, you must use the `fsdp_version` top-level config field to specify the FSDP version, and
@@ -145,10 +157,6 @@ single sequence causes OOM errors during model training.
 
 See our [dedicated guide](sequence_parallelism.qmd) for more information.
 
-### FSDP + QLoRA {#sec-fsdp-qlora}
-
-For combining FSDP with QLoRA, see our [dedicated guide](fsdp_qlora.qmd).
-
 ## Performance Optimization {#sec-performance}
 
 ### Liger Kernel Integration {#sec-liger}
diff --git a/examples/colab-notebooks/colab-axolotl-example.ipynb b/examples/colab-notebooks/colab-axolotl-example.ipynb
index cea1aeda0..57a638948 100644
--- a/examples/colab-notebooks/colab-axolotl-example.ipynb
+++ b/examples/colab-notebooks/colab-axolotl-example.ipynb
@@ -40,7 +40,7 @@
     "%%capture\n",
     "# This step can take ~5-10 minutes to install dependencies\n",
     "!pip install --no-build-isolation axolotl[flash-attn]>=0.9.1\n",
-    "!pip install \"cut-cross-entropy[transformers] @ git+https://github.com/axolotl-ai-cloud/ml-cross-entropy.git@8a1a0ec\""
+    "!pip install \"cut-cross-entropy[transformers] @ git+https://github.com/axolotl-ai-cloud/ml-cross-entropy.git@5eff953\""
    ]
   },
   {
diff --git a/examples/olmo3/README.md b/examples/olmo3/README.md
new file mode 100644
index 000000000..d4dbe05a9
--- /dev/null
+++ b/examples/olmo3/README.md
@@ -0,0 +1,46 @@
+# Finetune Allenai's Olmo 3 with Axolotl
+
+[Olmo 3](https://huggingface.co/collections/allenai/olmo-3) is a family of open-source 7B and 32B models trained by the Allen Institute for Artificial Intelligence.
+
+This guide shows how to fine-tune them with Axolotl on multi-turn conversations with proper masking.
+
+## Getting started
+
+1. Install Axolotl following the [installation guide](https://docs.axolotl.ai/docs/installation.html).
+
+   Here is an example of how to install from pip:
+   ```bash
+   # Ensure you have a compatible version of PyTorch installed
+   pip3 install packaging setuptools wheel ninja
+   pip3 install --no-build-isolation 'axolotl[flash-attn]>=0.12.0'
+
+   # Install Cut Cross Entropy
+   python scripts/cutcrossentropy_install.py | sh
+   ```
+
+2. Run the finetuning example:
+
+```bash
+axolotl train examples/olmo3/olmo3-7b-qlora.yaml
+```
+
+Let us know how it goes. Happy finetuning! 🚀
+
+### TIPS
+
+- The example config can be re-used for Olmo and Olmo 2.
+- You can run a full finetune by removing `adapter: qlora` and `load_in_4bit: true` from the config.
+- Read more on how to load your own dataset in the [dataset loading docs](https://docs.axolotl.ai/docs/dataset_loading.html).
+- The dataset format follows the OpenAI messages format as seen [here](https://docs.axolotl.ai/docs/dataset-formats/conversation.html#chat_template).
+
+## Optimization Guides
+
+Please check the [Optimizations doc](https://docs.axolotl.ai/docs/optimizations.html).
+
+## Related Resources
+
+- [Olmo 3 Blog](https://allenai.org/blog/olmo3)
+- [Axolotl Docs](https://docs.axolotl.ai)
+- [Axolotl Website](https://axolotl.ai)
+- [Axolotl GitHub](https://github.com/axolotl-ai-cloud/axolotl)
+- [Axolotl Discord](https://discord.gg/7m9sfhzaf3)
diff --git a/examples/olmo3/olmo3-7b-qlora.yaml b/examples/olmo3/olmo3-7b-qlora.yaml
new file mode 100644
index 000000000..c8878d79f
--- /dev/null
+++ b/examples/olmo3/olmo3-7b-qlora.yaml
@@ -0,0 +1,64 @@
+base_model: allenai/Olmo-3-7B-Instruct-SFT
+
+# Automatically upload checkpoint and final model to HF
+# hub_model_id: username/custom_model_name
+
+plugins:
+  - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
+
+load_in_8bit: false
+load_in_4bit: true
+
+datasets:
+  - path: fozziethebeat/alpaca_messages_2k_test
+    type: chat_template
+
+dataset_prepared_path: last_run_prepared
+val_set_size: 0.1
+output_dir: ./outputs/lora-out
+
+adapter: qlora
+lora_model_dir:
+
+sequence_len: 2048
+sample_packing: true
+
+lora_r: 32
+lora_alpha: 16
+lora_dropout: 0.05
+lora_target_linear: true
+lora_target_modules:
+  - gate_proj
+  - down_proj
+  - up_proj
+  - q_proj
+  - v_proj
+  - k_proj
+  - o_proj
+
+wandb_project:
+wandb_entity:
+wandb_watch:
+wandb_name:
+wandb_log_model:
+
+gradient_accumulation_steps: 4
+micro_batch_size: 2
+num_epochs: 1
+optimizer: adamw_bnb_8bit
+lr_scheduler: cosine
+learning_rate: 0.0002
+
+bf16: auto
+tf32: false
+
+gradient_checkpointing: true
+resume_from_checkpoint:
+logging_steps: 1
+flash_attention: true
+
+warmup_ratio: 0.1
+evals_per_epoch: 1
+saves_per_epoch: 1
+
+# save_first_step: true # uncomment this to validate checkpoint saving works with your config
diff --git a/examples/seed-oss/README.md b/examples/seed-oss/README.md
index 5610c1316..aeb8635e3 100644
--- a/examples/seed-oss/README.md
+++ b/examples/seed-oss/README.md
@@ -6,21 +6,17 @@ This guide shows how to fine-tune it with Axolotl with multi-turn conversations
 
 ## Getting started
 
-1. Install Axolotl following the [installation guide](https://docs.axolotl.ai/docs/installation.html). You need to install from main as Seed-OSS is only on nightly or use our latest [Docker images](https://docs.axolotl.ai/docs/docker.html).
+1. Install Axolotl following the [installation guide](https://docs.axolotl.ai/docs/installation.html).
 
-   Here is an example of how to install from main for pip:
+   Here is an example of how to install from pip:
+   ```bash
+   # Ensure you have a compatible version of PyTorch installed
+   pip3 install packaging setuptools wheel ninja
+   pip3 install --no-build-isolation 'axolotl[flash-attn]>=0.12.0'
 
-```bash
-# Ensure you have Pytorch installed (Pytorch 2.6.0 min)
-git clone https://github.com/axolotl-ai-cloud/axolotl.git
-cd axolotl
-
-pip3 install packaging==23.2 setuptools==75.8.0 wheel ninja
-pip3 install --no-build-isolation -e '.[flash-attn]'
-
-# Install Cut Cross Entropy
-python scripts/cutcrossentropy_install.py | sh
-```
+
+   # Install Cut Cross Entropy
+   python scripts/cutcrossentropy_install.py | sh
+   ```
 
 2. Run the finetuning example:
 
@@ -41,9 +37,7 @@ Let us know how it goes. Happy finetuning! 🚀
 
 ## Optimization Guides
 
-- [Multi-GPU Training](https://docs.axolotl.ai/docs/multi-gpu.html)
-- [Multi-Node Training](https://docs.axolotl.ai/docs/multi-node.html)
-- [LoRA Optimizations](https://docs.axolotl.ai/docs/lora_optims.html)
+Please check the [Optimizations doc](https://docs.axolotl.ai/docs/optimizations.html).
 
 ## Related Resources
diff --git a/examples/smolvlm2/README.md b/examples/smolvlm2/README.md
index 9c0ae4836..74c1a1c0f 100644
--- a/examples/smolvlm2/README.md
+++ b/examples/smolvlm2/README.md
@@ -37,9 +37,7 @@ This guide shows how to fine-tune SmolVLM2 models with Axolotl.
 
 ## Optimization Guides
 
-- [Multi-GPU Training](https://docs.axolotl.ai/docs/multi-gpu.html)
-- [LoRA Optimizations](https://docs.axolotl.ai/docs/lora_optims.html)
-- [Multi-Node Training](https://docs.axolotl.ai/docs/multi-node.html)
+Please check the [Optimizations doc](https://docs.axolotl.ai/docs/optimizations.html).
 
 ## Related Resources
diff --git a/scripts/cutcrossentropy_install.py b/scripts/cutcrossentropy_install.py
index cb498c002..91d0f45d6 100644
--- a/scripts/cutcrossentropy_install.py
+++ b/scripts/cutcrossentropy_install.py
@@ -29,5 +29,5 @@ UV_PREFIX = "uv " if USE_UV else ""
 
 print(
     UNINSTALL_PREFIX
-    + f'{UV_PREFIX}pip install "cut-cross-entropy[transformers] @ git+https://github.com/axolotl-ai-cloud/ml-cross-entropy.git@8a1a0ec"'
+    + f'{UV_PREFIX}pip install "cut-cross-entropy[transformers] @ git+https://github.com/axolotl-ai-cloud/ml-cross-entropy.git@5eff953"'
 )
diff --git a/src/axolotl/integrations/cut_cross_entropy/README.md b/src/axolotl/integrations/cut_cross_entropy/README.md
index 5c7c5166b..4f98ac089 100644
--- a/src/axolotl/integrations/cut_cross_entropy/README.md
+++ b/src/axolotl/integrations/cut_cross_entropy/README.md
@@ -19,7 +19,7 @@ python scripts/cutcrossentropy_install.py | sh
 - If you are installing from pip
 
 ```bash
-pip3 uninstall -y cut-cross-entropy && pip3 install "cut-cross-entropy[transformers] @ git+https://github.com/axolotl-ai-cloud/ml-cross-entropy.git@8a1a0ec"
+pip3 uninstall -y cut-cross-entropy && pip3 install "cut-cross-entropy[transformers] @ git+https://github.com/axolotl-ai-cloud/ml-cross-entropy.git@5eff953"
 ```
 
 ## Usage
@@ -65,6 +65,9 @@ plugins:
 - mistral3
 - mixtral
 - mllama
+- olmo
+- olmo2
+- olmo3
 - phi
 - phi3
 - phi4_multimodal
diff --git a/src/axolotl/integrations/cut_cross_entropy/__init__.py b/src/axolotl/integrations/cut_cross_entropy/__init__.py
index bd0124b93..b8f7e9da3 100644
--- a/src/axolotl/integrations/cut_cross_entropy/__init__.py
+++ b/src/axolotl/integrations/cut_cross_entropy/__init__.py
@@ -35,7 +35,7 @@ LOG = get_logger(__name__)
 
 _CCE_INSTALL_MESSAGE = (
     "Please install Axolotl's fork of cut_cross_entropy with transformers support using "
-    '`pip install "cut-cross-entropy[transformers] @ git+https://github.com/axolotl-ai-cloud/ml-cross-entropy.git@8a1a0ec"`'
+    '`pip install "cut-cross-entropy[transformers] @ git+https://github.com/axolotl-ai-cloud/ml-cross-entropy.git@5eff953"`'
 )
diff --git a/src/axolotl/monkeypatch/multipack.py b/src/axolotl/monkeypatch/multipack.py
index 5d34f1935..fdda3c3bc 100644
--- a/src/axolotl/monkeypatch/multipack.py
+++ b/src/axolotl/monkeypatch/multipack.py
@@ -49,6 +49,9 @@ SUPPORTED_MULTIPACK_MODEL_TYPES = [
     "seed_oss",
     "lfm2",
     "lfm2_moe",
+    "olmo",
+    "olmo2",
+    "olmo3",
 ]
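
For reviewers, the `scripts/cutcrossentropy_install.py` change above pins the ml-cross-entropy fork to commit `5eff953`. The script's emit-a-command-and-pipe-to-`sh` pattern can be sketched as follows; `build_install_command` and `CCE_COMMIT` are illustrative names for this sketch, not the script's actual API:

```python
# Sketch of the command-emission pattern: print a pip install command pinned
# to a specific commit, so the caller can pipe the output into a shell.
CCE_COMMIT = "5eff953"  # commit pinned by this PR (was 8a1a0ec)


def build_install_command(use_uv: bool = False) -> str:
    """Return the install command for the pinned cut-cross-entropy fork."""
    uv_prefix = "uv " if use_uv else ""
    return (
        f'{uv_prefix}pip install "cut-cross-entropy[transformers] @ '
        f'git+https://github.com/axolotl-ai-cloud/ml-cross-entropy.git@{CCE_COMMIT}"'
    )


if __name__ == "__main__":
    # Piping this output into a shell runs the install, mirroring
    # `python scripts/cutcrossentropy_install.py | sh`.
    print(build_install_command())
```

Pinning the commit in one script keeps the README, Colab notebook, and `_CCE_INSTALL_MESSAGE` references consistent, which is exactly why this PR touches all of them at once.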