diff --git a/README.md b/README.md
index c86ab8f4a..1517fb874 100644
--- a/README.md
+++ b/README.md
@@ -29,6 +29,7 @@
 ## 🎉 Latest Updates
 
+- 2025/11: Axolotl now includes support for [Olmo3](https://github.com/axolotl-ai-cloud/axolotl/blob/main/examples/olmo3).
 - 2025/10: New model support has been added in Axolotl for: [Qwen3 Next](https://github.com/axolotl-ai-cloud/axolotl/blob/main/examples/qwen3-next), [Qwen2.5-vl, Qwen3-vl](https://github.com/axolotl-ai-cloud/axolotl/tree/main/examples/qwen2_5-vl), [Qwen3, Qwen3MoE](https://github.com/axolotl-ai-cloud/axolotl/tree/main/examples/qwen3), [Granite 4](https://github.com/axolotl-ai-cloud/axolotl/tree/main/examples/granite4), [HunYuan](https://github.com/axolotl-ai-cloud/axolotl/tree/main/examples/hunyuan), [Magistral 2509](https://github.com/axolotl-ai-cloud/axolotl/tree/main/examples/magistral#vision), [Apertus](https://github.com/axolotl-ai-cloud/axolotl/tree/main/examples/apertus), and [Seed-OSS](https://github.com/axolotl-ai-cloud/axolotl/tree/main/examples/seed-oss).
 - 2025/09: Axolotl now has text diffusion training. Read more [here](https://github.com/axolotl-ai-cloud/axolotl/tree/main/src/axolotl/integrations/diffusion).
 - 2025/08: QAT has been updated to include NVFP4 support. See [PR](https://github.com/axolotl-ai-cloud/axolotl/pull/3107).
diff --git a/docs/multi-gpu.qmd b/docs/multi-gpu.qmd
index 57a941b04..1b58a108c 100644
--- a/docs/multi-gpu.qmd
+++ b/docs/multi-gpu.qmd
@@ -4,7 +4,7 @@ format:
   html:
     toc: true
     toc-depth: 3
-    number-sections: true
+    # number-sections: true
     code-tools: true
 execute:
   enabled: false
@@ -14,12 +14,18 @@ This guide covers advanced training configurations for multi-GPU setups using Ax
 
 ## Overview {#sec-overview}
 
-Axolotl supports several methods for multi-GPU training:
+When training on multiple GPUs, Axolotl supports three sharding/parallelism strategies. Additionally, you can layer specific optimization features on top of your chosen strategy.
-- DeepSpeed (recommended)
-- FSDP (Fully Sharded Data Parallel)
-- Sequence parallelism
-- FSDP + QLoRA
+You generally cannot combine these strategies; they are mutually exclusive.
+
+1. **DeepSpeed**: A powerful optimization library that supports ZeRO stages 1-3.
+2. **FSDP (Fully Sharded Data Parallel)**: PyTorch's native sharding implementation (recommended).
+3. **DDP (Distributed Data Parallel)**: PyTorch's native data parallelism implementation (the default if neither of the above is selected).
+
+These features can often be combined with the strategies above:
+
+* **Sequence Parallelism**: Splits long sequences across GPUs (compatible with DDP, DeepSpeed, and FSDP).
+* **FSDP + QLoRA**: Combines 4-bit quantization with FSDP (requires FSDP).
 
 ## DeepSpeed {#sec-deepspeed}
 
@@ -65,12 +71,18 @@
 Start from Stage 1 -> Stage 2 -> Stage 3.
 
 ## Fully Sharded Data Parallel (FSDP) {#sec-fsdp}
 
+FSDP allows you to shard model parameters, gradients, and optimizer states across data-parallel workers.
+
 ::: {.callout-note}
 FSDP2 is recommended for new users. FSDP1 is deprecated and will be removed in an upcoming release of Axolotl.
 :::
 
+### FSDP + QLoRA {#sec-fsdp-qlora}
+
+For combining FSDP with QLoRA, see our [dedicated guide](fsdp_qlora.qmd).
+
 ### Migrating from FSDP1 to FSDP2 {#sec-migrate-fsdp1-fsdp2}
 
 To migrate your config from FSDP1 to FSDP2, you must use the `fsdp_version` top-level config field to specify the FSDP version, and
@@ -145,10 +157,6 @@ single sequence causes OOM errors during model training.
 
 See our [dedicated guide](sequence_parallelism.qmd) for more information.
 
-### FSDP + QLoRA {#sec-fsdp-qlora}
-
-For combining FSDP with QLoRA, see our [dedicated guide](fsdp_qlora.qmd).
-
 ## Performance Optimization {#sec-performance}
 
 ### Liger Kernel Integration {#sec-liger}
diff --git a/examples/colab-notebooks/colab-axolotl-example.ipynb b/examples/colab-notebooks/colab-axolotl-example.ipynb
index cea1aeda0..57a638948 100644
--- a/examples/colab-notebooks/colab-axolotl-example.ipynb
+++ b/examples/colab-notebooks/colab-axolotl-example.ipynb
@@ -40,7 +40,7 @@
     "%%capture\n",
     "# This step can take ~5-10 minutes to install dependencies\n",
     "!pip install --no-build-isolation axolotl[flash-attn]>=0.9.1\n",
-    "!pip install \"cut-cross-entropy[transformers] @ git+https://github.com/axolotl-ai-cloud/ml-cross-entropy.git@8a1a0ec\""
+    "!pip install \"cut-cross-entropy[transformers] @ git+https://github.com/axolotl-ai-cloud/ml-cross-entropy.git@5eff953\""
    ]
   },
   {
diff --git a/examples/olmo3/README.md b/examples/olmo3/README.md
new file mode 100644
index 000000000..d4dbe05a9
--- /dev/null
+++ b/examples/olmo3/README.md
@@ -0,0 +1,46 @@
+# Finetune Allenai's Olmo 3 with Axolotl
+
+[Olmo 3](https://huggingface.co/collections/allenai/olmo-3) is a family of open-source 7B and 32B models trained by the Allen Institute for Artificial Intelligence.
+
+This guide shows how to fine-tune them with Axolotl on multi-turn conversations with proper masking.
+
+## Getting started
+
+1. Install Axolotl following the [installation guide](https://docs.axolotl.ai/docs/installation.html).
+
+   Here is an example of how to install from pip:
+   ```bash
+   # Ensure you have a compatible version of PyTorch installed
+   pip3 install packaging setuptools wheel ninja
+   pip3 install --no-build-isolation 'axolotl[flash-attn]>=0.12.0'
+
+   # Install Cut Cross Entropy
+   python scripts/cutcrossentropy_install.py | sh
+   ```
+
+2. Run the finetuning example:
+
+```bash
+axolotl train examples/olmo3/olmo3-7b-qlora.yaml
+```
+
+Let us know how it goes. Happy finetuning! 🚀
+
+### TIPS
+
+- The example config can be re-used for Olmo and Olmo 2.
+- You can run a full finetune by removing `adapter: qlora` and `load_in_4bit: true` from the config.
+- Read more on how to load your own dataset in the [dataset loading docs](https://docs.axolotl.ai/docs/dataset_loading.html).
+- The dataset format follows the OpenAI messages format as seen [here](https://docs.axolotl.ai/docs/dataset-formats/conversation.html#chat_template).
+
+## Optimization Guides
+
+Please check the [Optimizations doc](https://docs.axolotl.ai/docs/optimizations.html).
+
+## Related Resources
+
+- [Olmo 3 Blog](https://allenai.org/blog/olmo3)
+- [Axolotl Docs](https://docs.axolotl.ai)
+- [Axolotl Website](https://axolotl.ai)
+- [Axolotl GitHub](https://github.com/axolotl-ai-cloud/axolotl)
+- [Axolotl Discord](https://discord.gg/7m9sfhzaf3)
diff --git a/examples/olmo3/olmo3-7b-qlora.yaml b/examples/olmo3/olmo3-7b-qlora.yaml
new file mode 100644
index 000000000..c8878d79f
--- /dev/null
+++ b/examples/olmo3/olmo3-7b-qlora.yaml
@@ -0,0 +1,64 @@
+base_model: allenai/Olmo-3-7B-Instruct-SFT
+
+# Automatically upload checkpoint and final model to HF
+# hub_model_id: username/custom_model_name
+
+plugins:
+  - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
+
+load_in_8bit: false
+load_in_4bit: true
+
+datasets:
+  - path: fozziethebeat/alpaca_messages_2k_test
+    type: chat_template
+
+dataset_prepared_path: last_run_prepared
+val_set_size: 0.1
+output_dir: ./outputs/lora-out
+
+adapter: qlora
+lora_model_dir:
+
+sequence_len: 2048
+sample_packing: true
+
+lora_r: 32
+lora_alpha: 16
+lora_dropout: 0.05
+lora_target_linear: true
+lora_target_modules:
+  - gate_proj
+  - down_proj
+  - up_proj
+  - q_proj
+  - v_proj
+  - k_proj
+  - o_proj
+
+wandb_project:
+wandb_entity:
+wandb_watch:
+wandb_name:
+wandb_log_model:
+
+gradient_accumulation_steps: 4
+micro_batch_size: 2
+num_epochs: 1
+optimizer: adamw_bnb_8bit
+lr_scheduler: cosine
+learning_rate: 0.0002
+
+bf16: auto
+tf32: false
+
+gradient_checkpointing: true
+resume_from_checkpoint:
+logging_steps: 1
+flash_attention: true
+
+warmup_ratio: 0.1
+evals_per_epoch: 1
+saves_per_epoch: 1
+
+# save_first_step: true # uncomment this to validate checkpoint saving works with your config
diff --git a/examples/seed-oss/README.md b/examples/seed-oss/README.md
index 5610c1316..aeb8635e3 100644
--- a/examples/seed-oss/README.md
+++ b/examples/seed-oss/README.md
@@ -6,21 +6,17 @@ This guide shows how to fine-tune it with Axolotl with multi-turn conversations
 
 ## Getting started
 
-1. Install Axolotl following the [installation guide](https://docs.axolotl.ai/docs/installation.html). You need to install from main as Seed-OSS is only on nightly or use our latest [Docker images](https://docs.axolotl.ai/docs/docker.html).
+1. Install Axolotl following the [installation guide](https://docs.axolotl.ai/docs/installation.html).
 
-   Here is an example of how to install from main for pip:
+   Here is an example of how to install from pip:
+   ```bash
+   # Ensure you have a compatible version of PyTorch installed
+   pip3 install packaging setuptools wheel ninja
+   pip3 install --no-build-isolation 'axolotl[flash-attn]>=0.12.0'
 
-```bash
-# Ensure you have Pytorch installed (Pytorch 2.6.0 min)
-git clone https://github.com/axolotl-ai-cloud/axolotl.git
-cd axolotl
-
-pip3 install packaging==23.2 setuptools==75.8.0 wheel ninja
-pip3 install --no-build-isolation -e '.[flash-attn]'
-
-# Install Cut Cross Entropy
-python scripts/cutcrossentropy_install.py | sh
-```
+
+   # Install Cut Cross Entropy
+   python scripts/cutcrossentropy_install.py | sh
+   ```
 
 2. Run the finetuning example:
 
@@ -41,9 +37,7 @@ Let us know how it goes. Happy finetuning! 🚀
 
 ## Optimization Guides
 
-- [Multi-GPU Training](https://docs.axolotl.ai/docs/multi-gpu.html)
-- [Multi-Node Training](https://docs.axolotl.ai/docs/multi-node.html)
-- [LoRA Optimizations](https://docs.axolotl.ai/docs/lora_optims.html)
+Please check the [Optimizations doc](https://docs.axolotl.ai/docs/optimizations.html).
 
 ## Related Resources
diff --git a/examples/smolvlm2/README.md b/examples/smolvlm2/README.md
index 9c0ae4836..74c1a1c0f 100644
--- a/examples/smolvlm2/README.md
+++ b/examples/smolvlm2/README.md
@@ -37,9 +37,7 @@ This guide shows how to fine-tune SmolVLM2 models with Axolotl.
 
 ## Optimization Guides
 
-- [Multi-GPU Training](https://docs.axolotl.ai/docs/multi-gpu.html)
-- [LoRA Optimizations](https://docs.axolotl.ai/docs/lora_optims.html)
-- [Multi-Node Training](https://docs.axolotl.ai/docs/multi-node.html)
+Please check the [Optimizations doc](https://docs.axolotl.ai/docs/optimizations.html).
 
 ## Related Resources
diff --git a/scripts/cutcrossentropy_install.py b/scripts/cutcrossentropy_install.py
index cb498c002..91d0f45d6 100644
--- a/scripts/cutcrossentropy_install.py
+++ b/scripts/cutcrossentropy_install.py
@@ -29,5 +29,5 @@ UV_PREFIX = "uv " if USE_UV else ""
 
 print(
     UNINSTALL_PREFIX
-    + f'{UV_PREFIX}pip install "cut-cross-entropy[transformers] @ git+https://github.com/axolotl-ai-cloud/ml-cross-entropy.git@8a1a0ec"'
+    + f'{UV_PREFIX}pip install "cut-cross-entropy[transformers] @ git+https://github.com/axolotl-ai-cloud/ml-cross-entropy.git@5eff953"'
 )
diff --git a/src/axolotl/integrations/cut_cross_entropy/README.md b/src/axolotl/integrations/cut_cross_entropy/README.md
index 5c7c5166b..4f98ac089 100644
--- a/src/axolotl/integrations/cut_cross_entropy/README.md
+++ b/src/axolotl/integrations/cut_cross_entropy/README.md
@@ -19,7 +19,7 @@ python scripts/cutcrossentropy_install.py | sh
 - If you are installing from pip
 
 ```bash
-pip3 uninstall -y cut-cross-entropy && pip3 install "cut-cross-entropy[transformers] @ git+https://github.com/axolotl-ai-cloud/ml-cross-entropy.git@8a1a0ec"
+pip3 uninstall -y cut-cross-entropy && pip3 install "cut-cross-entropy[transformers] @ git+https://github.com/axolotl-ai-cloud/ml-cross-entropy.git@5eff953"
 ```
 
 ## Usage
@@ -65,6 +65,9 @@ plugins:
 - mistral3
 - mixtral
 - mllama
+- olmo
+- olmo2
+- olmo3
 - phi
 - phi3
 - phi4_multimodal
diff --git a/src/axolotl/integrations/cut_cross_entropy/__init__.py b/src/axolotl/integrations/cut_cross_entropy/__init__.py
index bd0124b93..b8f7e9da3 100644
--- a/src/axolotl/integrations/cut_cross_entropy/__init__.py
+++ b/src/axolotl/integrations/cut_cross_entropy/__init__.py
@@ -35,7 +35,7 @@ LOG = get_logger(__name__)
 
 _CCE_INSTALL_MESSAGE = (
     "Please install Axolotl's fork of cut_cross_entropy with transformers support using "
-    '`pip install "cut-cross-entropy[transformers] @ git+https://github.com/axolotl-ai-cloud/ml-cross-entropy.git@8a1a0ec"`'
+    '`pip install "cut-cross-entropy[transformers] @ git+https://github.com/axolotl-ai-cloud/ml-cross-entropy.git@5eff953"`'
 )
diff --git a/src/axolotl/monkeypatch/multipack.py b/src/axolotl/monkeypatch/multipack.py
index 5d34f1935..fdda3c3bc 100644
--- a/src/axolotl/monkeypatch/multipack.py
+++ b/src/axolotl/monkeypatch/multipack.py
@@ -49,6 +49,9 @@ SUPPORTED_MULTIPACK_MODEL_TYPES = [
     "seed_oss",
     "lfm2",
     "lfm2_moe",
+    "olmo",
+    "olmo2",
+    "olmo3",
 ]
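
For reviewers, the `scripts/cutcrossentropy_install.py` change above pins the ml-cross-entropy fork to commit `5eff953`. The script's emit-a-command-and-pipe-to-`sh` pattern can be sketched as follows; `build_install_command` and `CCE_COMMIT` are illustrative names for this sketch, not the script's actual API:

```python
# Sketch of the command-emission pattern: print a pip install command pinned
# to a specific commit, so the caller can pipe the output into a shell.
CCE_COMMIT = "5eff953"  # commit pinned by this PR (was 8a1a0ec)


def build_install_command(use_uv: bool = False) -> str:
    """Return the install command for the pinned cut-cross-entropy fork."""
    uv_prefix = "uv " if use_uv else ""
    return (
        f'{uv_prefix}pip install "cut-cross-entropy[transformers] @ '
        f'git+https://github.com/axolotl-ai-cloud/ml-cross-entropy.git@{CCE_COMMIT}"'
    )


if __name__ == "__main__":
    # Piping this output into a shell runs the install, mirroring
    # `python scripts/cutcrossentropy_install.py | sh`.
    print(build_install_command())
```

Pinning the commit in one script keeps the README, Colab notebook, and `_CCE_INSTALL_MESSAGE` references consistent, which is exactly why this PR touches all of them at once.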