diff --git a/examples/devstral/README.md b/examples/devstral/README.md
index b53635a8f..ae0860662 100644
--- a/examples/devstral/README.md
+++ b/examples/devstral/README.md
@@ -20,7 +20,13 @@ pip3 install packaging==23.2 setuptools==75.8.0 wheel ninja
pip3 install --no-build-isolation 'axolotl[flash-attn]>=0.12.0'
```
-2. Run the finetuning example:
+2. Install [Cut Cross Entropy](https://docs.axolotl.ai/docs/custom_integrations.html#cut-cross-entropy) to reduce VRAM usage during training:
+
+```bash
+python scripts/cutcrossentropy_install.py | sh
+```
+
+3. Run the finetuning example:
```bash
axolotl train examples/devstral/devstral-small-qlora.yml
diff --git a/examples/hunyuan/README.md b/examples/hunyuan/README.md
new file mode 100644
index 000000000..96c6bbcfa
--- /dev/null
+++ b/examples/hunyuan/README.md
@@ -0,0 +1,85 @@
+# Finetune HunYuan with Axolotl
+
+Tencent released HunYuan, a family of open-source models at 0.5B, 1.8B, 4B, and 7B parameters, each available in Pre-trained and Instruct variants. The models can be found on [Hugging Face](https://huggingface.co/collections/tencent/hunyuan-dense-model-6890632cda26b19119c9c5e7). This guide shows how to fine-tune them with Axolotl on multi-turn conversations with proper masking.
+
+## Getting started
+
+1. Install Axolotl following the [installation guide](https://docs.axolotl.ai/docs/installation.html). HunYuan support is currently only available on nightly, so either install from main or use our latest [Docker images](https://docs.axolotl.ai/docs/docker.html).
+
+ Here is an example of installing from main with pip:
+
+```bash
+# Ensure you have PyTorch installed (2.6.0 minimum)
+git clone https://github.com/axolotl-ai-cloud/axolotl.git
+cd axolotl
+
+pip3 install packaging==23.2 setuptools==75.8.0 wheel ninja
+pip3 install --no-build-isolation -e '.[flash-attn]'
+
+# Install CCE https://docs.axolotl.ai/docs/custom_integrations.html#cut-cross-entropy
+python scripts/cutcrossentropy_install.py | sh
+```
+
+2. Run the finetuning example:
+
+```bash
+axolotl train examples/hunyuan/hunyuan-v1-dense-qlora.yaml
+```
+
+This config uses about 4.7 GB VRAM.
+
+Let us know how it goes. Happy finetuning! 🚀
+
+### Dataset
+
+HunYuan Instruct models can respond in either a slow-think or a fast-think pattern. For best results when fine-tuning the Instruct models, format your dataset to match the pattern you want the model to follow.
+
+```python
+# fast think pattern
+messages = [
+ {"role": "system", "content": "You are a helpful assistant."},
+ {"role": "user", "content": "/no_think What color is the sun?" },
+ {"role": "assistant", "content": "<think>\n\n</think>\n<answer>\nThe sun is yellow.\n</answer>"}
+]
+
+# slow think pattern
+messages = [
+ {"role": "system", "content": "You are a helpful assistant."},
+ {"role": "user", "content": "/think What color is the sun?" },
+ {"role": "assistant", "content": "<think>\nThe user is asking about the color of the sun. I need to ...\n</think>\n<answer>\nThe sun is yellow.\n</answer>"}
+]
+```
+
+### TIPS
+
+- For inference, the Tencent team recommends the following sampling parameters:
+
+```json
+
+{
+ "do_sample": true,
+ "top_k": 20,
+ "top_p": 0.8,
+ "repetition_penalty": 1.05,
+ "temperature": 0.7
+}
+
+```
+
+- You can run a full fine-tune by removing `adapter: qlora` and `load_in_4bit: true` from the config.
+- Read more on how to load your own dataset in the [docs](https://docs.axolotl.ai/docs/dataset_loading.html).
+- The dataset format follows the OpenAI Messages format as seen [here](https://docs.axolotl.ai/docs/dataset-formats/conversation.html#chat_template).
+
+## Optimization Guides
+
+- [Multi-GPU Training](https://docs.axolotl.ai/docs/multi-gpu.html)
+- [Multi-Node Training](https://docs.axolotl.ai/docs/multi-node.html)
+- [LoRA Optimizations](https://docs.axolotl.ai/docs/lora_optims.html)
+
+## Related Resources
+
+- [Tencent HunYuan Blog](https://hunyuan.tencent.com/)
+- [Axolotl Docs](https://docs.axolotl.ai)
+- [Axolotl Website](https://axolotl.ai)
+- [Axolotl GitHub](https://github.com/axolotl-ai-cloud/axolotl)
+- [Axolotl Discord](https://discord.gg/7m9sfhzaf3)
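
Reviewer note on the Dataset section of the HunYuan README above: the two message patterns could be generated with a small helper like the one below. This is only a sketch. `build_sample` is a hypothetical name, not part of Axolotl or HunYuan, and it assumes that reasoning is wrapped in `<think>...</think>`, the reply in `<answer>...</answer>`, and that `/think` and `/no_think` are the prompt prefixes that select the slow and fast patterns.

```python
# Hypothetical helper (not part of Axolotl or HunYuan): builds one chat sample
# in the OpenAI Messages layout described in the README's Dataset section.
# Assumptions: <think>/<answer> tags wrap the reasoning and the reply, and the
# "/think" and "/no_think" prefixes select the slow and fast patterns.


def build_sample(question: str, answer: str, reasoning: str = "") -> list:
    """Return a fast-think sample when `reasoning` is empty, else slow-think."""
    if reasoning:
        prefix = "/think"
        think_block = f"<think>\n{reasoning}\n</think>"
    else:
        prefix = "/no_think"
        think_block = "<think>\n\n</think>"  # empty reasoning block
    return [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": f"{prefix} {question}"},
        {
            "role": "assistant",
            "content": f"{think_block}\n<answer>\n{answer}\n</answer>",
        },
    ]


fast = build_sample("What color is the sun?", "The sun is yellow.")
slow = build_sample(
    "What color is the sun?",
    "The sun is yellow.",
    reasoning="The user is asking about the color of the sun.",
)
```

Writing each returned list as the `messages` field of one JSONL row is one way to feed it to the `chat_template` dataset type; the field name may need to match your dataset config.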
diff --git a/examples/hunyuan/hunyuan-v1-dense-qlora.yaml b/examples/hunyuan/hunyuan-v1-dense-qlora.yaml
new file mode 100644
index 000000000..a94345a61
--- /dev/null
+++ b/examples/hunyuan/hunyuan-v1-dense-qlora.yaml
@@ -0,0 +1,64 @@
+base_model: tencent/Hunyuan-0.5B-Instruct
+
+# Automatically upload checkpoint and final model to HF
+# hub_model_id: username/custom_model_name
+
+plugins:
+ - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
+
+load_in_8bit: false
+load_in_4bit: true
+
+datasets:
+ - path: fozziethebeat/alpaca_messages_2k_test
+ type: chat_template
+
+dataset_prepared_path: last_run_prepared
+val_set_size: 0.1
+output_dir: ./outputs/lora-out
+
+adapter: qlora
+lora_model_dir:
+
+sequence_len: 2048
+sample_packing: true
+
+lora_r: 32
+lora_alpha: 16
+lora_dropout: 0.05
+lora_target_linear: true
+lora_target_modules:
+ - gate_proj
+ - down_proj
+ - up_proj
+ - q_proj
+ - v_proj
+ - k_proj
+ - o_proj
+
+wandb_project:
+wandb_entity:
+wandb_watch:
+wandb_name:
+wandb_log_model:
+
+gradient_accumulation_steps: 4
+micro_batch_size: 2
+num_epochs: 1
+optimizer: adamw_bnb_8bit
+lr_scheduler: cosine
+learning_rate: 0.0002
+
+bf16: auto
+tf32: false
+
+gradient_checkpointing: true
+resume_from_checkpoint:
+logging_steps: 1
+flash_attention: true
+
+warmup_ratio: 0.1
+evals_per_epoch: 1
+saves_per_epoch: 1
+
+# save_first_step: true # uncomment this to validate checkpoint saving works with your config
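
Reviewer note on the config above: the batch settings combine into an effective batch size per optimizer step. The arithmetic below is a rough sketch using the values from the YAML; `num_gpus` is an assumption (a single-GPU run), and with `sample_packing: true` each sequence is a packed sequence of up to `sequence_len` tokens, so the token figure is an upper bound.

```python
# Values copied from hunyuan-v1-dense-qlora.yaml.
micro_batch_size = 2
gradient_accumulation_steps = 4
sequence_len = 2048

# Assumption: a single-GPU run; scale this for multi-GPU training.
num_gpus = 1

# Packed sequences consumed per optimizer step.
sequences_per_step = micro_batch_size * gradient_accumulation_steps * num_gpus

# Upper bound on tokens per optimizer step (packed sequences may be shorter).
max_tokens_per_step = sequences_per_step * sequence_len

print(sequences_per_step)   # 8
print(max_tokens_per_step)  # 16384
```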
diff --git a/examples/magistral/README.md b/examples/magistral/README.md
index 48ce712da..f4f278208 100644
--- a/examples/magistral/README.md
+++ b/examples/magistral/README.md
@@ -18,7 +18,13 @@ pip3 install packaging==23.2 setuptools==75.8.0 wheel ninja
pip3 install --no-build-isolation 'axolotl[flash-attn]>=0.12.0'
```
-2. Run the finetuning example:
+2. Install [Cut Cross Entropy](https://docs.axolotl.ai/docs/custom_integrations.html#cut-cross-entropy) to reduce VRAM usage during training:
+
+```bash
+python scripts/cutcrossentropy_install.py | sh
+```
+
+3. Run the finetuning example:
```bash
axolotl train examples/magistral/magistral-small-qlora.yaml
diff --git a/examples/voxtral/README.md b/examples/voxtral/README.md
index f31e9cfd0..984af4ddb 100644
--- a/examples/voxtral/README.md
+++ b/examples/voxtral/README.md
@@ -22,6 +22,9 @@ pip3 install --no-build-isolation 'axolotl[flash-attn]>=0.12.0'
# audio
pip3 install librosa==0.11.0
pip3 install 'mistral_common[audio]==1.8.3'
+
+# Install CCE https://docs.axolotl.ai/docs/custom_integrations.html#cut-cross-entropy
+python scripts/cutcrossentropy_install.py | sh
```
3. Run the finetuning example:
diff --git a/src/axolotl/loaders/tokenizer.py b/src/axolotl/loaders/tokenizer.py
index dcc255938..37b66ac83 100644
--- a/src/axolotl/loaders/tokenizer.py
+++ b/src/axolotl/loaders/tokenizer.py
@@ -296,7 +296,7 @@ def load_tokenizer(cfg: DictDefault) -> PreTrainedTokenizer:
)
tokenizer.chat_template = chat_template_string
- else:
+ elif getattr(tokenizer, "chat_template", None) is None:
LOG.info(
"No Chat template selected. Consider adding a chat template for easier inference."
)
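
Reviewer note on the `tokenizer.py` change above: the new `elif` appears to suppress the log message when the tokenizer already ships its own chat template. The sketch below isolates that decision under stated assumptions. `needs_template_hint` is a hypothetical name, the `configured_template` argument stands in for whatever condition guards the preceding branch (not shown in this hunk), and `SimpleNamespace` objects stand in for real tokenizers.

```python
from types import SimpleNamespace
from typing import Optional


def needs_template_hint(configured_template: Optional[str], tokenizer) -> bool:
    """Return True only when no template is configured AND the tokenizer has none."""
    if configured_template:
        # A template was selected in the config, so no hint is needed.
        return False
    # Mirrors the new check: tolerate tokenizers without the attribute at all.
    return getattr(tokenizer, "chat_template", None) is None


# Stand-ins for tokenizers (assumption: only the attribute lookup matters here).
with_template = SimpleNamespace(chat_template="{{ messages }}")
without_template = SimpleNamespace(chat_template=None)
no_attribute = SimpleNamespace()

print(needs_template_hint(None, with_template))      # False
print(needs_template_hint(None, without_template))   # True
print(needs_template_hint(None, no_attribute))       # True
print(needs_template_hint("chatml", without_template))  # False
```

Before this change, the first case would also have logged the hint, which is misleading for models such as HunYuan whose tokenizer bundles a template.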
diff --git a/src/axolotl/monkeypatch/multipack.py b/src/axolotl/monkeypatch/multipack.py
index cbc546877..a32430d9f 100644
--- a/src/axolotl/monkeypatch/multipack.py
+++ b/src/axolotl/monkeypatch/multipack.py
@@ -36,6 +36,10 @@ SUPPORTED_MULTIPACK_MODEL_TYPES = [
"glm",
"glm4",
"smollm3",
+ "granite",
+ "granitemoe",
+ "hunyuan_v1_dense",
+ "hunyuan_v1_moe",
"gpt_oss",
"arcee",
"seed_oss",