diff --git a/examples/devstral/README.md b/examples/devstral/README.md
index b53635a8f..ae0860662 100644
--- a/examples/devstral/README.md
+++ b/examples/devstral/README.md
@@ -20,7 +20,13 @@ pip3 install packaging==23.2 setuptools==75.8.0 wheel ninja
 pip3 install --no-build-isolation 'axolotl[flash-attn]>=0.12.0'
 ```

-2. Run the finetuning example:
+2. Install [Cut Cross Entropy](https://docs.axolotl.ai/docs/custom_integrations.html#cut-cross-entropy) to reduce training VRAM usage
+
+```bash
+python scripts/cutcrossentropy_install.py | sh
+```
+
+3. Run the finetuning example:

 ```bash
 axolotl train examples/devstral/devstral-small-qlora.yml
diff --git a/examples/hunyuan/README.md b/examples/hunyuan/README.md
new file mode 100644
index 000000000..96c6bbcfa
--- /dev/null
+++ b/examples/hunyuan/README.md
@@ -0,0 +1,85 @@
+# Finetune HunYuan with Axolotl
+
+Tencent released HunYuan, a family of open-source models available at 0.5B, 1.8B, 4B, and 7B parameter scales in both Pre-trained and Instruct variants. The models can be found on [HuggingFace](https://huggingface.co/collections/tencent/hunyuan-dense-model-6890632cda26b19119c9c5e7). This guide shows how to fine-tune them with Axolotl on multi-turn conversations with proper masking.
+
+## Getting started
+
+1. Install Axolotl following the [installation guide](https://docs.axolotl.ai/docs/installation.html). You need to install from `main`, as HunYuan support is only available in nightly builds, or use our latest [Docker images](https://docs.axolotl.ai/docs/docker.html).
+
+Here is an example of how to install from `main` with pip:
+
+```bash
+# Ensure you have PyTorch installed (PyTorch 2.6.0 minimum)
+git clone https://github.com/axolotl-ai-cloud/axolotl.git
+cd axolotl
+
+pip3 install packaging==23.2 setuptools==75.8.0 wheel ninja
+pip3 install --no-build-isolation -e '.[flash-attn]'
+
+# Install CCE https://docs.axolotl.ai/docs/custom_integrations.html#cut-cross-entropy
+python scripts/cutcrossentropy_install.py | sh
+```
+
+2. Run the finetuning example:
+
+```bash
+axolotl train examples/hunyuan/hunyuan-v1-dense-qlora.yaml
+```
+
+This config uses about 4.7 GB VRAM.
+
+Let us know how it goes. Happy finetuning! 🚀
+
+### Dataset
+
+HunYuan Instruct models can respond in either a slow-think (reasoning) or fast-think pattern. For best results when fine-tuning the Instruct models, adjust your dataset to match the pattern you want the model to use:
+
+```python
+# fast think pattern
+messages = [
+    {"role": "system", "content": "You are a helpful assistant."},
+    {"role": "user", "content": "/no_think What color is the sun?"},
+    {"role": "assistant", "content": "<think>\n\n</think>\n<answer>\nThe sun is yellow.\n</answer>"}
+]
+
+# slow think pattern
+messages = [
+    {"role": "system", "content": "You are a helpful assistant."},
+    {"role": "user", "content": "What color is the sun?"},
+    {"role": "assistant", "content": "<think>\nThe user is asking about the color of the sun. I need to ...\n</think>\n<answer>\nThe sun is yellow.\n</answer>"}
+]
+```
+
+### Tips
+
+- For inference, the official Tencent team recommends the following sampling parameters:
+
+```json
+{
+    "do_sample": true,
+    "top_k": 20,
+    "top_p": 0.8,
+    "repetition_penalty": 1.05,
+    "temperature": 0.7
+}
+```
+
+- You can run a full finetuning by removing `adapter: qlora` and `load_in_4bit: true` from the config.
+- Read more on how to load your own dataset in the [docs](https://docs.axolotl.ai/docs/dataset_loading.html).
+- The dataset format follows the OpenAI Messages format as seen [here](https://docs.axolotl.ai/docs/dataset-formats/conversation.html#chat_template).
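+The fast/slow think adjustment described above can be done as a small preprocessing step. Below is a hedged sketch (the helper name `to_fast_think` is hypothetical, not part of Axolotl; the `/no_think` prefix and tag layout follow the patterns shown above) that rewrites a generic OpenAI-style messages list into the fast-think format:

```python
# Hypothetical helper (not part of Axolotl): convert a generic messages list
# to HunYuan's fast-think pattern by prefixing "/no_think" to user turns and
# wrapping assistant replies in an empty <think> block plus an <answer> block.
def to_fast_think(messages):
    converted = []
    for msg in messages:
        role, content = msg["role"], msg["content"]
        if role == "user":
            content = "/no_think " + content
        elif role == "assistant":
            content = "<think>\n\n</think>\n<answer>\n" + content + "\n</answer>"
        converted.append({"role": role, "content": content})
    return converted


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What color is the sun?"},
    {"role": "assistant", "content": "The sun is yellow."},
]
fast = to_fast_think(messages)
print(fast[1]["content"])  # /no_think What color is the sun?
```

+A slow-think variant would instead keep the user turn unchanged and place the model's reasoning inside the `<think>` block, as in the slow-think example above.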
+
+## Optimization Guides
+
+- [Multi-GPU Training](https://docs.axolotl.ai/docs/multi-gpu.html)
+- [Multi-Node Training](https://docs.axolotl.ai/docs/multi-node.html)
+- [LoRA Optimizations](https://docs.axolotl.ai/docs/lora_optims.html)
+
+## Related Resources
+
+- [Tencent HunYuan Blog](https://hunyuan.tencent.com/)
+- [Axolotl Docs](https://docs.axolotl.ai)
+- [Axolotl Website](https://axolotl.ai)
+- [Axolotl GitHub](https://github.com/axolotl-ai-cloud/axolotl)
+- [Axolotl Discord](https://discord.gg/7m9sfhzaf3)
diff --git a/examples/hunyuan/hunyuan-v1-dense-qlora.yaml b/examples/hunyuan/hunyuan-v1-dense-qlora.yaml
new file mode 100644
index 000000000..a94345a61
--- /dev/null
+++ b/examples/hunyuan/hunyuan-v1-dense-qlora.yaml
@@ -0,0 +1,64 @@
+base_model: tencent/Hunyuan-0.5B-Instruct
+
+# Automatically upload checkpoint and final model to HF
+# hub_model_id: username/custom_model_name
+
+plugins:
+  - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
+
+load_in_8bit: false
+load_in_4bit: true
+
+datasets:
+  - path: fozziethebeat/alpaca_messages_2k_test
+    type: chat_template
+
+dataset_prepared_path: last_run_prepared
+val_set_size: 0.1
+output_dir: ./outputs/lora-out
+
+adapter: qlora
+lora_model_dir:
+
+sequence_len: 2048
+sample_packing: true
+
+lora_r: 32
+lora_alpha: 16
+lora_dropout: 0.05
+lora_target_linear: true
+lora_target_modules:
+  - gate_proj
+  - down_proj
+  - up_proj
+  - q_proj
+  - v_proj
+  - k_proj
+  - o_proj
+
+wandb_project:
+wandb_entity:
+wandb_watch:
+wandb_name:
+wandb_log_model:
+
+gradient_accumulation_steps: 4
+micro_batch_size: 2
+num_epochs: 1
+optimizer: adamw_bnb_8bit
+lr_scheduler: cosine
+learning_rate: 0.0002
+
+bf16: auto
+tf32: false
+
+gradient_checkpointing: true
+resume_from_checkpoint:
+logging_steps: 1
+flash_attention: true
+
+warmup_ratio: 0.1
+evals_per_epoch: 1
+saves_per_epoch: 1
+
+# save_first_step: true # uncomment this to validate checkpoint saving works with your config
diff --git a/examples/magistral/README.md b/examples/magistral/README.md
index 48ce712da..f4f278208 100644
--- a/examples/magistral/README.md
+++ b/examples/magistral/README.md
@@ -18,7 +18,13 @@ pip3 install packaging==23.2 setuptools==75.8.0 wheel ninja
 pip3 install --no-build-isolation 'axolotl[flash-attn]>=0.12.0'
 ```

-2. Run the finetuning example:
+2. Install [Cut Cross Entropy](https://docs.axolotl.ai/docs/custom_integrations.html#cut-cross-entropy) to reduce training VRAM usage
+
+```bash
+python scripts/cutcrossentropy_install.py | sh
+```
+
+3. Run the finetuning example:

 ```bash
 axolotl train examples/magistral/magistral-small-qlora.yaml
diff --git a/examples/voxtral/README.md b/examples/voxtral/README.md
index f31e9cfd0..984af4ddb 100644
--- a/examples/voxtral/README.md
+++ b/examples/voxtral/README.md
@@ -22,6 +22,9 @@ pip3 install --no-build-isolation 'axolotl[flash-attn]>=0.12.0'

 # audio
 pip3 install librosa==0.11.0
 pip3 install 'mistral_common[audio]==1.8.3'
+
+# Install CCE https://docs.axolotl.ai/docs/custom_integrations.html#cut-cross-entropy
+python scripts/cutcrossentropy_install.py | sh
 ```

 3. Run the finetuning example:
diff --git a/src/axolotl/loaders/tokenizer.py b/src/axolotl/loaders/tokenizer.py
index dcc255938..37b66ac83 100644
--- a/src/axolotl/loaders/tokenizer.py
+++ b/src/axolotl/loaders/tokenizer.py
@@ -296,7 +296,7 @@ def load_tokenizer(cfg: DictDefault) -> PreTrainedTokenizer:
         )
         tokenizer.chat_template = chat_template_string
-    else:
+    elif getattr(tokenizer, "chat_template", None) is None:
         LOG.info(
             "No Chat template selected. Consider adding a chat template for easier inference."
         )
diff --git a/src/axolotl/monkeypatch/multipack.py b/src/axolotl/monkeypatch/multipack.py
index cbc546877..a32430d9f 100644
--- a/src/axolotl/monkeypatch/multipack.py
+++ b/src/axolotl/monkeypatch/multipack.py
@@ -36,6 +36,10 @@ SUPPORTED_MULTIPACK_MODEL_TYPES = [
     "glm",
     "glm4",
     "smollm3",
+    "granite",
+    "granitemoe",
+    "hunyuan_v1_dense",
+    "hunyuan_v1_moe",
     "gpt_oss",
     "arcee",
     "seed_oss",