* feat: add custom kimi linear patch [skip ci] * feat: add configuration file and fix import [skip ci] * fix: hijack tokenizer temporarily [skip ci] * chore: remove accidental commit * fix: attempt patch kimi remote * fix: kwargs passsed * fix: device for tensor * fix: aux loss calculation * feat: cleaned up patches order * fix: remove duplicate tokenizer patch * chore: add debug logs * chore: add debug logs * chore: debug * Revert "chore: add debug logs" This reverts commitda372a5f67. * Revert "chore: add debug logs" This reverts commit97d1de1d7c. * fix: KeyError: 'tokenization_kimi' * fix: support remote_model_id in cce patch * feat: add config preload patch * fix: use standard aux loss calc and updated modeling * fix: import * feat: add kimi-linear docs and example * chore: add note about moe kernels * feat: update cce to include kimi-linear * chore: lint * chore: update main readme * fix: patch mechanism to address comments * chore: lint * fix: tests * chore: cleanup comment
Finetune MoonshotAI's Kimi Linear with Axolotl
Kimi Linear is a MoE model (48B total, 3B active) by MoonshotAI using a hybrid linear attention architecture to achieve a 1M token context length. It uses Kimi Delta Attention (KDA), a refined version of Gated DeltaNet that reduces KV cache size by up to 75% and boosts decoding throughput by up to 6x for long contexts.
This guide shows how to fine-tune it with Axolotl with multi-turn conversations and proper masking.
Note: Axolotl uses experimental training code for Kimi Linear as their original modeling code is inference-only.
Getting started
-
Install Axolotl following the installation guide.
-
Install CCE via docs
-
Run the finetuning example:
axolotl train examples/kimi-linear/kimi-48b-lora.yaml
This config uses about 98.7GiB VRAM.
Let us know how it goes. Happy finetuning!
TIPS
- Kimi Linear requires
trust_remote_code: true. - You can run a full finetuning by removing the
adapter: loraandload_in_8bit: true. - Read more on how to load your own dataset at docs
- The dataset format follows the OpenAI Messages format as seen here
Optimization Guides
See 👉 docs.
Limitations
This is not yet compatible with MoE kernels from transformers v5.