Finetune MoonshotAI's Kimi Linear with Axolotl

Kimi Linear is a MoE model (48B total, 3B active) by MoonshotAI using a hybrid linear attention architecture to achieve a 1M token context length. It uses Kimi Delta Attention (KDA), a refined version of Gated DeltaNet that reduces KV cache size by up to 75% and boosts decoding throughput by up to 6x for long contexts.

This guide shows how to fine-tune it with Axolotl on multi-turn conversations with proper loss masking.

Note: Axolotl uses experimental training code for Kimi Linear because the upstream modeling code is inference-only.
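
As a rough sketch of what the multi-turn setup can look like, the snippet below uses Axolotl's chat_template dataset type so that loss is computed only on assistant turns. The dataset path is a placeholder; treat the shipped kimi-48b-lora.yaml as the reference.

    datasets:
      - path: ./data/conversations.jsonl   # placeholder; point this at your own file
        type: chat_template
        field_messages: messages           # each record holds a list of {role, content} turns
        roles_to_train: ["assistant"]      # compute loss only on assistant turns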

Getting started

  1. Install Axolotl following the installation guide.

  2. Install Cut Cross Entropy (CCE) following the docs; see the config sketch below for how it is typically enabled.

  3. Run the finetuning example:

    axolotl train examples/kimi-linear/kimi-48b-lora.yaml
    

This config uses about 98.7 GiB of VRAM.
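
If your copy of the example config does not already enable CCE, it is typically switched on through Axolotl's plugin mechanism. The lines below are a sketch only; check them against the CCE docs.

    plugins:
      - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
    cut_cross_entropy: true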

Let us know how it goes. Happy finetuning!

TIPS

  • Kimi Linear requires trust_remote_code: true (see the config sketch after this list).
  • You can switch to a full finetune by removing adapter: lora and load_in_8bit: true from the config.
  • Read more on how to load your own dataset in the docs.
  • The dataset follows the OpenAI Messages format, as seen here; a sample record is sketched after this list.
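
For reference, here is a minimal sketch of the config keys these tips refer to. The model id is an assumption for illustration; the shipped kimi-48b-lora.yaml is authoritative.

    base_model: moonshotai/Kimi-Linear-48B-A3B-Instruct  # assumed model id; use the one from the example config
    trust_remote_code: true

    # quantized LoRA settings; remove these (and any lora_* keys) for a full finetune
    load_in_8bit: true
    adapter: lora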
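
And a sketch of a single training record in the OpenAI messages format, one JSON object per line of a .jsonl file (content invented for illustration):

    {"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is Kimi Delta Attention?"}, {"role": "assistant", "content": "A gated linear-attention mechanism used in Kimi Linear."}]}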

Optimization Guides

See 👉 docs.

Limitations

This integration is not yet compatible with the MoE kernels in transformers v5.