
NanoCode012 cf0c79d52e fix: minor patches for multimodal (#2441 )

* fix: update chat_template

* fix: handle gemma3 showing a lot of no content for turn 0

* fix: remove unknown config from examples

* fix: test

* fix: temporary disable gemma2 test

* fix: stop overwriting config.text_config unnecessarily

* fix: handling of set cache to the text_config section

* feat: add liger gemma support and bump liger to 0.5.5

* fix: add double use_cache setting

* fix: add support for final_logit_softcap in CCE for gemma2/3

* fix: set use_cache before model load

* feat: add missing layernorm override

* fix: handle gemma3 rmsnorm

* fix: use wrapper to pass dim as hidden_size

* fix: change dim to positional

* fix: patch with wrong mlp

* chore: refactor use_cache handling

* fix import issues

* fix tests.e2e.utils import

---------

Co-authored-by: Wing Lian <wing@axolotl.ai>

2025-03-31 13:40:12 +07:00

Liger Kernel Integration

Liger Kernel provides efficient Triton kernels for LLM training. Compared with the Hugging Face implementations, its authors report:

20% higher multi-GPU training throughput on average
60% lower memory usage on average
Compatibility with both FSDP and DeepSpeed

See https://github.com/linkedin/Liger-Kernel

Usage

Register the plugin in your Axolotl YAML config, then enable the kernels you want:

plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_glu_activation: true
liger_layer_norm: true
liger_fused_linear_cross_entropy: true
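As an illustration of how the flags above fit together, here is a minimal sketch that models the YAML as a plain Python dict and reports which kernels it turns on. The helper `enabled_liger_kernels` and the `LIGER_FLAGS` descriptions are hypothetical (not part of Axolotl), the descriptions are inferred from the flag names, and the check that the plugin must be listed is an assumption based on the Usage example, not Axolotl's own validation logic.

```python
# Hypothetical helper, not part of Axolotl: summarizes which Liger kernels a
# config enables. The config is a plain dict mirroring the YAML above.
LIGER_PLUGIN = "axolotl.integrations.liger.LigerPlugin"

# Flag -> short description (assumption: inferred from the flag names).
LIGER_FLAGS = {
    "liger_rope": "rotary position embedding (RoPE)",
    "liger_rms_norm": "RMSNorm",
    "liger_glu_activation": "GLU-style MLP activation",
    "liger_layer_norm": "LayerNorm",
    "liger_fused_linear_cross_entropy": "fused linear + cross-entropy loss",
}


def enabled_liger_kernels(cfg: dict) -> list:
    """Return the liger_* flags set to true, in LIGER_FLAGS order.

    Raises ValueError if any flag is enabled but the plugin is not listed,
    assuming the flags only take effect through the plugin.
    """
    enabled = [flag for flag in LIGER_FLAGS if cfg.get(flag) is True]
    plugins = cfg.get("plugins") or []  # tolerate a missing or null key
    if enabled and LIGER_PLUGIN not in plugins:
        raise ValueError(f"{LIGER_PLUGIN} must be listed under plugins")
    return enabled


config = {
    "plugins": [LIGER_PLUGIN],
    "liger_rope": True,
    "liger_rms_norm": True,
    "liger_glu_activation": True,
    "liger_layer_norm": True,
    "liger_fused_linear_cross_entropy": True,
}

for flag in enabled_liger_kernels(config):
    print(f"{flag}: {LIGER_FLAGS[flag]}")
```

Each kernel is controlled by its own boolean, so you can disable one (for example, set it to false) without touching the others.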

Supported Models

deepseek_v2
gemma
gemma2
gemma3 (partial support; liger_fused_linear_cross_entropy (FLCE) is not supported yet)
granite
jamba
llama
mistral
mixtral
mllama
mllama_text_model
olmo2
paligemma
phi3
qwen2
qwen2_5_vl
qwen2_vl
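One way to act on this list before a run is a small pre-flight check, sketched below. The function `liger_warnings` and the `UNSUPPORTED_FLAGS` table are hypothetical (not part of Axolotl), and the sketch assumes the names above match Hugging Face `model_type` strings. It encodes only what this README states: the supported model types and the missing FLCE support for gemma3.

```python
# Hypothetical pre-flight check, not part of Axolotl: flags model types missing
# from the supported list above, and the gemma3 + FLCE combination.
SUPPORTED_MODEL_TYPES = frozenset({
    "deepseek_v2", "gemma", "gemma2", "gemma3", "granite", "jamba", "llama",
    "mistral", "mixtral", "mllama", "mllama_text_model", "olmo2", "paligemma",
    "phi3", "qwen2", "qwen2_5_vl", "qwen2_vl",
})

# Partially supported model types -> liger_* flags they cannot use yet.
UNSUPPORTED_FLAGS = {"gemma3": {"liger_fused_linear_cross_entropy"}}


def liger_warnings(model_type: str, cfg: dict) -> list:
    """Return human-readable warnings for a model type and its liger_* config."""
    if model_type not in SUPPORTED_MODEL_TYPES:
        return [f"{model_type} is not in the Liger supported model list"]
    # Warn only about unsupported flags that are actually enabled.
    return [
        f"{flag} is not supported for {model_type} yet"
        for flag in sorted(UNSUPPORTED_FLAGS.get(model_type, ()))
        if cfg.get(flag) is True
    ]


print(liger_warnings("gemma3", {"liger_fused_linear_cross_entropy": True}))
print(liger_warnings("llama", {"liger_fused_linear_cross_entropy": True}))
```

Keeping partial support in a separate table means that when FLCE support lands for gemma3, only that one entry needs to change.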

Citation

@article{hsu2024ligerkernelefficienttriton,
      title={Liger Kernel: Efficient Triton Kernels for LLM Training},
      author={Pin-Lun Hsu and Yun Dai and Vignesh Kothapalli and Qingquan Song and Shao Tang and Siyu Zhu and Steven Shimizu and Shivam Sahni and Haowen Ning and Yanning Chen},
      year={2024},
      eprint={2410.10989},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2410.10989},
      journal={arXiv preprint arXiv:2410.10989},
}