Cut Cross Entropy

Cut Cross Entropy (CCE) reduces VRAM usage during loss calculation by computing the cross-entropy loss without materializing the full logits matrix in GPU memory.

See https://github.com/apple/ml-cross-entropy
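As a rough illustration of why the loss step can dominate VRAM, the sketch below estimates the size of the dense logits tensor that a standard cross-entropy pass holds: one entry per token per vocabulary item. The token count, vocabulary size, and dtype are illustrative assumptions, not measurements from Axolotl or CCE.

```python
def logits_bytes(num_tokens: int, vocab_size: int, bytes_per_element: int = 4) -> int:
    """Size in bytes of a dense [num_tokens, vocab_size] logits tensor."""
    return num_tokens * vocab_size * bytes_per_element


# Illustrative values only (assumptions, not measurements):
tokens = 8192      # e.g. batch size 4 x sequence length 2048
vocab = 128_256    # a large-vocabulary tokenizer
size = logits_bytes(tokens, vocab)  # float32, 4 bytes per element

print(f"{size / 2**30:.2f} GiB")  # prints 3.91 GiB
```

The estimate covers only the logits tensor itself; gradients and intermediate buffers add to it in practice.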

Requirements

  • PyTorch 2.4.0 or higher
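A minimal sketch of checking this requirement before enabling the plugin. The helper names `parse_version` and `meets_minimum` are hypothetical and not part of Axolotl or CCE; the parser reads only the leading numeric components, so a pre-release such as `2.4.0a0` is treated as `2.4.0`.

```python
import re

MIN_TORCH = (2, 4, 0)  # minimum version stated in the requirements above


def parse_version(version: str) -> tuple:
    """Extract (major, minor, patch) from strings like '2.4.0+cu121'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def meets_minimum(version: str, minimum: tuple = MIN_TORCH) -> bool:
    """Return True if `version` is at least `minimum` (tuple comparison)."""
    return parse_version(version) >= minimum


print(meets_minimum("2.4.0+cu121"))
print(meets_minimum("2.3.1"))
```

In a real environment you would pass `torch.__version__` to `meets_minimum` instead of a literal string.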

Installation

If cut_cross_entropy[transformers] is not already installed, run the command that matches your setup.

  • If you are in a dev environment:
python scripts/cutcrossentropy_install.py | sh
  • If you are installing from pip:
pip3 uninstall -y cut-cross-entropy && pip3 install "cut-cross-entropy[transformers] @ git+https://github.com/axolotl-ai-cloud/ml-cross-entropy.git@5eff953"

Usage

Add the plugin to your Axolotl config:

plugins:
  - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin

Supported Models

  • apertus
  • arcee
  • cohere
  • cohere2
  • deepseek_v3
  • gemma
  • gemma2
  • gemma3
  • gemma3_text
  • gemma3n
  • gemma3n_text
  • glm
  • glm4
  • glm4_moe
  • glm4v
  • glm4v_moe
  • gpt_oss
  • granite
  • granitemoe
  • granitemoeshared
  • granitemoehybrid
  • hunyuan_v1_dense
  • hunyuan_v1_moe
  • lfm2
  • lfm2_moe
  • lfm2_vl
  • llama
  • llama4
  • llama4_text
  • llava
  • mistral
  • mistral3
  • mixtral
  • mllama
  • olmo
  • olmo2
  • olmo3
  • phi
  • phi3
  • phi4_multimodal
  • qwen2
  • qwen2_vl
  • qwen2_moe
  • qwen2_5_vl
  • qwen3
  • qwen3_moe
  • qwen3_vl
  • qwen3_vl_moe
  • qwen3_next
  • smollm3
  • seed_oss
  • voxtral
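The list above can be turned into a quick pre-flight check. This is a sketch under the assumption that the names correspond to the `model_type` field of a Hugging Face model config, which this README does not state; `is_cce_supported` is a hypothetical helper, not an Axolotl API.

```python
# Names copied from the Supported Models list above.
SUPPORTED_MODEL_TYPES = frozenset(
    """
    apertus arcee cohere cohere2 deepseek_v3 gemma gemma2 gemma3 gemma3_text
    gemma3n gemma3n_text glm glm4 glm4_moe glm4v glm4v_moe gpt_oss granite
    granitemoe granitemoeshared granitemoehybrid hunyuan_v1_dense
    hunyuan_v1_moe lfm2 lfm2_moe lfm2_vl llama llama4 llama4_text llava
    mistral mistral3 mixtral mllama olmo olmo2 olmo3 phi phi3 phi4_multimodal
    qwen2 qwen2_vl qwen2_moe qwen2_5_vl qwen3 qwen3_moe qwen3_vl qwen3_vl_moe
    qwen3_next smollm3 seed_oss voxtral
    """.split()
)


def is_cce_supported(model_type: str) -> bool:
    """Return True if `model_type` appears in the list above (case-insensitive)."""
    return model_type.strip().lower() in SUPPORTED_MODEL_TYPES


print(is_cce_supported("olmo3"))
print(is_cce_supported("gpt2"))
```

A hard-coded set like this goes stale as models are added, so treat the list in this README as the source of truth.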

Citation

@article{wijmans2024cut,
  author       = {Erik Wijmans and
                  Brody Huval and
                  Alexander Hertzberg and
                  Vladlen Koltun and
                  Philipp Kr\"ahenb\"uhl},
  title        = {Cut Your Losses in Large-Vocabulary Language Models},
  journal      = {arXiv},
  year         = {2024},
  url          = {https://arxiv.org/abs/2411.09009},
}