NanoCode012
149178ddb7
chore: cleanup post release v0.16 (#3577)
...
* fix: remove unneeded debug log
* fix: cleanup
* feat: add dense gemma config and cleanup
* feat: add cce support
* update notes and set torch compile
* fix patch for new number of return vals
* fixes for gemma4
* fix packing bug
* use updated cce for mm
* fix: pass in kv cache func when avail for transformers 5.5
* feat: update examples with flex variant and readme
* gemma4 lora attention kernels
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
Co-authored-by: Wing Lian <wing@axolotl.ai>
2026-04-06 10:10:52 -07:00
Wing Lian
1fc86d5295
Scattermoe LoRA optimizations (#3513)
...
* optimize moe + lora
* more scattermoe optims
* selective dequant
* add correctness unit tests and benchmarks for scattermoe + lora
* handle base+lora split kernel for older moe models
* chore: lint
* fix casting for H200 and B200
* register pressure estimation and pruning for h200/b200
* use soft limit for pruning
* qkv patch for qwen3.5moe
* support text_model for qwen3.5 moe
* nesting of qwen3
* use updated cce with zero3 support
* Fix decomposed backward for QKV and O projections
Eliminates B @ A materialization in the LoRA attention backward, replacing full [out, in] matmuls with two small [T, R] matmuls.
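The decomposition above follows from associativity: since the LoRA delta is `B @ A`, the input gradient `grad_y @ (B @ A)` can instead be computed as `(grad_y @ B) @ A`, keeping only a `[T, R]` intermediate. A minimal sketch (shapes and names illustrative, not the repository's actual kernel):

```python
import numpy as np

# Naive LoRA backward materializes delta_W = B @ A with shape [OUT, IN];
# the decomposed form routes grad_y through the rank-R bottleneck instead,
# so the largest intermediate is [T, R] rather than [OUT, IN].
T, IN, OUT, R = 4, 16, 8, 2           # tokens, features in/out, LoRA rank
rng = np.random.default_rng(0)
A = rng.standard_normal((R, IN))       # LoRA down-projection
B = rng.standard_normal((OUT, R))      # LoRA up-projection
grad_y = rng.standard_normal((T, OUT)) # upstream gradient

grad_x_naive = grad_y @ (B @ A)        # materializes [OUT, IN]
grad_x_decomp = (grad_y @ B) @ A       # two small matmuls, [T, R] intermediate

assert np.allclose(grad_x_naive, grad_x_decomp)
```

Both paths are mathematically identical; the decomposed one simply avoids allocating (and reading) the dense `[OUT, IN]` product.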
2026-03-19 23:07:42 -04:00
NanoCode012
a098df527b
feat: add Mistral Small 4 (#3502)
...
* feat: add mistral small 4
* fix: update mistral common
* fix: deepcopy when passing in tokenizer
* feat: add doc on reasoning and thinking section
* fix: don't use custom tokenizer and quantize experts
* chore: update docs and configs
* chore: update doc to follow official name
* feat: update cce to include mistral4
* chore: move
* fix: naming
* fix: test mock breaking get_text_config check
* fix: enable CCE and add expert block targeting to configs
* chore: docs
* fix: use act checkpointing
* chore: doc
* chore: docs
* chore: docs
2026-03-17 09:39:05 +07:00
Wing Lian
fc2d63ee5f
use new tf32 APIs for torch 2.9+ (#3467) [skip ci]
...
* use new tf32 APIs for torch 2.9+
* also upgrade cce for tf32 fixes and lint
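For context on the commit above: newer torch releases replace the boolean TF32 toggles with an `fp32_precision` setting, so code supporting both has to gate on the installed version. A hedged sketch of that gating (attribute names in the comments are assumptions from torch's deprecation notes, not verified against this repo):

```python
# Pick which TF32 knob applies for a given torch version string.
# For torch >= 2.9, the newer API is roughly:
#     torch.backends.cuda.matmul.fp32_precision = "tf32"
# while older releases use:
#     torch.backends.cuda.matmul.allow_tf32 = True
# (names hedged -- check your torch release notes before relying on them).
def tf32_api_for(torch_version):
    major, minor = (int(p) for p in torch_version.split(".")[:2])
    if (major, minor) >= (2, 9):
        return "fp32_precision"   # new string-valued precision setting
    return "allow_tf32"           # legacy boolean toggle

assert tf32_api_for("2.9.1") == "fp32_precision"
assert tf32_api_for("2.8.0") == "allow_tf32"
```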
2026-03-06 11:40:32 -05:00
NanoCode012
945c8aeb10
Fix: quantize and target moe layers in transformers v5 for adapters and many misc fixes (#3439)
...
* fix: saving clones state dict
* fix: apply fix for only CP mode
* fix: add dropout check when using lora target param
* fix: re-add patch from transformers PR #39866
* feat: add moe quant to test by ved
* fix: try to match target param properly with endswith
* fix: clear cache per param quant
* fix: attempt on-load quantize experts instead of post-load
* fix: attempt disable async load
* chore: add log
* chore: adjust log
* fix: remove cuda alloc for moe and enable async load
* chore: remove leftover logs
* chore: add extra empty cache
* fix(doc): clarify support
* fix: handle fsdp2 for paramwrapper dtensor
* feat: attempt to quant experts in 8bit mode too
* feat: attempt to release bf16 experts from vram
* feat: upgrade cce
* fix: fsdp2 init_sharded_param load int8/uint4 dtensor as require_grad=true on init
* fix: remove unnecessary gc and empty cache
* Revert "fix: remove unnecessary gc and empty cache"
This reverts commit 1d54518990.
* fix: do not call full_tensor on non-dtensors
* fix: attempt to address fsdp2 with quant exp high loss
* fix: attempt lora quant experts wrong dim
* fix: ensure require_grad patch applied for lora 8bit
* fix: attempt lora 8bit fsdp2
* fix: attribute access on save for lora 8bit fsdp2
* fix: wrong weight attrib access
* chore(refactor): add config, re-arrange position of patches, clean comments
* feat: add example docs
* chore: cherry pick trinity fixes from PR 3399
* chore: comments refactor; add guards
* fix: guard using wrong key
* fix: mamba save does not accept main process param
* fix: guard prevent double hook
* fix: move gc to upper scope
* chore: add comment on proxy forward patch
* fix: add comment to clarify
* feat: add test idempotency
* fix: AttributeError: `e_score_correction_bias` is not an nn.Parameter
* fix: AttributeError: 'NoneType' object has no attribute 'to'
* fix: update docs on cpu_ram_efficient_loading
2026-03-03 10:06:23 -05:00
NanoCode012
43d60c7439
bump cut-cross-entropy to 58d6572 (#3424)
2026-02-20 14:24:51 -05:00
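Many entries in this log just bump the cut-cross-entropy (CCE) fork; on the user side, CCE is enabled through Axolotl's plugin in the training YAML. A minimal sketch (the plugin path follows Axolotl's documented integration; the `base_model` value is purely illustrative):

```yaml
# Illustrative Axolotl config fragment enabling cut-cross-entropy
base_model: NousResearch/Llama-3.2-1B   # example model, not from this log

plugins:
  - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin

cut_cross_entropy: true
```

Whether a given model is supported depends on the pinned CCE commit, which is exactly what these bump commits advance.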
NanoCode012
b6d3653f74
feat: add step3p5 for cce (#3384) [skip ci]
...
* feat: add step3p5 for cce
* chore: reorder model
2026-02-10 17:51:43 +07:00
NanoCode012
3dd86d35b8
feat: add new cce support for glm series and exaone4 (#3373) [skip ci]
2026-01-28 06:44:44 -05:00
NanoCode012
418933f0d1
feat: add internvl3_5 (#3141) [skip-ci]
...
* feat: add internvl3_5
* fix: add timm instructions
* chore: add kimi-linear to cce doc
* feat: update internvl example
* chore: pin revision
* chore: remove from multipack
* fix: add to multimodal array
* fix: internvl use hf version
* feat: update cce
* chore: lint
* fix: list for image_size
* chore: add docs vram usage
* feat: enable cce
* fix: no need trust remote code
* fix: inconsistent timm version
2025-12-25 18:07:59 +07:00
NanoCode012
97f1b1758d
Feat: add kimi linear support (#3257)
...
* feat: add custom kimi linear patch [skip ci]
* feat: add configuration file and fix import [skip ci]
* fix: hijack tokenizer temporarily [skip ci]
* chore: remove accidental commit
* fix: attempt patch kimi remote
* fix: kwargs passed
* fix: device for tensor
* fix: aux loss calculation
* feat: cleaned up patches order
* fix: remove duplicate tokenizer patch
* chore: add debug logs
* chore: add debug logs
* chore: debug
* Revert "chore: add debug logs"
This reverts commit da372a5f67.
* Revert "chore: add debug logs"
This reverts commit 97d1de1d7c.
* fix: KeyError: 'tokenization_kimi'
* fix: support remote_model_id in cce patch
* feat: add config preload patch
* fix: use standard aux loss calc and updated modeling
* fix: import
* feat: add kimi-linear docs and example
* chore: add note about moe kernels
* feat: update cce to include kimi-linear
* chore: lint
* chore: update main readme
* fix: patch mechanism to address comments
* chore: lint
* fix: tests
* chore: cleanup comment
2025-12-25 17:53:52 +07:00
NanoCode012
a1d07f42e4
Fix(misc): address PYTORCH_CUDA_ALLOC_CONF deprecate (#3313)
...
* fix: leftover ministral docs changes
* fix: pytorch_cuda_alloc_conf deprecation
* fix: set old PYTORCH_CUDA_ALLOC_CONF env too
* handle 2.9 separately
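The fix described in the bullets above boils down to setting both the new and the deprecated allocator env var, since older torch only reads the legacy name. A hedged sketch (variable names assumed from torch's deprecation notes, not taken from this repo's code):

```python
import os

# Keep the deprecated PYTORCH_CUDA_ALLOC_CONF in sync with the newer,
# accelerator-agnostic PYTORCH_ALLOC_CONF so that both old and new torch
# releases pick up the same allocator settings.
def set_alloc_conf(value, env=None):
    env = os.environ if env is None else env
    env["PYTORCH_ALLOC_CONF"] = value        # newer name (around torch 2.9+)
    env["PYTORCH_CUDA_ALLOC_CONF"] = value   # legacy name, read by older torch
    return env

conf = set_alloc_conf("expandable_segments:True", env={})
```

Passing a plain dict (as in the last line) makes the helper easy to test without mutating the real process environment.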
---------
Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-12-17 09:12:18 -05:00
NanoCode012
2b66ee189c
Feat: add ministral3 (#3297)
...
* feat: add ministral and mistral3
* chore: lint
* feat: update cce for ministral
* fix: add vram usage
* feat: update for release
* fix: save_pretrained issue in v5
* fix: add instructions to use v5 branch
* fix: add to multipack
* fix: improve instructions
* fix: add model to readme
2025-12-04 08:32:08 -05:00
NanoCode012
006f226270
Feat: add Olmo3 (BC with Olmo and Olmo2) (#3275)
...
* feat: update cce to include olmo family
* chore: update docs following feedback
* feat: add olmo3 config
* fix: clarify 3 methods
* chore: add olmo to readme
2025-11-24 10:21:31 +07:00
NanoCode012
8c7f63cf97
fix: unpack cce imported incorrectly (#3212) [skip ci]
2025-10-13 17:19:15 +07:00
NanoCode012
ab63b92c38
feat: add lfm2 family and latest moe model (#3208)
...
* feat: add lfm2 family and latest moe model
* fix: use ml-cross-entropy for lfm2 examples
2025-10-09 10:47:41 -04:00
NanoCode012
7fa8ac40cd
Feat(cce): add qwen3_vl, qwen3_vl_moe, granitemoeshared, granitemoehybrid, and upgraded all cce patches (#3178)
...
* feat: upgrade cce with patches for transformers 4.56
* feat: add missing models to cce readme
2025-09-26 12:11:29 +07:00
NanoCode012
08d831c3d5
Feat: add qwen3-next (w packing+cce) (#3150)
...
* feat: upgrade cce for qwen3-next
* feat: add sample qwen3 config
* feat: add packing patch for chunk_gated_delta_rule
* feat: add qwen3 link
* fix: tuple name
* feat: add tested qwen3 config
* fix: improve log
* feat: add patch for fla without packing
* fix: remove fla patch for standard mode
* feat: enable packing
* feat: add qwen3-next tests
* chore: move tests
2025-09-23 11:31:15 +07:00
NanoCode012
c51d6b06c3
feat: add apertus model and cce (#3144) [skip ci]
...
* feat: add apertus, glm4v, glm4v_moe cce
* fix: arcee docs
* feat: add apertus
* feat: added vram usage
* fix: add apertus note
* feat: update doc on apertus xielu
* fix: add monkeypatch for xielu activation issue
* fix: simplify env
* feat: pin commit
* feat: add packing
* chore: move patch calling
* Update examples/apertus/README.md
Co-authored-by: salman <salman.mohammadi@outlook.com>
* Update examples/apertus/README.md
Co-authored-by: salman <salman.mohammadi@outlook.com>
* Update examples/apertus/README.md
Co-authored-by: salman <salman.mohammadi@outlook.com>
---------
Co-authored-by: salman <salman.mohammadi@outlook.com>
2025-09-19 17:34:04 +07:00
Dan Saunders
4065bc14c6
Debug log, logging improvements (#3159)
...
* simplify logging
* remove comment
* progress on debug.log
* add debug-level logger for file log
* simplify
* case insensitivity; 3rd party logging improvements
* simplify
* fix
* tests
* lint
* nits
* nit
* Update tests/test_utils_tee.py
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
* cleanup / comments
* fix
* oops
---------
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-09-17 13:27:03 -04:00
Dan Saunders
1b53c49e1a
text diffusion training plugin (#3067)
...
* diffusion training plugin
* cleanup
* nits
* fixes + improvements
* add back in reinit_weights (clobbered?); masking / pretrain fixes
* nits
* cleanup; tests draft
* sample generation, tests fixes
* fixes
* nits
* add inference support; add auto-mask token support
* nits
* nits
* progress
* simplify logging
* lint
* prefix args with diffusion_
* coderabbito
* tests fix
* nit
* nits
* cleanup + nits
* nits
* fix SFT sample gen
* fixes
* fix
* comments
* comments
* lint
* reward model lora fix
* cleanup; fix pretraining_dataset case
* gradio inference
* update cfgs
* update cfgs
* train, generation parity, cleanup
* fix
* simplify
* test
* test fix
2025-09-10 20:27:00 -04:00
Wing Lian
6afba3871d
Add support for PyTorch 2.8.0 (#3106)
...
* Add support for PyTorch 2.8.0
* loosen triton requirements
* handle torch 2.8.0 in setup.py
* fix versions
* no vllm for torch 2.8.0
* remove comment
Co-authored-by: NanoCode012 <nano@axolotl.ai>
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai>
2025-08-28 09:10:40 -04:00
Dan Saunders
79ddaebe9a
Add ruff, remove black, isort, flake8, pylint (#3092)
...
* black, isort, flake8 -> ruff
* remove unused
* add back needed import
* fix
2025-08-23 23:37:33 -04:00
NanoCode012
f70d4de8c7
feat(doc): add links to new features on README (#2980) [skip ci]
...
* feat(doc): add links to new features on README
* fix merge error
* remove blurb about older FSDP2 integration
* update blog link
* chore: update cce commit
* feat: update model support into readme
* Update README.md
Co-authored-by: salman <salman.mohammadi@outlook.com>
* chore: lint num spaces
---------
Co-authored-by: Wing Lian <wing@axolotl.ai>
Co-authored-by: salman <salman.mohammadi@outlook.com>
2025-08-08 08:16:43 -04:00
NanoCode012
2974670bf8
Feat: add arcee (#3028)
...
* feat: add arcee
* feat: add latest models supported by cce
* feat: add arcee example config
* chore: lint
* fix: typo
* feat: change to instruct
* feat: add vram usage
* Update README.md
2025-08-08 08:09:11 -04:00
Wing Lian
ba3dba3e4f
add kernels for gpt oss models (#3020)
...
* add kernels for gpt oss models
* add support for gpt-oss
* typo incorrect package
* fix: layout for configs and added wandb/epochs
* add gptoss example w offload and set moe leaf for z3
* add support for Mxfp4Config from yaml
* update yaml to use official model
* fix lora and don't allow triton to go above 3.3.1
* fix lr and tweak vram use
* fix range for triton since pinned wasn't compatible with torch 2.6.0
* update cce with gpt oss patches
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai>
2025-08-06 09:47:55 -04:00
Wing Lian
01a6bd1a0e
use CCE fix for TP using vocab parallel for CEL (#3000)
2025-08-01 13:21:58 -04:00
NanoCode012
eb0a8a7775
feat: upgrade cce commit to include smollm3, granite, granitemoe (#2993)
2025-07-31 18:18:44 -04:00
NanoCode012
90e5598930
Feat: Add voxtral, magistral small 1.1, and misc gemma3n fixes (#2979)
...
* fix: lock version in gemma3n docs
* feat: add sample configs and docs
* chore: move mistraltokenizer into mistral folder
* feat: update instructions
* feat: add dynamic load voxtral
* fix: remove incorrect vision config, add audio
* fix: support voxtral processing strategy and address none in data
* feat: patch mistraltokenizer subclass upstream and add missing
* feat: update cce commit to include voxtral
* fix: remove old comment
* fix: gemma3 patch not needed anymore
* fix: voxtral modeling code
* fix: remove incorrect ds path
* fix: adjust apply chat template parsing
* feat: enable voxtral patch
* fix: patch
* feat: update example datasets
* fix: target layer
* feat: update gemma3n docs
* feat: update voxtral docs
* feat: revert assistant parsing to rely on new upstream changes
* chore: skip test till next PR fix
* fix: override upstream decode due to missing handling
* feat: update readme
* fix: update
* feat: add magistral small think support
* feat: update mistral-common dep
* fix: lint
* fix: remove optional dep
* chore: typing
* chore: simplify import
* feat(doc): update differences for 2507
* fix: coderabbit comments
* feat: update clarify docs on new transformers
2025-07-30 15:57:05 +07:00
Wing Lian
b7e8f66e5a
upstream fixes in cce for dora and tensor parallel support (#2960) [skip ci]
2025-07-21 11:41:53 -04:00
Wing Lian
942005f526
use modal==1.0.2 for nightlies and for cli (#2925) [skip ci]
...
* use modal==1.0.2 for nightlies and for cli
* use latest cce fork for upstream changes
* increase timeout
2025-07-15 20:31:23 -04:00
NanoCode012
29289a4de9
feat: replace old colab notebook with newer one (#2838) [skip ci]
...
* feat: replace old colab notebook with newer one
* fix: point to update cce fork
2025-06-27 10:35:47 -04:00
Wing Lian
d009ead101
fix build w pyproject to respect installed torch version (#2168)
...
* fix build w pyproject to respect installed torch version
* include in manifest
* disable duplicate code check for now
* move parser so it can be found
* add checks for correct pytorch version so this doesn't slip by again
2024-12-10 16:25:25 -05:00
Sunny Liu
45c0825587
updated colab notebook (#2074)
...
* updated colab notebook
* update pip installation
* cleared cell output
* Update examples/colab-notebooks/colab-axolotl-example.ipynb
Co-authored-by: NanoCode012 <nano@axolotl.ai>
* Update examples/colab-notebooks/colab-axolotl-example.ipynb
Co-authored-by: NanoCode012 <nano@axolotl.ai>
* Update examples/colab-notebooks/colab-axolotl-example.ipynb
Co-authored-by: NanoCode012 <nano@axolotl.ai>
* Update examples/colab-notebooks/colab-axolotl-example.ipynb
Co-authored-by: NanoCode012 <nano@axolotl.ai>
* modified notebook
* Update examples/colab-notebooks/colab-axolotl-example.ipynb
Co-authored-by: NanoCode012 <nano@axolotl.ai>
* Update examples/colab-notebooks/colab-axolotl-example.ipynb
Co-authored-by: NanoCode012 <nano@axolotl.ai>
* Update examples/colab-notebooks/colab-axolotl-example.ipynb
Co-authored-by: NanoCode012 <nano@axolotl.ai>
* Update examples/colab-notebooks/colab-axolotl-example.ipynb
Co-authored-by: NanoCode012 <nano@axolotl.ai>
* Update examples/colab-notebooks/colab-axolotl-example.ipynb
Co-authored-by: NanoCode012 <nano@axolotl.ai>
* Update examples/colab-notebooks/colab-axolotl-example.ipynb
Co-authored-by: NanoCode012 <nano@axolotl.ai>
* cleared cell output
* cleared unnecessary logs
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai>
2024-11-22 10:09:10 -05:00
Wing Lian
2d7830fda6
upgrade to flash-attn 2.7.0 (#2048)
2024-11-14 06:59:25 -05:00
Sri Kainkaryam
203816f7b4
Fix colab example notebook (#1805) [skip ci]
2024-08-04 13:24:26 -04:00
Oliver Klingefjord
18abdb447a
typo (#1685) [skip ci]
...
* typo
* typo 2
---------
Co-authored-by: mhenrichsen <mads.gade.henrichsen@live.dk>
2024-07-12 21:24:01 -04:00
mhenrichsen
1194c2e0b1
github urls (#1734)
...
Co-authored-by: Henrichsen, Mads (ext) <mads.henrichsen.ext@siemens-energy.com>
2024-07-11 09:19:29 -04:00
Maciek
5f91064040
Fix Google Colab notebook 2024-05 (#1662) [skip ci]
...
* include mlflow installation in the colab notebook
Without explicitly installing mlflow, the `accelerate launch` command fails.
* update the colab notebook to use the latest tinyllama config
2024-05-28 11:23:52 -04:00
Wing Lian
4fde300e5f
update outputs path so that we can mount workspace to /workspace/data (#1623)
...
* update outputs path so that we can mount workspace to /workspace/data
* fix ln order
2024-05-15 12:44:13 -04:00
Jared Palmer
6ab69ec5f8
Add instructions for playing with qlora model to colab example (#1290)
...
* Add instructions for playing with qlora model to colab example
* Update examples/colab-notebooks/colab-axolotl-example.ipynb
Co-authored-by: JohanWork <39947546+JohanWork@users.noreply.github.com>
---------
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
Co-authored-by: JohanWork <39947546+JohanWork@users.noreply.github.com>
2024-02-22 02:46:27 +09:00
JohanWork
1c7ed26785
lock pytorch (#1247) [skip ci]
2024-02-06 07:48:26 -05:00
JohanWork
ee0b5f60e5
add colab example (#1196) [skip ci]
2024-01-24 20:09:09 -05:00