Commit Graph

1197 Commits

Author SHA1 Message Date
NanoCode012
8c6a6ea6eb Feat: add devstral model support (#2880) [skip ci]
* fix: do not add training and training_detail block by default

* fixed: magistral docs

* fix: address pad adding new fields and use built-in from_openai

* feat: try enable multiprocessing

* fix: check for keys before deleting attn_mask

* feat: add mistral pad test

* feat: add tool calling test

* feat: add devstral tokenizer tests

* fix: comma format

* chore: remove unused support_preprocessing as tokenizer is pickable now

* chore: update magistral doc

* feat: add devstral readme and example

* chore: refactor error handling
2025-07-08 11:01:19 -04:00
NanoCode012
78bff4925e fix: set add_generation_prompt to False when apply chat template (#2859) [skip ci] 2025-07-08 11:00:44 -04:00
NanoCode012
b237c8a3f3 chore: update cce commit to include gemma3n fixes (#2881) [skip ci] 2025-07-08 10:59:35 -04:00
Wing Lian
d68cc1e8ab densemixer plugin integration (#2868)
* densemixer plugin integration

* update readme with usage docs

* automatically find new integrations that aren't explicitly defined

* make sure to import os
2025-07-07 17:05:19 -04:00
Wing Lian
9c0d7ee761 TiledMLP support (#2865) 2025-07-07 15:23:49 -04:00
Wing Lian
a108e5db56 use latest version of cce fork for SP fix (#2871) [skip ci]
* use latest version of cce fork for SP fix

* latest sha to handle older transformers
2025-07-07 13:05:11 -04:00
Wing Lian
faff0cff41 manage jinja templates as nicely formatted files (#2795)
* manage jinja templates as nicely formatted files

* chore: lint

* use path for templates relative to the module

* fix template reformating

* handle newlines in llama3 template

* fix gemma3 jinja

* fix templates

* suport for passing jinja template file in yaml

* handle file loading of jinja template outside of validation

* fix typing and typo
2025-07-07 10:11:48 -04:00
Wing Lian
759cefb741 setup defaults for dataloader to ensure GPU is kept busy (#2632) [skip ci] 2025-07-07 10:10:58 -04:00
Wing Lian
69cd49a7aa update transformers to 4.53.1 (#2844) [skip ci]
* update transformers to 4.53.0

* remove attention_mask from signature columns if using packing

* remove attention_mask column from dataloader

* update signature of flash attn forward for ring attn patch

* fix FSDP

* patch ring-flash-attn with upstream signature fix

* fix patch indentation level

* fix the patch

* add batch flattening smoke test with loss check that works in older transformers

* fix patch

* don't drop attention mask for flex

* more fixes

* patch create_causal_mask for packing w flex

* global torch manual_seed fixture

* tweak loss checks

* fix patch and use single batch for flex

* don't need to reload

* fix causal mask patch

* use transformers patch releasE

* make sure env var is string

* make sure to drop attention mask for flex w packing for latest transformers patch release

* tweak loss

* guard on signature columns before removing attention mask

* bump loss

* set remove isn't chainable

* skip slow mistral test in 2.5.1
2025-07-07 09:35:22 -04:00
NanoCode012
5a961ecadf Fix: do not call preprocess in multimodal or pretraining case (#2861)
* fix: let users know to not call preprocess for vision mode

* fix: improve ux for pretraining dataset and skip prepare ds

* feat: add info to doc

* Update src/axolotl/cli/preprocess.py following comment

Co-authored-by: salman <salman.mohammadi@outlook.com>

---------

Co-authored-by: salman <salman.mohammadi@outlook.com>
2025-07-06 21:55:33 -04:00
Wing Lian
b37ddf9778 don't use tokenizer parallelism when using packing (#2862) [skip ci] 2025-07-06 21:55:09 -04:00
Wing Lian
bf38e507fb respect shuffle_merged_datasets for single dataset too (#2866) [skip ci]
* respect shuffle_merged_datasets for single dataset too

* update inline comment for behavior

Co-authored-by: NanoCode012 <nano@axolotl.ai>

---------

Co-authored-by: NanoCode012 <nano@axolotl.ai>
2025-07-06 21:20:41 -04:00
Wing Lian
70ca1b2291 fix nightlies to use correct cache (#2848) [skip ci]
* fix nightlies to use correct cache

* fix for handling None for bf16
2025-07-03 12:21:39 -04:00
NanoCode012
8ae5a2311b feat: update handling for mistraltokenizer decode and multiprocessing pickling fix (#2790)
* feat: update handling for mistraltokenizer decode

* fix: update mistral common package version

* fix: to use correct release

* fix triton path

---------

Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-07-02 08:07:18 -04:00
NanoCode012
6383630155 Fix: tokenize stall due to not shuffling dataset (#2845)
* fix: shuffle dataset even if only one to fix tokenize stall

* fix: warn if shuffling merged with curriculum sampling

* chore: refactor
2025-07-02 08:06:00 -04:00
Vincenzo di Cicco
f2b352f2e5 Add sample_packing_sequentially to trainer args (#2853) [skip ci] 2025-07-02 08:05:35 -04:00
Dhruv Mullick
d1224db8f4 Decouple generate_during_eval from wandb to support other visualizers (#2849) [skip ci]
* Add generate_during_eval for mlflow for dpo

* Decouple generate_during_eval from wandb
2025-07-02 08:04:40 -04:00
Dan Saunders
35fdbce102 Ensure device mesh patching is applied (#2842)
* move patches; make patch stronger

* fix broken tests

* guard sequence_parallel_degree comparison against none

---------

Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-06-29 22:16:32 -04:00
Wing Lian
81893c775c Accelerate 1.8.1 and BNB 0.46.0 update (#2815)
* update accelerate to v1.8.0

* update bnb also

* fix multigpu ci timeout

* fix test set size

* use latest accelerate 1.8.1

* disable default dtype
2025-06-28 15:29:19 -04:00
Wing Lian
a1a740608d add assertion for packing patch to _get_unpad_data (#2840) 2025-06-27 11:20:23 -04:00
kallewoof
ec15a7a691 Support --lora-on-cpu flag for DPO model merging (#2766) [skip ci]
* Support --lora-on-cpu flag for DPO model merging

* fix: use device=cpu in _convert_embedding_modules_dtype when lora_on_cpu is set
2025-06-27 11:19:24 -04:00
Wing Lian
0a7a216b60 allow for different sequence_len for evaluations (#2836) [skip ci]
* allow for different sequence_len for evaluations

* reversed 🤦

* add more information to filter msg
2025-06-27 11:02:51 -04:00
NanoCode012
d8280d45c1 feat: add chat_template kwargs (#2837) 2025-06-27 10:38:46 -04:00
Wing Lian
24f2887e87 don't fail during preprocess for sampling from iterable dataset (#2825) [skip ci] 2025-06-27 10:37:53 -04:00
Wing Lian
a24957fa04 fix for iterable datasets and pickling (#2831) [skip ci]
* fix for iterable datasets and pickling

* more fixes for pretraining

* can't pickle mock generator dataset
2025-06-27 10:35:23 -04:00
Wing Lian
d8cf66edbd use fork for multiprocess start method for packing in parallel (#2830) 2025-06-25 13:17:33 -04:00
NanoCode012
181cc3106b fix: catch httperror from ratelimiting hf when checking user token (#2827) 2025-06-25 09:50:13 -04:00
NanoCode012
20106116da fix: 'NoneType' object has no attribute 'column_names' (#2822) [skip ci]
* fix: 'NoneType' object has no attribute 'column_names'

* chore: typing
2025-06-25 09:49:55 -04:00
Younes B
a27c4f8771 feat: add falcon-h1 into axolotl (#2811) [skip ci]
* feat: add falcon-h1 into axolotl

* fix pre-commit

* review

* fix: remove packing
2025-06-25 09:49:42 -04:00
NanoCode012
bb1109b81d feat: update CCE to use axolotl's fork (#2813) [skip ci]
* feat: update CCE to use axolotl's fork

* chore: improve error message

* feat: add eot token for gemma3 configs

* fix: only warn on more than 1 image

* fix: re-add gemma3 patch

* Revert "fix: re-add gemma3 patch"

This reverts commit f04db5e873.

* feat: add qwen25 vl example

* feat: point to upstream fork cce package

* feat: update cce commit
2025-06-25 09:49:22 -04:00
Dan Saunders
8c69ec3a1e gating _gather_outputs (causes increased vram usage) (#2829)
* SP vram fix

* gating _gather_outputs (causes increased vram usage)

* reverting unneeded change
2025-06-25 08:33:55 -04:00
Dan Saunders
46675496a3 log config (#2819)
* log config

* moving text art; adding sensitive value redaction + sorting

* revert pre-commit changes

* remove none-valued config before dumping

* just redact api keys
2025-06-24 14:59:30 -04:00
NanoCode012
c6b5d35e5d fix: re-add gemma3 patch (#2817) 2025-06-24 10:51:30 +07:00
Wing Lian
12c826816d chunked cross entropy loss (#2625)
* chunked cross entropy loss

* refactor so we can add test

* use relative import

* update schema description
2025-06-23 23:08:46 -04:00
Dan Saunders
1d8f500709 deepspeed fix (#2820) 2025-06-23 09:07:57 -04:00
Dan Saunders
45adf1bfb9 get_logger use_environ fix (#2808)
* get_logger use_environ fix

* rethinking

* replacing old logger imports

* simplify

* fix boolean cond
2025-06-19 11:16:52 -04:00
Carsten Kragelund Jørgensen
eb3a57eb17 Ignore generation/endgeneration tags when analyzing Jinja chat template (#2787)
* ignore generation/endgeneration tags

Axolotl handles calculating the mask for assistant turns on its own, and as such these tags are not needed, however currently the analyzer does not recognize them at all and throws an error.

* feat: add phi4 tokenizer test and unblock gemma2

* fix: improve template

* chore: refactor

* chore: lint

---------

Co-authored-by: NanoCode012 <nano@axolotl.ai>
Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-06-18 15:59:07 -04:00
Wing Lian
34da391391 Set dev version (#2807) [skip ci] 2025-06-18 15:49:05 -04:00
NanoCode012
0bb9077553 Fix: logging on py310 (#2802)
* feat: encourage py311

* fix: logging import on py310

* fix: do upper and simplify handling
2025-06-18 15:46:27 -04:00
Wing Lian
a85efffbef bump transformers==4.52.4 (#2800) [skip ci]
* bump transformers==4.52.4

* don't use hf offline for qwen tokenizer

* increase timeout

* don't use methodtype

* increase timeout

* better assertion logging

* upgrade deepspeed version too
2025-06-18 15:46:14 -04:00
Dan Saunders
9d5bfc127e Config doc autogen (#2718)
* config reference doc autogen

* improvements

* cleanup; still ugly but working

* reformat

* remove autogen config ref from git

* factor out validations

* rewrite

* rewrite

* cleanup

* progress

* progress

* progress

* lint and minifying somewhat

* remove unneeded

* coderabbit

* coderabbit

* update preview-docs workflow triggers

* installing with deps

* coderabbit

* update refs

* overwrote file accidentally
2025-06-18 15:36:53 -04:00
Wing Lian
88c0e8d048 release tag (#2799)
Some checks failed
ci-cd / build-axolotl (<nil>, 124, 12.4.1, 3.11, 2.5.1) (push) Has been cancelled
ci-cd / build-axolotl (<nil>, 126, 12.6.3, 3.11, 2.7.1) (push) Has been cancelled
ci-cd / build-axolotl (<nil>, 128, 12.8.1, 3.11, 2.7.1) (push) Has been cancelled
ci-cd / build-axolotl (vllm, 124, 12.4.1, true, 3.11, 2.6.0) (push) Has been cancelled
publish pypi / Create Release (push) Has been cancelled
ci-cd / build-axolotl-cloud (<nil>, 124, 12.4.1, 3.11, 2.5.1) (push) Has been cancelled
ci-cd / build-axolotl-cloud (<nil>, 124, 12.4.1, true, 3.11, 2.6.0) (push) Has been cancelled
ci-cd / build-axolotl-cloud (<nil>, 126, 12.6.3, 3.11, 2.7.1) (push) Has been cancelled
ci-cd / build-axolotl-cloud (<nil>, 128, 12.8.1, 3.11, 2.7.1) (push) Has been cancelled
ci-cd / build-axolotl-cloud-no-tmux (<nil>, 124, 12.4.1, 3.11, 2.6.0) (push) Has been cancelled
publish pypi / Upload release to PyPI (push) Has been cancelled
2025-06-17 12:13:27 -04:00
NanoCode012
d8e8cd8558 feat: remove evalfirst callback with built-in trainer arg (#2797) 2025-06-17 12:09:33 -04:00
Wing Lian
ccc94da8ad KD fix w/ online distillation (#2700) [skip ci]
* kd fixes

* fix collator setup

* fix input args

* better handling to drop string fields for kd with raw dataset

* kd trainer has kd temp as part of the init

* drop top_k before softmax

* simplfy and remove zscore

* WIP chunked KD loss with autograd wrapper

* more fixes and liger-type chunked loss

* collator cls for plugins

* remove debugging

* additional plugin collator kwargs, don't scale up kd loss by t^2

* don't need temp arg to distill method

* online kd wip

* add close to comment block

* suport sampling params/max new tokens

* handle when no custom collator is used in plugins

* logsumexp trick:

* fix check

* shift off the first empty token

* fix length of padding

* use max not min

* temp scale kd loss at end

* support for dynamic plugin training args mixins and symmetric kl

* chore: lint

* fix trainer callback base class

* Fix decay

* accept compressed responses for smaller wire payload

* post-rebase lint

* more KD updates

* increase hyperparams_count for gradients for added normalize_topk

* fix to remove attention_mask

* rename vars for consistency

* fix rebase issues

* default to dropping last batch in multipack batch sampler

* improve handling of train len

* init collator_cls_and_kwargs

* explicit drop_last=False when checking for multipack completeness

* use separate v2 loader for kd

* fix kd tests to use subprocess so it picks up kd training args

* default value for kd_beta arg

* use updated dataset for ci

* longer timeout for e2e
2025-06-17 12:09:13 -04:00
NanoCode012
21388cf615 Fix: lora kernel pre-patch applied despite post-patch not applied (#2772)
* fix: do not pre-patch self attention if lora dropout non-zero

* fix: add test to check patch not applied

* fix: test

* fix: test config check

* fix where we check so that tests don't break

* fix: test

---------

Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-06-14 11:54:06 -07:00
NanoCode012
80d5b066ec Fix: adding magistral fsdp config, fixing not eval with test_datasets, handle mllama attention (#2789) [skip ci]
* feat: add fsdp config for magistral

* fix: add mllama self attention handling for lora kernels

* fix: no eval if val_set_size 0 despite having test_datasets

* fix: add note for cce for vlm in newer model
2025-06-14 11:53:43 -07:00
Wing Lian
b2274d430b support for QAT w RL (DPO) (#2776) 2025-06-13 10:00:35 -04:00
NanoCode012
eac4a61f55 Feat: Add Magistral and mistral-common tokenizer support (#2780) 2025-06-12 19:18:33 -04:00
JZacaroli
f5fbc82f2b Fix logging import in evaluate.py (#2782) (#2783)
* Fix logging import in evaluate.py (#2782)

* chore: lint

---------

Co-authored-by: Joe Zacaroli <jaz@cyberscience.com>
Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-06-12 13:23:31 -04:00
Wing Lian
468580d18e limit multipack sampler processes (#2771) [skip ci]
* limit to 16 packing processes

* make num_processes properly reflect configured dataset_processes
2025-06-12 13:22:58 -04:00