Wing Lian
4ff96a2526
fix xformers version (#2888)
2025-07-09 08:43:40 -04:00
salman
89e99eaaa7
slowest durations (#2887) [skip ci]
2025-07-09 08:43:26 -04:00
Wing Lian
6ed501f6dc
add 2.7.0 torch images back to support vllm (#2885)
2025-07-08 16:28:14 -04:00
NanoCode012
8c6a6ea6eb
Feat: add devstral model support (#2880) [skip ci]
* fix: do not add training and training_detail block by default
* fixed: magistral docs
* fix: address pad adding new fields and use built-in from_openai
* feat: try enable multiprocessing
* fix: check for keys before deleting attn_mask
* feat: add mistral pad test
* feat: add tool calling test
* feat: add devstral tokenizer tests
* fix: comma format
* chore: remove unused support_preprocessing as tokenizer is picklable now
* chore: update magistral doc
* feat: add devstral readme and example
* chore: refactor error handling
2025-07-08 11:01:19 -04:00
NanoCode012
78bff4925e
fix: set add_generation_prompt to False when applying chat template (#2859) [skip ci]
2025-07-08 11:00:44 -04:00
NanoCode012
b237c8a3f3
chore: update cce commit to include gemma3n fixes (#2881) [skip ci]
2025-07-08 10:59:35 -04:00
float-trip
1032e22650
Fix link in FSDP + QLoRA docs. (#2879) [skip ci]
2025-07-08 09:19:09 -04:00
Wing Lian
d68cc1e8ab
densemixer plugin integration (#2868)
* densemixer plugin integration
* update readme with usage docs
* automatically find new integrations that aren't explicitly defined
* make sure to import os
2025-07-07 17:05:19 -04:00
github-actions[bot]
21f1bf4805
chore: update pre-commit hooks (#2870) [skip ci]
* chore: update pre-commit hooks
* don't bandit huggingface hub downloads without revision
---------
Co-authored-by: djsaunde <1245942+djsaunde@users.noreply.github.com>
Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-07-07 15:26:15 -04:00
Wing Lian
de2c5ba103
mark flaky geglu tests and add torch seed (#2876) [skip ci]
* mark flaky geglu tests and add torch seed
* restore accidental removal of seed
2025-07-07 15:24:16 -04:00
Wing Lian
9c0d7ee761
TiledMLP support (#2865)
2025-07-07 15:23:49 -04:00
NanoCode012
22d4a838dc
feat(doc): add vllm and fa2 incompat error to faq (#2877)
2025-07-07 14:13:37 -04:00
Wing Lian
a108e5db56
use latest version of cce fork for SP fix (#2871) [skip ci]
* use latest version of cce fork for SP fix
* latest sha to handle older transformers
2025-07-07 13:05:11 -04:00
Wing Lian
faff0cff41
manage jinja templates as nicely formatted files (#2795)
* manage jinja templates as nicely formatted files
* chore: lint
* use path for templates relative to the module
* fix template reformatting
* handle newlines in llama3 template
* fix gemma3 jinja
* fix templates
* support for passing jinja template file in yaml
* handle file loading of jinja template outside of validation
* fix typing and typo
2025-07-07 10:11:48 -04:00
Wing Lian
759cefb741
setup defaults for dataloader to ensure GPU is kept busy (#2632) [skip ci]
2025-07-07 10:10:58 -04:00
Wing Lian
69cd49a7aa
update transformers to 4.53.1 (#2844) [skip ci]
* update transformers to 4.53.0
* remove attention_mask from signature columns if using packing
* remove attention_mask column from dataloader
* update signature of flash attn forward for ring attn patch
* fix FSDP
* patch ring-flash-attn with upstream signature fix
* fix patch indentation level
* fix the patch
* add batch flattening smoke test with loss check that works in older transformers
* fix patch
* don't drop attention mask for flex
* more fixes
* patch create_causal_mask for packing w flex
* global torch manual_seed fixture
* tweak loss checks
* fix patch and use single batch for flex
* don't need to reload
* fix causal mask patch
* use transformers patch release
* make sure env var is string
* make sure to drop attention mask for flex w packing for latest transformers patch release
* tweak loss
* guard on signature columns before removing attention mask
* bump loss
* set remove isn't chainable
* skip slow mistral test in 2.5.1
2025-07-07 09:35:22 -04:00
NanoCode012
5a961ecadf
Fix: do not call preprocess in multimodal or pretraining case (#2861)
* fix: let users know to not call preprocess for vision mode
* fix: improve ux for pretraining dataset and skip prepare ds
* feat: add info to doc
* Update src/axolotl/cli/preprocess.py following comment
Co-authored-by: salman <salman.mohammadi@outlook.com>
---------
Co-authored-by: salman <salman.mohammadi@outlook.com>
2025-07-06 21:55:33 -04:00
Wing Lian
b37ddf9778
don't use tokenizer parallelism when using packing (#2862) [skip ci]
2025-07-06 21:55:09 -04:00
Wing Lian
bf38e507fb
respect shuffle_merged_datasets for single dataset too (#2866) [skip ci]
* respect shuffle_merged_datasets for single dataset too
* update inline comment for behavior
Co-authored-by: NanoCode012 <nano@axolotl.ai>
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai>
2025-07-06 21:20:41 -04:00
Wing Lian
a5946ff1f0
build fa2 from source for base image with torch2.6 and cu124 (#2867)
2025-07-05 09:21:18 -04:00
Wing Lian
70ca1b2291
fix nightlies to use correct cache (#2848) [skip ci]
* fix nightlies to use correct cache
* fix for handling None for bf16
2025-07-03 12:21:39 -04:00
NanoCode012
8ae5a2311b
feat: update handling for mistraltokenizer decode and multiprocessing pickling fix (#2790)
* feat: update handling for mistraltokenizer decode
* fix: update mistral common package version
* fix: to use correct release
* fix triton path
---------
Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-07-02 08:07:18 -04:00
NanoCode012
6383630155
Fix: tokenize stall due to not shuffling dataset (#2845)
* fix: shuffle dataset even if only one to fix tokenize stall
* fix: warn if shuffling merged with curriculum sampling
* chore: refactor
2025-07-02 08:06:00 -04:00
Vincenzo di Cicco
f2b352f2e5
Add sample_packing_sequentially to trainer args (#2853) [skip ci]
2025-07-02 08:05:35 -04:00
NanoCode012
bf5928d0ee
feat(doc): update docker tag examples (#2851) [skip ci]
* feat(doc): update docker tag examples
* chore: comment
2025-07-02 08:05:01 -04:00
Dhruv Mullick
d1224db8f4
Decouple generate_during_eval from wandb to support other visualizers (#2849) [skip ci]
* Add generate_during_eval for mlflow for dpo
* Decouple generate_during_eval from wandb
2025-07-02 08:04:40 -04:00
mhenrichsen
327b4e48e9
Add installation instructions for pip and Docker to README.md (#2854)
* Add installation instructions for pip and Docker to README.md
* Enhance README.md with Docker installation guidance for improved setup reliability.
2025-07-02 09:03:52 +02:00
Dan Saunders
35fdbce102
Ensure device mesh patching is applied (#2842)
* move patches; make patch stronger
* fix broken tests
* guard sequence_parallel_degree comparison against none
---------
Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-06-29 22:16:32 -04:00
Wing Lian
cb811f8bf1
upgrade to flash-attn 2.8.0.post2 (#2828)
* upgrade to flash-attn 2.8.0.post2
* use cu126 with torch 2.6
* seems vllm 0.8.5.post1 not compatible with cuda12.6.3 and torch 2.6
* cu126 + torch 2.6 as the default
* use cu126 for multigpu w torch 2.6 too
* drop vllm from ci for now
2025-06-29 22:11:16 -04:00
Wing Lian
7563e1bd30
set a different triton cache for each test to avoid blocking writes to cache (#2843)
* set a different triton cache for each test to avoid blocking writes to cache
* set log level
* disable debug logging for filelock
2025-06-29 22:05:21 -04:00
Wing Lian
81893c775c
Accelerate 1.8.1 and BNB 0.46.0 update (#2815)
* update accelerate to v1.8.0
* update bnb also
* fix multigpu ci timeout
* fix test set size
* use latest accelerate 1.8.1
* disable default dtype
2025-06-28 15:29:19 -04:00
Wing Lian
a1a740608d
add assertion for packing patch to _get_unpad_data (#2840)
2025-06-27 11:20:23 -04:00
kallewoof
ec15a7a691
Support --lora-on-cpu flag for DPO model merging (#2766) [skip ci]
* Support --lora-on-cpu flag for DPO model merging
* fix: use device=cpu in _convert_embedding_modules_dtype when lora_on_cpu is set
2025-06-27 11:19:24 -04:00
Wing Lian
0a7a216b60
allow for different sequence_len for evaluations (#2836) [skip ci]
* allow for different sequence_len for evaluations
* reversed 🤦
* add more information to filter msg
2025-06-27 11:02:51 -04:00
NanoCode012
d8280d45c1
feat: add chat_template kwargs (#2837)
2025-06-27 10:38:46 -04:00
Wing Lian
24f2887e87
don't fail during preprocess for sampling from iterable dataset (#2825) [skip ci]
2025-06-27 10:37:53 -04:00
NanoCode012
29289a4de9
feat: replace old colab notebook with newer one (#2838) [skip ci]
* feat: replace old colab notebook with newer one
* fix: point to update cce fork
2025-06-27 10:35:47 -04:00
Wing Lian
a24957fa04
fix for iterable datasets and pickling (#2831) [skip ci]
* fix for iterable datasets and pickling
* more fixes for pretraining
* can't pickle mock generator dataset
2025-06-27 10:35:23 -04:00
NanoCode012
927bf530bc
fix(doc): default messages example used wrong key (#2832)
* fix(doc): default messages example used wrong key
* feat: add links to SP, multi-gpu, multi-node on readme
2025-06-26 10:47:31 -04:00
github-actions[bot]
18954ba100
chore: update pre-commit hooks (#2821) [skip ci]
Co-authored-by: djsaunde <1245942+djsaunde@users.noreply.github.com>
2025-06-26 10:46:53 -04:00
Wing Lian
d8cf66edbd
use fork for multiprocess start method for packing in parallel (#2830)
2025-06-25 13:17:33 -04:00
NanoCode012
181cc3106b
fix: catch HTTPError from rate-limiting HF when checking user token (#2827)
2025-06-25 09:50:13 -04:00
NanoCode012
20106116da
fix: 'NoneType' object has no attribute 'column_names' (#2822) [skip ci]
* fix: 'NoneType' object has no attribute 'column_names'
* chore: typing
2025-06-25 09:49:55 -04:00
Younes B
a27c4f8771
feat: add falcon-h1 into axolotl (#2811) [skip ci]
* feat: add falcon-h1 into axolotl
* fix pre-commit
* review
* fix: remove packing
2025-06-25 09:49:42 -04:00
NanoCode012
bb1109b81d
feat: update CCE to use axolotl's fork (#2813) [skip ci]
* feat: update CCE to use axolotl's fork
* chore: improve error message
* feat: add eot token for gemma3 configs
* fix: only warn on more than 1 image
* fix: re-add gemma3 patch
* Revert "fix: re-add gemma3 patch"
This reverts commit f04db5e873.
* feat: add qwen25 vl example
* feat: point to upstream fork cce package
* feat: update cce commit
2025-06-25 09:49:22 -04:00
Dan Saunders
8c69ec3a1e
gating _gather_outputs (causes increased vram usage) (#2829)
* SP vram fix
* gating _gather_outputs (causes increased vram usage)
* reverting unneeded change
2025-06-25 08:33:55 -04:00
Dan Saunders
46675496a3
log config (#2819)
* log config
* moving text art; adding sensitive value redaction + sorting
* revert pre-commit changes
* remove none-valued config before dumping
* just redact api keys
2025-06-24 14:59:30 -04:00
NanoCode012
c6b5d35e5d
fix: re-add gemma3 patch (#2817)
2025-06-24 10:51:30 +07:00
Wing Lian
12c826816d
chunked cross entropy loss (#2625)
* chunked cross entropy loss
* refactor so we can add test
* use relative import
* update schema description
2025-06-23 23:08:46 -04:00
Dan Saunders
1d8f500709
deepspeed fix (#2820)
2025-06-23 09:07:57 -04:00