axolotl

Author	SHA1	Message	Date
Wing Lian	aa684122f1	upgrade peft==0.16.0 and datasets==4.0.0 (#2917 ) [skip ci] * upgrade peft to 0.16.0 * upgrade datasets to 4.0.0 * refactor dupes from merge/rebase * fix check for fsdp1 + sharded_state_dict * use full state dict for ci	2025-07-14 20:09:26 -04:00
Wing Lian	ca4d4ef793	don't init distributed for deepspeed if preprocessing (#2920 ) * don't init distributed for deepspeed if preprocessing * add e2e test to validate preprocess cli with deepspeed * ignore duplicate code for cfg	2025-07-14 14:19:19 -04:00
Dan Saunders	37edbe4999	Remove extra torch.compile call (#2904 ) * debug * debug * debug * moving validation code to transformers * revert unneeded change * add accelerator config to base trainer builder * add back accumulated_cache_size_limit setting * lint	2025-07-14 12:32:45 -04:00
Wing Lian	af92151a7b	FSDP2 fix validation and add tests (#2910 ) * fix validation and add tests * remove debugging and add more tests * remove migrate_fsdp	2025-07-14 09:25:44 -04:00
Wing Lian	5081db7f8a	upgrade trl==0.19.1 (#2892 ) [skip ci] * upgrade trl==0.19.1 * add vllm for tests for grpo * fixes to work with latest trl * need data_parallel_size config too * support for vllm_mode for server / colocate * vllm settings for colocate * relax vllm version * bump min hf hub for latest vllm support * add hints on string literal for vllm mode * use latest transformers 4.53.2 * tweak acceptable loss on flaky test_ds_zero3_packed test * don't run flaky vllm/grpo tests for now	2025-07-14 09:23:42 -04:00
Wing Lian	41664c7c4c	fix ddp for incorrect steps (#2915 ) * fix ddp for incorrect steps * add test	2025-07-14 07:51:16 -04:00
Jiawei Liu	7fb8441e0e	fix: customized dataset with simpo (#2894 ) [skip ci]	2025-07-12 11:40:30 -04:00
salman	d6e4a611e5	FSDP1 -> FSDP2 (#2760 ) * FSDP2 args migration implementation This commit implements the migration to FSDP2 arguments including: - FSDP2 support with LoRA training - DPO integration with FSDP2 - Model loading fixes and refactoring - CPU offloading and PEFT handling - Test updates and CI improvements - Bug fixes for dtype errors and various edge cases	2025-07-12 15:18:01 +01:00
Ed Sealing	eb662557a7	Register Plugins in Ray Workers (#2901 ) [skip ci] * Access plugins in ray cluster * Add comment * chore: lint --------- Co-authored-by: Ed Sealing <ed.sealing@patapsco.ai> Co-authored-by: Wing Lian <wing@axolotl.ai>	2025-07-11 16:59:59 -04:00
Wing Lian	c370d0795c	[doc] Fix docs for text field mapping for completion datasets (#2890 ) * Fix docs for text field mapping for completion datasets * update another reference	2025-07-09 14:52:44 -04:00
Wing Lian	76aeb16156	tiled_mlp supports single gpu (#2891 ) * tiled_mlp supports single gpu * use checkpoint offloading for arctic training * patch torch checkpoint too * support for single gpu zero3 * add linkback to where it was copied from	2025-07-09 12:48:22 -04:00
Wing Lian	7c5ea0010f	bump dev version (#2889 ) [skip ci]	2025-07-09 09:43:42 -04:00
Wing Lian	c6d69d5c1b	release v0.11.0 (#2875 ) * release v0.11.0 * don't build vllm into release for now * remove 2.5.1 references * smollm3 multipack support * fix ordering of e2e tests	2025-07-09 09:22:35 -04:00
NanoCode012	8c6a6ea6eb	Feat: add devstral model support (#2880 ) [skip ci] * fix: do not add training and training_detail block by default * fixed: magistral docs * fix: address pad adding new fields and use built-in from_openai * feat: try enable multiprocessing * fix: check for keys before deleting attn_mask * feat: add mistral pad test * feat: add tool calling test * feat: add devstral tokenizer tests * fix: comma format * chore: remove unused support_preprocessing as tokenizer is pickable now * chore: update magistral doc * feat: add devstral readme and example * chore: refactor error handling	2025-07-08 11:01:19 -04:00
NanoCode012	78bff4925e	fix: set add_generation_prompt to False when apply chat template (#2859 ) [skip ci]	2025-07-08 11:00:44 -04:00
NanoCode012	b237c8a3f3	chore: update cce commit to include gemma3n fixes (#2881 ) [skip ci]	2025-07-08 10:59:35 -04:00
Wing Lian	d68cc1e8ab	densemixer plugin integration (#2868 ) * densemixer plugin integration * update readme with usage docs * automatically find new integrations that aren't explicitly defined * make sure to import os	2025-07-07 17:05:19 -04:00
Wing Lian	9c0d7ee761	TiledMLP support (#2865 )	2025-07-07 15:23:49 -04:00
Wing Lian	a108e5db56	use latest version of cce fork for SP fix (#2871 ) [skip ci] * use latest version of cce fork for SP fix * latest sha to handle older transformers	2025-07-07 13:05:11 -04:00
Wing Lian	faff0cff41	manage jinja templates as nicely formatted files (#2795 ) * manage jinja templates as nicely formatted files * chore: lint * use path for templates relative to the module * fix template reformatting * handle newlines in llama3 template * fix gemma3 jinja * fix templates * support for passing jinja template file in yaml * handle file loading of jinja template outside of validation * fix typing and typo	2025-07-07 10:11:48 -04:00
Wing Lian	759cefb741	setup defaults for dataloader to ensure GPU is kept busy (#2632 ) [skip ci]	2025-07-07 10:10:58 -04:00
Wing Lian	69cd49a7aa	update transformers to 4.53.1 (#2844 ) [skip ci] * update transformers to 4.53.0 * remove attention_mask from signature columns if using packing * remove attention_mask column from dataloader * update signature of flash attn forward for ring attn patch * fix FSDP * patch ring-flash-attn with upstream signature fix * fix patch indentation level * fix the patch * add batch flattening smoke test with loss check that works in older transformers * fix patch * don't drop attention mask for flex * more fixes * patch create_causal_mask for packing w flex * global torch manual_seed fixture * tweak loss checks * fix patch and use single batch for flex * don't need to reload * fix causal mask patch * use transformers patch release * make sure env var is string * make sure to drop attention mask for flex w packing for latest transformers patch release * tweak loss * guard on signature columns before removing attention mask * bump loss * set remove isn't chainable * skip slow mistral test in 2.5.1	2025-07-07 09:35:22 -04:00
NanoCode012	5a961ecadf	Fix: do not call preprocess in multimodal or pretraining case (#2861 ) * fix: let users know to not call preprocess for vision mode * fix: improve ux for pretraining dataset and skip prepare ds * feat: add info to doc * Update src/axolotl/cli/preprocess.py following comment Co-authored-by: salman <salman.mohammadi@outlook.com> --------- Co-authored-by: salman <salman.mohammadi@outlook.com>	2025-07-06 21:55:33 -04:00
Wing Lian	b37ddf9778	don't use tokenizer parallelism when using packing (#2862 ) [skip ci]	2025-07-06 21:55:09 -04:00
Wing Lian	bf38e507fb	respect shuffle_merged_datasets for single dataset too (#2866 ) [skip ci] * respect shuffle_merged_datasets for single dataset too * update inline comment for behavior Co-authored-by: NanoCode012 <nano@axolotl.ai> --------- Co-authored-by: NanoCode012 <nano@axolotl.ai>	2025-07-06 21:20:41 -04:00
Wing Lian	70ca1b2291	fix nightlies to use correct cache (#2848 ) [skip ci] * fix nightlies to use correct cache * fix for handling None for bf16	2025-07-03 12:21:39 -04:00
NanoCode012	8ae5a2311b	feat: update handling for mistraltokenizer decode and multiprocessing pickling fix (#2790 ) * feat: update handling for mistraltokenizer decode * fix: update mistral common package version * fix: to use correct release * fix triton path --------- Co-authored-by: Wing Lian <wing@axolotl.ai>	2025-07-02 08:07:18 -04:00
NanoCode012	6383630155	Fix: tokenize stall due to not shuffling dataset (#2845 ) * fix: shuffle dataset even if only one to fix tokenize stall * fix: warn if shuffling merged with curriculum sampling * chore: refactor	2025-07-02 08:06:00 -04:00
Vincenzo di Cicco	f2b352f2e5	Add sample_packing_sequentially to trainer args (#2853 ) [skip ci]	2025-07-02 08:05:35 -04:00
Dhruv Mullick	d1224db8f4	Decouple generate_during_eval from wandb to support other visualizers (#2849 ) [skip ci] * Add generate_during_eval for mlflow for dpo * Decouple generate_during_eval from wandb	2025-07-02 08:04:40 -04:00
Dan Saunders	35fdbce102	Ensure device mesh patching is applied (#2842 ) * move patches; make patch stronger * fix broken tests * guard sequence_parallel_degree comparison against none --------- Co-authored-by: Wing Lian <wing@axolotl.ai>	2025-06-29 22:16:32 -04:00
Wing Lian	81893c775c	Accelerate 1.8.1 and BNB 0.46.0 update (#2815 ) * update accelerate to v1.8.0 * update bnb also * fix multigpu ci timeout * fix test set size * use latest accelerate 1.8.1 * disable default dtype	2025-06-28 15:29:19 -04:00
Wing Lian	a1a740608d	add assertion for packing patch to _get_unpad_data (#2840 )	2025-06-27 11:20:23 -04:00
kallewoof	ec15a7a691	Support --lora-on-cpu flag for DPO model merging (#2766 ) [skip ci] * Support --lora-on-cpu flag for DPO model merging * fix: use device=cpu in _convert_embedding_modules_dtype when lora_on_cpu is set	2025-06-27 11:19:24 -04:00
Wing Lian	0a7a216b60	allow for different sequence_len for evaluations (#2836 ) [skip ci] * allow for different sequence_len for evaluations * reversed 🤦 * add more information to filter msg	2025-06-27 11:02:51 -04:00
NanoCode012	d8280d45c1	feat: add chat_template kwargs (#2837 )	2025-06-27 10:38:46 -04:00
Wing Lian	24f2887e87	don't fail during preprocess for sampling from iterable dataset (#2825 ) [skip ci]	2025-06-27 10:37:53 -04:00
Wing Lian	a24957fa04	fix for iterable datasets and pickling (#2831 ) [skip ci] * fix for iterable datasets and pickling * more fixes for pretraining * can't pickle mock generator dataset	2025-06-27 10:35:23 -04:00
Wing Lian	d8cf66edbd	use fork for multiprocess start method for packing in parallel (#2830 )	2025-06-25 13:17:33 -04:00
NanoCode012	181cc3106b	fix: catch httperror from ratelimiting hf when checking user token (#2827 )	2025-06-25 09:50:13 -04:00
NanoCode012	20106116da	fix: 'NoneType' object has no attribute 'column_names' (#2822 ) [skip ci] * fix: 'NoneType' object has no attribute 'column_names' * chore: typing	2025-06-25 09:49:55 -04:00
Younes B	a27c4f8771	feat: add falcon-h1 into axolotl (#2811 ) [skip ci] * feat: add falcon-h1 into axolotl * fix pre-commit * review * fix: remove packing	2025-06-25 09:49:42 -04:00
NanoCode012	bb1109b81d	feat: update CCE to use axolotl's fork (#2813 ) [skip ci] * feat: update CCE to use axolotl's fork * chore: improve error message * feat: add eot token for gemma3 configs * fix: only warn on more than 1 image * fix: re-add gemma3 patch * Revert "fix: re-add gemma3 patch" This reverts commit `f04db5e873`. * feat: add qwen25 vl example * feat: point to upstream fork cce package * feat: update cce commit	2025-06-25 09:49:22 -04:00
Dan Saunders	8c69ec3a1e	gating _gather_outputs (causes increased vram usage) (#2829 ) * SP vram fix * gating _gather_outputs (causes increased vram usage) * reverting unneeded change	2025-06-25 08:33:55 -04:00
Dan Saunders	46675496a3	log config (#2819 ) * log config * moving text art; adding sensitive value redaction + sorting * revert pre-commit changes * remove none-valued config before dumping * just redact api keys	2025-06-24 14:59:30 -04:00
NanoCode012	c6b5d35e5d	fix: re-add gemma3 patch (#2817 )	2025-06-24 10:51:30 +07:00
Wing Lian	12c826816d	chunked cross entropy loss (#2625 ) * chunked cross entropy loss * refactor so we can add test * use relative import * update schema description	2025-06-23 23:08:46 -04:00
Dan Saunders	1d8f500709	deepspeed fix (#2820 )	2025-06-23 09:07:57 -04:00
Dan Saunders	45adf1bfb9	get_logger use_environ fix (#2808 ) * get_logger use_environ fix * rethinking * replacing old logger imports * simplify * fix boolean cond	2025-06-19 11:16:52 -04:00
Carsten Kragelund Jørgensen	eb3a57eb17	Ignore generation/endgeneration tags when analyzing Jinja chat template (#2787 ) * ignore generation/endgeneration tags Axolotl handles calculating the mask for assistant turns on its own, and as such these tags are not needed, however currently the analyzer does not recognize them at all and throws an error. * feat: add phi4 tokenizer test and unblock gemma2 * fix: improve template * chore: refactor * chore: lint --------- Co-authored-by: NanoCode012 <nano@axolotl.ai> Co-authored-by: Wing Lian <wing@axolotl.ai>	2025-06-18 15:59:07 -04:00

1460 Commits