axolotl

Author	SHA1	Message	Date
Dan Saunders	68bb70bbae	fix test	2025-08-22 16:02:30 +00:00
Dan Saunders	5d8d7ef327	lint	2025-08-22 16:02:30 +00:00
Dan Saunders	7836da9ed9	remove unuse	2025-08-22 16:02:30 +00:00
Dan Saunders	7eba3795fe	fixes	2025-08-22 16:02:30 +00:00
Dan Saunders	1b7b67d06e	smoke test	2025-08-22 16:02:30 +00:00
Dan Saunders	0843dc678a	separate out train and eval datasets streaming; cleanup	2025-08-22 16:02:30 +00:00
Dan Saunders	aa5a497a2c	nits	2025-08-22 16:02:30 +00:00
Dan Saunders	7bb52d00bb	progress on streaming	2025-08-22 16:02:30 +00:00
VED	0eef385b1a	[feat] truncation support with excess_length_strategy (#3068 ) [skip ci] * feat:truncation support with excess_len * pre-commit * excess_length_strategy * requested changes * lint * added handle_long_seq_in_dataset in sft * comments improved	2025-08-18 08:39:13 -04:00
Wing Lian	ecbe8b2b61	[GPT-OSS] improve FSDP shard merging and documentation for GPT-OSS (#3073 ) * improve fsdp shard merging * improve logging * update information on merging and inferencing GPT-OSS * cleanup readme * automate cleanup of FSDP prefix * import GRPO only if necessary * only modify config.json on rank0 * merge final checkpoint at end of training * prevent circular import * Fix saving for sharded state dict * devx, move merged to output dir * move import back to top * Fix stuck merge * fix conditionals from pr feedback and add test	2025-08-15 21:25:01 -04:00
Wing Lian	130ef7c51a	Various fixes for VLMs (#3063 ) * fix to not use batch feature indexing * more vlm fixes * use AutoModelForImageTextToText * add example yaml and need num2words for chat template * improve handling of adding image tokens to conversation * add lfm2-vl support * update the lfm readme * fix markdown and add rtol for loss checks * feat: add smolvlm2 processing strat * fix: check for causal-conv1d in lfm models * feat: add docs for lfm2 * feat: add new models and tips to docs * feat: add smolvlm2 docs and remove extra dep * chore: update docs * feat: add video instructions * chore: cleanup * chore: comments * fix: typo * feat: add usage stats * chore: refactor --------- Co-authored-by: NanoCode012 <nano@axolotl.ai>	2025-08-15 10:52:57 -04:00
Wing Lian	09145de8fa	upgrade transformers==4.55.1 and bitsandbytes==0.47.0 (#3064 ) * upgrade transformers==4.55.1 * also upgrade bnb * remove bnb params4bit patch (upstreamed) * use latest causal-conv1d * fix patching ring-flash-attn with now missing imports --------- Co-authored-by: Dan Saunders <danjsaund@gmail.com>	2025-08-13 19:41:07 -04:00
Wing Lian	d4d84d48af	fix ray train and add fsdp2 smoke test for ray trainer (#3053 ) * add fsdp2 smokle test for ray trainer * fix raytrain with fsdp2	2025-08-11 09:31:54 -04:00
Wing Lian	9b12c05660	use exec instead of subprocess to make ctrl+c nicer for cli (#3044 ) * use exec instead of subprocess to make ctrl+c nicer for cli * change var name to use_exec * simplify to bool * flush std* * patch subprocess as mock in test * fix tests * more test fixes	2025-08-10 20:22:20 -04:00
Wing Lian	d6b81b3683	update training args check for new defaults (#3051 ) [skip ci] * update training args check for new defaults * skip check for now	2025-08-10 11:26:22 -04:00
Dan Saunders	0ae06d756d	use nanmean for loss aggregation (CP fix) (#3033 ) * use nanmena for loss aggregation (CP fix) * use regular asserts * small changes to make tests isolate * combining evaluation_loop patches * fix * delete unused * fix check	2025-08-08 08:15:17 -04:00
Wing Lian	9d5c95db6f	Add support for Accelerate CP, ND examples, and fix for parallel config w fsdp (#3019 ) * fix for parallelism config from trainer * fix handling of parallelism_config w accelerate * add todo for removal * update to latest axolotl-contribs-mit for optimizer fix too * synchronize training after checkpoint save * dir spelling * use latest accelerate main * fix to not use partial state parallelism_config * more fixeS * use most recent accelerate fix * fix cpu_ram_efficient_loading to meta devices from rank 0 to prevent CPU RAM oom * improve handling of broadcasting fsdp2 state dict * support for openai chat template with thinking key as the reasoning trace * address PR feedback * refactor to remove dependency on PartialState for parallelism config * bump accelerate, gptoss fixes * limit meta fixes to fsdp2 for now * fixes for gpt oss * fixup examples, don't use cpu-ram-efficient-loading for now * remove problematic barrier * patch parallelism config * reorder comparison * device mesh fixes * make pure CP work * lint	2025-08-07 21:22:15 -04:00
Wing Lian	4bce713b39	allow custom trainer_cls to be defined as a module reference in the YAML (#3024 ) [skip ci] * allow custom trainer_cls to be defined as a module reference in the YAML * address PR feedback and add test * add tests	2025-08-06 22:49:19 -04:00
Dan Saunders	d09290f2f4	Lora kernels bias support (#3025 ) * lora kernels bias support * revert rename * nit * lint, tests * satisfying the rabbit	2025-08-06 20:20:08 -04:00
Wing Lian	97e86c6d47	drop old patches and code that are no longer needed (#3007 ) [skip ci]	2025-08-06 08:02:39 -04:00
Wing Lian	ab49d16e34	Dion optimizer support (#3014 ) * Add support for Dion optimizer * dion training kwargs * fix var names * no dion 8bit for now * use updated axolotl-contribs-mit for dion optimizer * add smoke test for dion optimizer * add docs * fix typo during edits * fix test to not remove load in 8bit	2025-08-04 16:33:30 -04:00
Dan Saunders	e758343cac	FSDP2 + LoRA kernels (#2992 ) * impl fix * smoke tests * patches for fsdp2 + qlora compat * nit * working fix * working fix * fix merge * minifying patches; update bnb dep * renaming; adding tests * remove duplicate test, add dora guard * generalize __torch_function__ * revert generalization * update comments	2025-08-03 20:05:17 -04:00
salman	294c7fe7a6	Distributed/ND-Parallel (#2977 )	2025-07-31 15:25:02 -04:00
Wing Lian	7b68dfafd7	jagged lr restart scheudler (#1680 ) [skip ci] * jagged lr restart scheudler var name fix make sure to create scheduler first * wire things together * more fixes * fix for nesting scheduler and first anneal phase * no need for relora trainer anymore since we've generalized the relora scheduler * remove redundant relora scheduler and lint * update relora e2e test for updated params * need restart steps for relora test * update quarto docs for dropped relora trainer * update example yaml * drop verbose arg * min lr scale support for jagged lr * don't let min_lr be nonetype * cleanup args	2025-07-31 13:50:03 -04:00
Wing Lian	563f5eed7a	update dependencies - liger + trl (#2987 ) * update dependencies * set dataset processes for tests * add support for GSPO	2025-07-31 11:17:17 -04:00
Dan Saunders	bb1cae1a20	CLI: add --launcher option, support launcher args, cleanup, refactor (#2924 ) * add --launcher option; explicit True/False bool args; small cleanup * refactor * add torchrun, accelerate cli args * add rdzv arg default + tests * update _quarto * coderabbit * fix * we can't set rdvz_id independently across nodes * coderabbit * fix tests	2025-07-30 15:46:56 -04:00
NanoCode012	90e5598930	Feat: Add voxtral, magistral small 1.1, and misc gemma3n fixes (#2979 ) * fix: lock version in gemma3n docs * feat: add sample configs and docs * chore: move mistraltokenizer into mistral folder * feat: update instructions * feat: add dynamic load voxtral * fix: remove incorrect vision config, add audio * fix: support voxtral processing strategy and address none in data * feat: patch mistraltokenizer subclass upstream and add missing * feat: update cce commit to include voxtral * fix: remove old comment * fix: gemma3 patch not needed anymore * fix: voxtral modeling code * fix: remove incorrect ds path * fix: adjust apply chat template parsing * feat: enable voxtral patch * fix: patch * feat: update example datasets * fix: target layer * feat: update gemma3n docs * feat: update voxtral docs * feat: revert assistant parsing to rely on new upstream changes * chore: skip test till next PR fix * fix: override upstream decode due to missing handling * feat: update readme * fix: update * feat: add magistral small think support * feat: update mistral-common dep * fix: lint * fix: remove optional dep * chore: typing * chore: simply import * feat(doc): update differences for 2507 * fix: coderrabbit comments * feat: update clarify docs on new transformers	2025-07-30 15:57:05 +07:00
Wing Lian	0ff2f172ef	Act offload lora fix (#2928 ) [skip ci] * fix activation offloading with lora * update w e2e test * add docs for error	2025-07-24 16:10:04 -04:00
Dan Saunders	208fb7b8e7	basic torchao fp8 mixed precision training (#2926 ) * debug * debug * debug * revert unneeded change * add accelerator config to base trainer builder * add back accumulated_cache_size_limit setting * lint * accelerator constructor patch for single-GPU torch fp8 * lint * re-using existing fp8 code * lint * remove accelerate patch now fix in latest release * fix * docs * add fp8 + fsdp2 example * remove unused config * update config * smoke tests * add validator * add 2.7.0 guard for fsdp2 * fix * add config descriptions * add FSDP doc link * nit * set force_recompute_fp8_weight_in_bwd with enable_fsdp_float8_all_gather * better cfg for smoke tests * add test for accelerate patching * update fp8 validator	2025-07-22 16:27:47 -04:00
Wing Lian	af8d257aa2	make pad_to_sequence_len default to the same value as sample_packing (#2941 ) [skip ci] * make pad_to_sequence_len default to the same value as sample_packing * remove duplicate validation * fix test * update description meta Co-authored-by: NanoCode012 <nano@axolotl.ai> --------- Co-authored-by: NanoCode012 <nano@axolotl.ai>	2025-07-21 11:40:56 -04:00
Wing Lian	db5f6f4693	limit num_proc when saving datasets to disk (#2948 ) [skip ci] * limit num_proc when saving datasets to disk * enforce at least 1 in case it rounds down to 0, and sane divisor is at least 8 rows per worker to save * update fixtures with dataset processes since that should never be NoneType * improve reusability for tests	2025-07-21 11:39:38 -04:00
Wing Lian	36cbe13d18	activation offloading with cuda streams doesn't work with LoRA (#2927 )	2025-07-16 11:59:20 -04:00
Dan Saunders	10ba1622f7	checkpoint model on first step callback (#2906 ) * checkpoint model on first step callback * remove debug * add test cases; update existing tests not to save on first step * move test out of solo * delete * default to False * typo	2025-07-15 15:00:48 -04:00
NanoCode012	354eaaf0d3	feat: add call method to mistral tokenizer wrapper (#2898 )	2025-07-14 22:33:35 -04:00
greenhestu	a061446540	Fix: Prevents merging of tool arguments during preprocessing (#2909 )	2025-07-14 22:33:10 -04:00
Wing Lian	cd079b5536	Tensor parallel w DeepSpeed AutoTP (#2574 ) * support for deepspeed autotup * bump to latest deepspeed that supports deepcompile too * add deepcompile support too * fix total steps calculation for TP * setup fixture for tp * update ds config to ensure weights are gathered for checkpoint * fix duplicate validation names * chore: lint	2025-07-14 21:33:48 -04:00
Wing Lian	38359a8997	allow profiling in mid-training rather from the start (#2899 ) [skip ci] * allow profiling in mid-training rather from the start * simplify based on PR feedback * fix logic, improve saving at end, add tests	2025-07-14 20:11:11 -04:00
Wing Lian	aa684122f1	upgrade peft==0.16.0 and datasets==4.0.0 (#2917 ) [skip ci] * upgrade peft to 0.16.0 * upgrade datasets to 4.0.0 * refactor dupes from merge/rebase * fix check for fsdp1 + sharded_state_dict * use full state dict for ci	2025-07-14 20:09:26 -04:00
Wing Lian	ca4d4ef793	don't init distributed for deepspeed if preprocessing (#2920 ) * don't init distributed for deepspeed if preprocessing * add e2e test to validate preprocess cli with deepspeed * ignore duplicate code for cfg	2025-07-14 14:19:19 -04:00
Wing Lian	e581c15d40	refactor dupes from merge/rebase (#2919 ) [skip ci]	2025-07-14 10:05:26 -04:00
Wing Lian	af92151a7b	FSDP2 fix validation and add tests (#2910 ) * fix validation and add tests * remove debugging and add more tests * remove migrate_fsdp	2025-07-14 09:25:44 -04:00
Wing Lian	5081db7f8a	upgrade trl==0.19.1 (#2892 ) [skip ci] * upgrade trl==0.19.1 * add vllm for tests for grpo * fixes to work with latest trl * need data_parallel_size config too * support for vllm_mode for server / colocate * vllm settings for colocate * relax vllm version * bump min hf hub for latest vllm support * add hints on string literal for vllm mode * use latest transformers 4.53.2 * tweak acceptable loss on flaky test_ds_zero3_packed test * don't run flaky vllm/grpo tests for now	2025-07-14 09:23:42 -04:00
Wing Lian	41664c7c4c	fix ddp for incorrect steps (#2915 ) * fix ddp for incorrect steps * add test	2025-07-14 07:51:16 -04:00
salman	d6e4a611e5	FSDP1 -> FSDP2 (#2760 ) * FSDP2 args migration implementation This commit implements the migration to FSDP2 arguments including: - FSDP2 support with LoRA training - DPO integration with FSDP2 - Model loading fixes and refactoring - CPU offloading and PEFT handling - Test updates and CI improvements - Bug fixes for dtype errors and various edge cases	2025-07-12 15:18:01 +01:00
NanoCode012	9b95a625ab	feat: add devstral small 2507 (#2896 ) * feat: add devstral small 2507 * chore: update blog doc	2025-07-11 09:34:19 +07:00
Wing Lian	c6d69d5c1b	release v0.11.0 (#2875 ) * release v0.11.0 * don't build vllm into release for now * remove 2.5.1 references * smollm3 multipack support * fix ordering of e2e tests	2025-07-09 09:22:35 -04:00
NanoCode012	8c6a6ea6eb	Feat: add devstral model support (#2880 ) [skip ci] * fix: do not add training and training_detail block by default * fixed: magistral docs * fix: address pad adding new fields and use built-in from_openai * feat: try enable multiprocessing * fix: check for keys before deleting attn_mask * feat: add mistral pad test * feat: add tool calling test * feat: add devstral tokenizer tests * fix: comma format * chore: remove unused support_preprocessing as tokenizer is pickable now * chore: update magistral doc * feat: add devstral readme and example * chore: refactor error handling	2025-07-08 11:01:19 -04:00
Wing Lian	de2c5ba103	mark flaky geglu tests and add torch seed (#2876 ) [skip ci] * mark flaky geglu tests and add torch seed * restore accidental removal of seed	2025-07-07 15:24:16 -04:00
Wing Lian	759cefb741	setup defaults for dataloader to ensure GPU is kept busy (#2632 ) [skip ci]	2025-07-07 10:10:58 -04:00
Wing Lian	69cd49a7aa	update transformers to 4.53.1 (#2844 ) [skip ci] * update transformers to 4.53.0 * remove attention_mask from signature columns if using packing * remove attention_mask column from dataloader * update signature of flash attn forward for ring attn patch * fix FSDP * patch ring-flash-attn with upstream signature fix * fix patch indentation level * fix the patch * add batch flattening smoke test with loss check that works in older transformers * fix patch * don't drop attention mask for flex * more fixes * patch create_causal_mask for packing w flex * global torch manual_seed fixture * tweak loss checks * fix patch and use single batch for flex * don't need to reload * fix causal mask patch * use transformers patch releasE * make sure env var is string * make sure to drop attention mask for flex w packing for latest transformers patch release * tweak loss * guard on signature columns before removing attention mask * bump loss * set remove isn't chainable * skip slow mistral test in 2.5.1	2025-07-07 09:35:22 -04:00

383 Commits