Commit Graph

390 Commits

Author SHA1 Message Date
Dan Saunders
0437c1a4ba auto-gptq -> gptqmodel 2025-09-26 10:26:44 -04:00
Grant Holmes (Ren)
850c1a5f8d Add FSDP v2 swap memory support + QLoRA compatibility fixes (#3167)
Co-authored-by: salman <salman.mohammadi@outlook.com>
2025-09-26 10:23:59 +01:00
Dan Saunders
f9748c4dc5 CP fix (#3182)
* patch transformers to allow CP + FA2

* nits

* only patch in CP > 1 case
2025-09-25 12:03:50 -04:00
陈华杰
e8b962d47f feat: support training with JSON string tool arguments (#3136)
* feat: support training with JSON string tool arguments; fix PyArrow data type inconsistency error

* feat: raise error for tool call arguments decode

* Add test_chat_templates_tool_call_string_arguments.py

Add test for string arguments

* fix: change to correct qwen3 tokenizer

* fix: update docs to clarify arguments json

* chore: lint

* fix: duplicate

* chore: revert

* feat: add error to faq

* fix: remove duplicate fixture

---------

Co-authored-by: caoqinping <caoqinping@lixiang.com>
Co-authored-by: gamersover-blog <1611885128@qq.com>
Co-authored-by: NanoCode012 <nano@axolotl.ai>
2025-09-25 12:06:21 +07:00
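The change in #3136 accepts tool-call arguments either as a parsed object or as a JSON string, and raises a clear error when the string fails to decode. A minimal sketch of that normalization — the function and error names here are hypothetical, not Axolotl's actual code:

```python
import json

def normalize_tool_arguments(arguments):
    """Accept tool-call arguments as either a dict or a JSON string."""
    if isinstance(arguments, dict):
        return arguments
    if isinstance(arguments, str):
        try:
            return json.loads(arguments)
        except json.JSONDecodeError as err:
            # #3136 adds an explicit decode error instead of a PyArrow type failure
            raise ValueError(f"tool call arguments are not valid JSON: {arguments!r}") from err
    raise TypeError(f"unsupported tool argument type: {type(arguments).__name__}")
```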
NanoCode012
08d831c3d5 Feat: add qwen3-next (w packing+cce) (#3150)
* feat: upgrade cce for qwen3-next

* feat: add sample qwen3 config

* feat: add packing patch for chunk_gated_delta_rule

* feat: add qwen3 link

* fix: tuple name

* feat: add tested qwen3 config

* fix: improve log

* feat: add patch for fla without packing

* fix: remove fla patch for standard mode

* feat: enable packing

* feat: add qwen3-next tests

* chore: move tests
2025-09-23 11:31:15 +07:00
NanoCode012
09959fac70 Feat: add Magistral Small 2509 and native mistral3 tokenizer support (#3165)
* feat: update mistral common

* feat: add mistral3processor

* fix: loading

* fix: cast pixel_values to fp32

* fix: image tensor conversion

* feat: add FA2 support for pixtral based models

* fix: update mistral small 3.1 to use native tokenizer

* fix: install tips

* fix: improve info on sample dataset files

* chore: move mistral configs into subfolders

* fix: remove unneeded patch

* fix: indent

* feat: add integration tests

* chore: move

* feat: add magistral 2509 docs and example

* fix: convert tensor to bool

* feat: expand tests

* chore: move tests
2025-09-18 15:42:20 +07:00
Dan Saunders
4065bc14c6 Debug log, logging improvements (#3159)
* simplify logging

* remove comment

* progress on debug.log

* add debug-level logger for file log

* simplify

* case insensitivity; 3rd party logging improvements

* simplify

* fix

* tests

* lint

* nits

* nit

* Update tests/test_utils_tee.py

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

* cleanup / comments

* fix

* oops

---------

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2025-09-17 13:27:03 -04:00
Wing Lian
86d6ee7c05 upgrade trl and accelerate (#3161)
* upgrade trl==0.23.0

* upgrade accelerate patch fix

* add hints when using gradient_checkpointing with DPO

* set gradient-checkpointing properly
2025-09-16 14:53:01 -04:00
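One bullet in #3161 adds hints for combining gradient_checkpointing with DPO. A common recipe — an assumption here, not necessarily the exact hint the PR added — is to force non-reentrant checkpointing so the double forward pass over policy and reference models plays well with autograd:

```python
# Standard transformers TrainingArguments fields; whether #3161 recommends
# exactly this combination is an assumption.
training_kwargs = {
    "gradient_checkpointing": True,
    "gradient_checkpointing_kwargs": {"use_reentrant": False},
}
```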
salman
58d67bf98d Migrate QAT API; fix axolotl quantize for QAT-ed models; add NVFP4 (#3107) 2025-09-12 10:55:50 +01:00
Dan Saunders
1b53c49e1a text diffusion training plugin (#3067)
* diffusion training plugin

* cleanup

* nits

* fixes + improvements

* add back in reinit_weights (clobbered?); masking / pretrain fixes

* nits

* cleanup; tests draft

* sample generation, tests fixes

* fixes

* nits

* add inference support; add auto-mask token support

* nits

* nits

* progress

* simplify logging

* lint

* prefix args with diffusion_

* coderabbit

* tests fix

* nit

* nits

* cleanup + nits

* nits

* fix SFT sample gen

* fixes

* fix

* comments

* comments

* lint

* reward model lora fix

* cleanup; fix pretraining_dataset case

* gradio inference

* update cfgs

* update cfgs

* train, generation parity, cleanup

* fix

* simplify

* test

* test fix
2025-09-10 20:27:00 -04:00
NanoCode012
1d32278755 feat: upgrade transformers to v4.56.1 (#3127)
* feat: upgrade transformers to v4.56

* fix handling of CP/SP now that position_ids are default even for unpacked sequences

* feat: monkeypatch list_repo_templates

* fix: apply patch for tests only

* see if updated main works at least

* fix: update to patch release and remove monkeypatch

* remove fsdp2 eval patch

---------

Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-09-05 11:00:54 -04:00
Dan Saunders
231a67e70b Streaming SFT support (#3101)
* working

* fixes

* deprecate --iterable; cleanup

* pretrain_multipack_buffer_size -> streaming_multipack_buffer_size

* improvements

* tests

* remove unused

* docs, examples

* nit

* nit

* add val_set_size validation

* val

* nit

* min

* coderabbit

* cleanup

* nit

* add depr warning, cleanup

* nit

* fix test, fix quarto

* fix

* review comments

* review comments

* fix
2025-09-02 12:08:44 -04:00
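For #3101, the commit messages pin down one config key directly: pretrain_multipack_buffer_size was renamed to streaming_multipack_buffer_size. A sketch of what a streaming SFT config might carry; the other key names and all values are assumptions for illustration:

```python
streaming_sft_cfg = {
    "streaming": True,                          # hypothetical key: stream the dataset instead of pre-tokenizing
    "streaming_multipack_buffer_size": 10_000,  # renamed in #3101; value illustrative
    "val_set_size": 0,                          # #3101 adds validation around val_set_size for streaming runs
}
```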
Wing Lian
c4c4b90638 add tokenizer_save_jinja_files to keep legacy behavior of including chat template in tokenizer_config.json (#3093)
* add tokenizer_save_jinja_files to keep legacy behavior of including chat template in tokenizer_config.json

* fix test import
2025-08-26 09:30:04 -04:00
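The flag name in #3093 comes straight from the commit title; the snippet below only shows where it would sit in a config, and the polarity is an assumption:

```python
# Assumed polarity: disabling separate jinja files keeps the chat template
# embedded in tokenizer_config.json (the legacy behavior named in #3093).
cfg = {"tokenizer_save_jinja_files": False}
```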
Dan Saunders
79ddaebe9a Add ruff, remove black, isort, flake8, pylint (#3092)
* black, isort, flake8 -> ruff

* remove unused

* add back needed import

* fix
2025-08-23 23:37:33 -04:00
Dan Saunders
eea7a006e1 make multipack sampler patch explicit (#3096)
* make multipack sampler patch explicit

* combining
2025-08-22 14:29:10 -04:00
VED
0eef385b1a [feat] truncation support with excess_length_strategy (#3068) [skip ci]
* feat: truncation support with excess_len

* pre-commit

* excess_length_strategy

* requested changes

* lint

* added handle_long_seq_in_dataset in sft

* comments improved
2025-08-18 08:39:13 -04:00
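The key excess_length_strategy is named in the #3068 title; the value below is an assumption sketching how a config might choose what happens to rows longer than sequence_len:

```python
cfg = {
    "sequence_len": 4096,
    "excess_length_strategy": "truncate",  # hypothetical value: trim overlong rows rather than dropping them
}
```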
Wing Lian
ecbe8b2b61 [GPT-OSS] improve FSDP shard merging and documentation for GPT-OSS (#3073)
* improve fsdp shard merging

* improve logging

* update information on merging and running inference with GPT-OSS

* cleanup readme

* automate cleanup of FSDP prefix

* import GRPO only if necessary

* only modify config.json on rank0

* merge final checkpoint at end of training

* prevent circular import

* Fix saving for sharded state dict

* devx, move merged to output dir

* move import back to top

* Fix stuck merge

* fix conditionals from pr feedback and add test
2025-08-15 21:25:01 -04:00
Wing Lian
130ef7c51a Various fixes for VLMs (#3063)
* fix to not use batch feature indexing

* more vlm fixes

* use AutoModelForImageTextToText

* add example yaml and need num2words for chat template

* improve handling of adding image tokens to conversation

* add lfm2-vl support

* update the lfm readme

* fix markdown and add rtol for loss checks

* feat: add smolvlm2 processing strat

* fix: check for causal-conv1d in lfm models

* feat: add docs for lfm2

* feat: add new models and tips to docs

* feat: add smolvlm2 docs and remove extra dep

* chore: update docs

* feat: add video instructions

* chore: cleanup

* chore: comments

* fix: typo

* feat: add usage stats

* chore: refactor

---------

Co-authored-by: NanoCode012 <nano@axolotl.ai>
2025-08-15 10:52:57 -04:00
Wing Lian
09145de8fa upgrade transformers==4.55.1 and bitsandbytes==0.47.0 (#3064)
* upgrade transformers==4.55.1

* also upgrade bnb

* remove bnb params4bit patch (upstreamed)

* use latest causal-conv1d

* fix patching ring-flash-attn with now missing imports

---------

Co-authored-by: Dan Saunders <danjsaund@gmail.com>
2025-08-13 19:41:07 -04:00
Wing Lian
d4d84d48af fix ray train and add fsdp2 smoke test for ray trainer (#3053)
* add fsdp2 smoke test for ray trainer

* fix ray train with fsdp2
2025-08-11 09:31:54 -04:00
Wing Lian
9b12c05660 use exec instead of subprocess to make ctrl+c nicer for cli (#3044)
* use exec instead of subprocess to make ctrl+c nicer for cli

* change var name to use_exec

* simplify to bool

* flush std*

* patch subprocess as mock in test

* fix tests

* more test fixes
2025-08-10 20:22:20 -04:00
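The mechanism in #3044 is standard POSIX exec: replacing the CLI process means Ctrl+C delivers SIGINT directly to the training process rather than to a subprocess wrapper that must forward it. A minimal sketch — the flush matters because exec never returns:

```python
import os
import sys

def launch(cmd: list[str]) -> None:
    """Replace the current process with cmd instead of spawning a child."""
    sys.stdout.flush()  # #3044 notes flushing std* before exec
    sys.stderr.flush()
    os.execvp(cmd[0], cmd)  # does not return on success

# e.g. launch(["accelerate", "launch", "train.py"])  -- illustrative command
```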
Wing Lian
d6b81b3683 update training args check for new defaults (#3051) [skip ci]
* update training args check for new defaults

* skip check for now
2025-08-10 11:26:22 -04:00
Dan Saunders
0ae06d756d use nanmean for loss aggregation (CP fix) (#3033)
* use nanmean for loss aggregation (CP fix)

* use regular asserts

* small changes to isolate tests

* combining evaluation_loop patches

* fix

* delete unused

* fix check
2025-08-08 08:15:17 -04:00
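The idea behind #3033: under context parallelism, some ranks can contribute NaN losses for token positions they do not own, and a plain mean would poison the aggregate. torch.nanmean skips those entries; a self-contained illustration of the difference, not the patch itself:

```python
import torch

losses = torch.tensor([0.8, float("nan"), 1.2])
print(losses.mean())     # tensor(nan)  -- one NaN ruins the average
print(losses.nanmean())  # tensor(1.)   -- NaN entries are ignored
```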
Wing Lian
9d5c95db6f Add support for Accelerate CP, ND examples, and fix for parallel config w fsdp (#3019)
* fix for parallelism config from trainer

* fix handling of parallelism_config w accelerate

* add todo for removal

* update to latest axolotl-contribs-mit for optimizer fix too

* synchronize training after checkpoint save

* dir spelling

* use latest accelerate main

* fix to not use partial state parallelism_config

* more fixes

* use most recent accelerate fix

* fix cpu_ram_efficient_loading to use meta devices on ranks other than rank 0 to prevent CPU RAM OOM

* improve handling of broadcasting fsdp2 state dict

* support for openai chat template with thinking key as the reasoning trace

* address PR feedback

* refactor to remove dependency on PartialState for parallelism config

* bump accelerate, gptoss fixes

* limit meta fixes to fsdp2 for now

* fixes for gpt oss

* fixup examples, don't use cpu-ram-efficient-loading for now

* remove problematic barrier

* patch parallelism config

* reorder comparison

* device mesh fixes

* make pure CP work

* lint
2025-08-07 21:22:15 -04:00
Wing Lian
4bce713b39 allow custom trainer_cls to be defined as a module reference in the YAML (#3024) [skip ci]
* allow custom trainer_cls to be defined as a module reference in the YAML

* address PR feedback and add test

* add tests
2025-08-06 22:49:19 -04:00
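#3024 lets a YAML value like trainer_cls: my_pkg.trainers.MyTrainer name a class by dotted path. A sketch of the usual resolution pattern, assuming nothing about Axolotl's actual implementation:

```python
import importlib

def resolve_trainer_cls(dotted_path: str) -> type:
    """Resolve 'package.module.ClassName' into the class object."""
    module_path, _, cls_name = dotted_path.rpartition(".")
    module = importlib.import_module(module_path)
    return getattr(module, cls_name)

# e.g. resolve_trainer_cls("my_pkg.trainers.MyTrainer")  -- hypothetical path
```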
Dan Saunders
d09290f2f4 Lora kernels bias support (#3025)
* lora kernels bias support

* revert rename

* nit

* lint, tests

* satisfying the rabbit
2025-08-06 20:20:08 -04:00
Wing Lian
97e86c6d47 drop old patches and code that are no longer needed (#3007) [skip ci] 2025-08-06 08:02:39 -04:00
Wing Lian
ab49d16e34 Dion optimizer support (#3014)
* Add support for Dion optimizer

* dion training kwargs

* fix var names

* no dion 8bit for now

* use updated axolotl-contribs-mit for dion optimizer

* add smoke test for dion optimizer

* add docs

* fix typo during edits

* fix test to not remove load in 8bit
2025-08-04 16:33:30 -04:00
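From the #3014 commits we know the Dion optimizer ships via axolotl-contribs-mit, takes its own training kwargs, and has no 8-bit variant for now; the config stub below is otherwise an assumption:

```python
cfg = {
    "optimizer": "dion",    # assumed config value; see the docs added in #3014
    "learning_rate": 2e-5,  # illustrative only
}
```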
Dan Saunders
e758343cac FSDP2 + LoRA kernels (#2992)
* impl fix

* smoke tests

* patches for fsdp2 + qlora compat

* nit

* working fix

* working fix

* fix merge

* minifying patches; update bnb dep

* renaming; adding tests

* remove duplicate test, add dora guard

* generalize __torch_function__

* revert generalization

* update comments
2025-08-03 20:05:17 -04:00
salman
294c7fe7a6 Distributed/ND-Parallel (#2977) 2025-07-31 15:25:02 -04:00
Wing Lian
7b68dfafd7 jagged lr restart scheduler (#1680) [skip ci]
* jagged lr restart scheduler

var name fix
make sure to create scheduler first

* wire things together

* more fixes

* fix for nesting scheduler and first anneal phase

* no need for relora trainer anymore since we've generalized the relora scheduler

* remove redundant relora scheduler and lint

* update relora e2e test for updated params

* need restart steps for relora test

* update quarto docs for dropped relora trainer

* update example yaml

* drop verbose arg

* min lr scale support for jagged lr

* don't let min_lr be nonetype

* cleanup args
2025-07-31 13:50:03 -04:00
Wing Lian
563f5eed7a update dependencies - liger + trl (#2987)
* update dependencies

* set dataset processes for tests

* add support for GSPO
2025-07-31 11:17:17 -04:00
Dan Saunders
bb1cae1a20 CLI: add --launcher option, support launcher args, cleanup, refactor (#2924)
* add --launcher option; explicit True/False bool args; small cleanup

* refactor

* add torchrun, accelerate cli args

* add rdzv arg default + tests

* update _quarto

* coderabbit

* fix

* we can't set rdzv_id independently across nodes

* coderabbit

* fix tests
2025-07-30 15:46:56 -04:00
NanoCode012
90e5598930 Feat: Add voxtral, magistral small 1.1, and misc gemma3n fixes (#2979)
* fix: lock version in gemma3n docs

* feat: add sample configs and docs

* chore: move mistraltokenizer into mistral folder

* feat: update instructions

* feat: add dynamic load voxtral

* fix: remove incorrect vision config, add audio

* fix: support voxtral processing strategy and address none in data

* feat: patch mistraltokenizer subclass upstream and add missing

* feat: update cce commit to include voxtral

* fix: remove old comment

* fix: gemma3 patch not needed anymore

* fix: voxtral modeling code

* fix: remove incorrect ds path

* fix: adjust apply chat template parsing

* feat: enable voxtral patch

* fix: patch

* feat: update example datasets

* fix: target layer

* feat: update gemma3n docs

* feat: update voxtral docs

* feat: revert assistant parsing to rely on new upstream changes

* chore: skip test till next PR fix

* fix: override upstream decode due to missing handling

* feat: update readme

* fix: update

* feat: add magistral small think support

* feat: update mistral-common dep

* fix: lint

* fix: remove optional dep

* chore: typing

* chore: simplify import

* feat(doc): update differences for 2507

* fix: coderabbit comments

* feat: update clarify docs on new transformers
2025-07-30 15:57:05 +07:00
Wing Lian
0ff2f172ef Act offload lora fix (#2928) [skip ci]
* fix activation offloading with lora

* update w e2e test

* add docs for error
2025-07-24 16:10:04 -04:00
Dan Saunders
208fb7b8e7 basic torchao fp8 mixed precision training (#2926)
* debug

* debug

* debug

* revert unneeded change

* add accelerator config to base trainer builder

* add back accumulated_cache_size_limit setting

* lint

* accelerator constructor patch for single-GPU torch fp8

* lint

* re-using existing fp8 code

* lint

* remove accelerate patch now fix in latest release

* fix

* docs

* add fp8 + fsdp2 example

* remove unused config

* update config

* smoke tests

* add validator

* add 2.7.0 guard for fsdp2

* fix

* add config descriptions

* add FSDP doc link

* nit

* set force_recompute_fp8_weight_in_bwd with enable_fsdp_float8_all_gather

* better cfg for smoke tests

* add test for accelerate patching

* update fp8 validator
2025-07-22 16:27:47 -04:00
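The two torchao flags in the #2926 commits, force_recompute_fp8_weight_in_bwd and enable_fsdp_float8_all_gather, are Float8LinearConfig fields. A sketch of the underlying torchao recipe, assuming (not confirming) this is how Axolotl wires them:

```python
import torch
from torchao.float8 import Float8LinearConfig, convert_to_float8_training

fp8_config = Float8LinearConfig(
    enable_fsdp_float8_all_gather=True,      # overlap fp8 casts with the FSDP2 all-gather
    force_recompute_fp8_weight_in_bwd=True,  # set alongside all-gather per the commits
)
model = torch.nn.Sequential(torch.nn.Linear(1024, 1024, bias=False))
convert_to_float8_training(model, config=fp8_config)  # swaps Linear -> Float8Linear in place
```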
Wing Lian
af8d257aa2 make pad_to_sequence_len default to the same value as sample_packing (#2941) [skip ci]
* make pad_to_sequence_len default to the same value as sample_packing

* remove duplicate validation

* fix test

* update description meta

Co-authored-by: NanoCode012 <nano@axolotl.ai>

---------

Co-authored-by: NanoCode012 <nano@axolotl.ai>
2025-07-21 11:40:56 -04:00
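The defaulting rule from #2941 is fully stated in the title; sketched as code:

```python
def resolve_pad_to_sequence_len(cfg: dict) -> bool:
    # When unset, pad_to_sequence_len now follows sample_packing
    # instead of defaulting to False.
    if cfg.get("pad_to_sequence_len") is None:
        return bool(cfg.get("sample_packing", False))
    return bool(cfg["pad_to_sequence_len"])
```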
Wing Lian
db5f6f4693 limit num_proc when saving datasets to disk (#2948) [skip ci]
* limit num_proc when saving datasets to disk

* enforce at least 1 in case it rounds down to 0, using a sane divisor of at least 8 rows per worker when saving

* update fixtures with dataset processes since that should never be NoneType

* improve reusability for tests
2025-07-21 11:39:38 -04:00
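The cap in #2948 is spelled out in the commit body: at least 8 rows per worker, never fewer than one process. As arithmetic:

```python
def saving_num_proc(num_rows: int, configured: int) -> int:
    # num_rows // 8 keeps >= 8 rows per worker; max(1, ...) guards the
    # case where the division rounds down to 0 on tiny datasets.
    return max(1, min(configured, num_rows // 8))
```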
Wing Lian
36cbe13d18 activation offloading with cuda streams doesn't work with LoRA (#2927) 2025-07-16 11:59:20 -04:00
Dan Saunders
10ba1622f7 checkpoint model on first step callback (#2906)
* checkpoint model on first step callback

* remove debug

* add test cases; update existing tests not to save on first step

* move test out of solo

* delete

* default to False

* typo
2025-07-15 15:00:48 -04:00
NanoCode012
354eaaf0d3 feat: add call method to mistral tokenizer wrapper (#2898) 2025-07-14 22:33:35 -04:00
greenhestu
a061446540 Fix: Prevents merging of tool arguments during preprocessing (#2909) 2025-07-14 22:33:10 -04:00
Wing Lian
cd079b5536 Tensor parallel w DeepSpeed AutoTP (#2574)
* support for deepspeed AutoTP

* bump to latest deepspeed that supports deepcompile too

* add deepcompile support too

* fix total steps calculation for TP

* setup fixture for tp

* update ds config to ensure weights are gathered for checkpoint

* fix duplicate validation names

* chore: lint
2025-07-14 21:33:48 -04:00
Wing Lian
38359a8997 allow profiling mid-training rather than from the start (#2899) [skip ci]
* allow profiling mid-training rather than from the start

* simplify based on PR feedback

* fix logic, improve saving at end, add tests
2025-07-14 20:11:11 -04:00
Wing Lian
aa684122f1 upgrade peft==0.16.0 and datasets==4.0.0 (#2917) [skip ci]
* upgrade peft to 0.16.0

* upgrade datasets to 4.0.0

* refactor dupes from merge/rebase

* fix check for fsdp1 + sharded_state_dict

* use full state dict for ci
2025-07-14 20:09:26 -04:00
Wing Lian
ca4d4ef793 don't init distributed for deepspeed if preprocessing (#2920)
* don't init distributed for deepspeed if preprocessing

* add e2e test to validate preprocess cli with deepspeed

* ignore duplicate code for cfg
2025-07-14 14:19:19 -04:00
Wing Lian
e581c15d40 refactor dupes from merge/rebase (#2919) [skip ci] 2025-07-14 10:05:26 -04:00
Wing Lian
af92151a7b FSDP2 fix validation and add tests (#2910)
* fix validation and add tests

* remove debugging and add more tests

* remove migrate_fsdp
2025-07-14 09:25:44 -04:00
Wing Lian
5081db7f8a upgrade trl==0.19.1 (#2892) [skip ci]
* upgrade trl==0.19.1

* add vllm for tests for grpo

* fixes to work with latest trl

* need data_parallel_size config too

* support for vllm_mode for server / colocate

* vllm settings for colocate

* relax vllm version

* bump min hf hub for latest vllm support

* add hints on string literal for vllm mode

* use latest transformers 4.53.2

* tweak acceptable loss on flaky test_ds_zero3_packed test

* don't run flaky vllm/grpo tests for now
2025-07-14 09:23:42 -04:00
Wing Lian
41664c7c4c fix ddp for incorrect steps (#2915)
* fix ddp for incorrect steps

* add test
2025-07-14 07:51:16 -04:00