axolotl

Author	SHA1	Message	Date
Wing Lian	7ed40f1d70	automatically set env vars for single gpu deepspeed zero3 (#3118 ) [skip ci] * automatically set env vars for single gpu deepspeed zero3 * use setdefault	2025-08-29 13:36:47 -04:00
VED	5b6ec2820f	patch for ds_grads_remaining in deepspeed (#3102 ) [skip ci] * patch deepspeed * deepspeed patch for ds_grads_remaining * patch in Patchmanager * chore: lint * deepseed utils * chore2 * patch ds_grads_remaining chore * chore lint * chore lint * remove torch.nn patch * lint * Update src/axolotl/monkeypatch/utils.py Co-authored-by: NanoCode012 <kevinvong@rocketmail.com> * patched with checkpointwarapper * lint * only apply deepspeed patch when using activation offloading --------- Co-authored-by: NanoCode012 <kevinvong@rocketmail.com> Co-authored-by: Wing Lian <wing@axolotl.ai>	2025-08-29 12:12:09 -04:00
Wing Lian	6afba3871d	Add support for PyTorch 2.8.0 (#3106 ) * Add support for PyTorch 2.8.0 * loosen triton requirements * handle torch 2.8.0 in setup.py * fix versions * no vllm for torch 2.8.0 * remove comment Co-authored-by: NanoCode012 <nano@axolotl.ai> --------- Co-authored-by: NanoCode012 <nano@axolotl.ai>	2025-08-28 09:10:40 -04:00
Dan Saunders	dc338c3b0e	Update .coderabbit.yaml (#3109 ) [skip ci] Oops, should be false.	2025-08-27 09:50:52 -04:00
salman	d0d2fc5606	Tokens per second logging [skip-e2e] (#3072 )	2025-08-27 09:10:14 +01:00
Wing Lian	e1131e9619	make always skip_move_to_device default as true (#3084 )	2025-08-26 09:30:22 -04:00
Wing Lian	c4c4b90638	add tokenizer_save_jinja_files to keep legacy behavior of including chat template in tokenizer_config.json (#3093 ) * add tokenizer_save_jinja_files to keep legacy behavior of including chat template in tokenizer_config.json * fix test import	2025-08-26 09:30:04 -04:00
Wing Lian	0e9945e3b9	deploy training jobs to baseten w truss in axolotl cli (#3086 ) [skip ci] * deploy training jobs to baseten w truss in axolotl cli * cleanup	2025-08-26 09:29:50 -04:00
NanoCode012	0de254a0d0	feat: add gemma3_text attention handling for lora kernels (#3103 )	2025-08-26 16:47:26 +07:00
Dan Saunders	79ddaebe9a	Add ruff, remove black, isort, flake8, pylint (#3092 ) * black, isort, flake8 -> ruff * remove unused * add back needed import * fix	2025-08-23 23:37:33 -04:00
Dan Saunders	eea7a006e1	make multipack sampler patch explicit (#3096 ) * make multipack sampler patch explicit * combining	2025-08-22 14:29:10 -04:00
Wing Lian	ab4d604a8f	upgrade peft for 0.17.1 (#3094 ) * upgrade peft to 0.17.1 * upgrade for transformers too	2025-08-22 07:26:30 -04:00
Wing Lian	0fa752e58b	upgrade flash-attn to 2.8.3 for gpt-oss attn sink support (#3082 )	2025-08-21 15:04:10 -04:00
Dan Saunders	08e517ea48	Update .coderabbit.yaml (#3091 ) [skip ci]	2025-08-20 22:14:13 -04:00
Wing Lian	07fd22f39b	better handling of lora w bias with fsdp2 and handling of files when saving model checkpoint (#3090 )	2025-08-20 15:17:48 -04:00
Wing Lian	06eaf6c448	misc fixes (#3085 )	2025-08-20 08:52:26 -04:00
goggle	050210e637	fix: Sweep runs overwrite each other because output_dir from base config is reused (#3080 ) * refactor: improve output_dir handling in generate_config_files * fix typo * cli: harden sweep output_dir handling with base fallback - Ensure sweep permutations always resolve a valid output_dir - Default to ./model-out if neither permutation nor base config sets output_dir - Append sweepXXXX suffix consistently for each permutation - Prevent Path(None) TypeError and improve robustness of sweep config generation * fix typo * chore: lint --------- Co-authored-by: Wing Lian <wing@axolotl.ai>	2025-08-19 20:25:20 -04:00
Wing Lian	05cedbfb1e	add baseten info for gpt-oss recipe (#3078 ) * add bsaeten info for gpt-oss recipe * incorporate PR review	2025-08-19 13:30:37 -04:00
VED	c10eb811fa	data_parallel_size in in VllmserveCliArgs (#3074 ) * data_parallel_size in in VllmserveCliArgs * moved to 43	2025-08-18 08:44:37 -04:00
VED	0eef385b1a	[feat] truncation support with excess_length_strategy (#3068 ) [skip ci] * feat:truncation support with excess_len * pre-commit * excess_length_strategy * requested changes * lint * added handle_long_seq_in_dataset in sft * comments improved	2025-08-18 08:39:13 -04:00
Wing Lian	ecbe8b2b61	[GPT-OSS] improve FSDP shard merging and documentation for GPT-OSS (#3073 ) * improve fsdp shard merging * improve logging * update information on merging and inferencing GPT-OSS * cleanup readme * automate cleanup of FSDP prefix * import GRPO only if necessary * only modify config.json on rank0 * merge final checkpoint at end of training * prevent circular import * Fix saving for sharded state dict * devx, move merged to output dir * move import back to top * Fix stuck merge * fix conditionals from pr feedback and add test	2025-08-15 21:25:01 -04:00
Wing Lian	130ef7c51a	Various fixes for VLMs (#3063 ) * fix to not use batch feature indexing * more vlm fixes * use AutoModelForImageTextToText * add example yaml and need num2words for chat template * improve handling of adding image tokens to conversation * add lfm2-vl support * update the lfm readme * fix markdown and add rtol for loss checks * feat: add smolvlm2 processing strat * fix: check for causal-conv1d in lfm models * feat: add docs for lfm2 * feat: add new models and tips to docs * feat: add smolvlm2 docs and remove extra dep * chore: update docs * feat: add video instructions * chore: cleanup * chore: comments * fix: typo * feat: add usage stats * chore: refactor --------- Co-authored-by: NanoCode012 <nano@axolotl.ai>	2025-08-15 10:52:57 -04:00
salman	d1de6f5f3d	Add option to skip slow tests in PRs (#3060 ) [skip ci] * testing e2e skip [skip-e2e] * testing e2e skip [skip-e2e] * testing e2e skip [skip-e2e] * testing e2e skip [skip-e2e] * testing e2e skip [skip-e2e] * testing e2e skip [skip-e2e] * testing e2e skip [skip-e2e] * testing e2e skip [skip-e2e] * testing e2e skip [skip-e2e] * testing e2e skip [skip-e2e] * testing e2e skip [skip-e2e] * stop running multigpu [skip-e2e] * should work now [skip-e2e] * reverting [skip-e2e] * testing [skip-e2e] * debug [skip-e2e] * debug [skip-e2e] * round 2[skip-e2e] * removing debug [skip-e2e] * support skipping whole PR [skip-e2e] * use script for e2e skip [skip-e2e] * contributing [skip-e2e] * contributing [skip-e2e] --------- Co-authored-by: Wing Lian <wing@axolotl.ai>	2025-08-13 22:57:51 -04:00
Wing Lian	48b7ae1677	use updated patch releasE (#3066 )	2025-08-13 21:23:05 -04:00
NanoCode012	506e3a3907	fix: fsdp_config validation being None (#3061 ) [skip ci] * fix: fsdp_config validation being None * fix: handling --------- Co-authored-by: salman <salman.mohammadi@outlook.com>	2025-08-13 21:21:50 -04:00
Wing Lian	09145de8fa	upgrade transformers==4.55.1 and bitsandbytes==0.47.0 (#3064 ) * upgrade transformers==4.55.1 * also upgrade bnb * remove bnb params4bit patch (upstreamed) * use latest causal-conv1d * fix patching ring-flash-attn with now missing imports --------- Co-authored-by: Dan Saunders <danjsaund@gmail.com>	2025-08-13 19:41:07 -04:00
Wing Lian	e0a2523a3b	Workaround to unblock docs build in main (#3055 ) Co-authored-by: Salman Mohammadi <salman.mohammadi@outlook.com>	2025-08-13 11:39:39 +01:00
Wing Lian	3d45620008	remove prepare-from-posids patch (#3052 ) [skip ci]	2025-08-11 09:34:41 -04:00
github-actions[bot]	ce20e838b5	chore: update pre-commit hooks (#3050 ) [skip ci] Co-authored-by: djsaunde <1245942+djsaunde@users.noreply.github.com>	2025-08-11 09:32:21 -04:00
Wing Lian	d4d84d48af	fix ray train and add fsdp2 smoke test for ray trainer (#3053 ) * add fsdp2 smokle test for ray trainer * fix raytrain with fsdp2	2025-08-11 09:31:54 -04:00
Wing Lian	9b12c05660	use exec instead of subprocess to make ctrl+c nicer for cli (#3044 ) * use exec instead of subprocess to make ctrl+c nicer for cli * change var name to use_exec * simplify to bool * flush std* * patch subprocess as mock in test * fix tests * more test fixes	2025-08-10 20:22:20 -04:00
Wing Lian	686933194e	fix vllm tagging and add cloud images w/o tmux (#3049 ) [skip ci]	2025-08-10 20:21:56 -04:00
Wing Lian	d12b461d19	follow up fix for plugin registration (#3054 ) [skip ci]	2025-08-10 20:21:38 -04:00
Wing Lian	d6b81b3683	update training args check for new defaults (#3051 ) [skip ci] * update training args check for new defaults * skip check for now	2025-08-10 11:26:22 -04:00
Wing Lian	05f1b4b2e8	run monkeypatch tests in seperate runner (#3047 )	2025-08-09 14:34:07 -04:00
Wing Lian	7cfc80ec77	set dev version (#3045 ) [skip ci]	2025-08-08 13:56:53 -04:00
salman	0da6a95efa	Add citation.tff (#3043 ) [skip ci]	2025-08-08 16:18:42 +01:00
Wing Lian	2c8497e489	tag for v0.12.0 release (#3041 ) v0.12.0	2025-08-08 08:24:09 -04:00
NanoCode012	f70d4de8c7	feat(doc): add links to new features on README (#2980 ) [skip ci] * feat(doc): add links to new features on README * fix merge error * remove blurb about older FSDP2 integration * update blog link * chore: update cce commit * feat: update model support into readme * Update README.md Co-authored-by: salman <salman.mohammadi@outlook.com> * chore: lint num spaces --------- Co-authored-by: Wing Lian <wing@axolotl.ai> Co-authored-by: salman <salman.mohammadi@outlook.com>	2025-08-08 08:16:43 -04:00
Dan Saunders	0ae06d756d	use nanmean for loss aggregation (CP fix) (#3033 ) * use nanmena for loss aggregation (CP fix) * use regular asserts * small changes to make tests isolate * combining evaluation_loop patches * fix * delete unused * fix check	2025-08-08 08:15:17 -04:00
NanoCode012	2974670bf8	Feat: add arcee (#3028 ) * feat: add arcee * feat: add latest models supported by cce * feat: add arcee example config * chore: lint * fix: typo * feat: change to instruct * feat: add vram usage * Update README.md	2025-08-08 08:09:11 -04:00
Wing Lian	50f2b94d50	add 120b and deepspeed zero3 examples (#3035 ) [skip ci] * add 120b and deepspeed zero3 examples * add a bit of flavor and cleanup gpt oss readme * fix: remove expert vram usage * fix: remove redundant EOS token from eot_tokens * feat: add 120B to docs --------- Co-authored-by: NanoCode012 <nano@axolotl.ai>	2025-08-08 08:04:56 -04:00
Wing Lian	eb2c87b525	Example for Slurm and various fixes (#3038 ) [skip ci] * slurm example and make preprocess play nicely * start slurm if it init file exists * remove incorrect comment * feat: add slurm docs --------- Co-authored-by: NanoCode012 <nano@axolotl.ai>	2025-08-08 08:02:03 -04:00
NanoCode012	4db7f023c6	feat(doc): standardize the axolotl install to a release (#3040 ) [skip ci]	2025-08-08 08:00:26 -04:00
NanoCode012	4273d5cf7e	feat: update nd parallelism readme (#3039 ) Co-authored-by: salman <salman.mohammadi@outlook.com>	2025-08-08 12:45:36 +01:00
Wing Lian	c5e5aba547	Add 2.8.0 base images and uv images (#3034 )	2025-08-08 02:30:16 -04:00
Wing Lian	9d5c95db6f	Add support for Accelerate CP, ND examples, and fix for parallel config w fsdp (#3019 ) * fix for parallelism config from trainer * fix handling of parallelism_config w accelerate * add todo for removal * update to latest axolotl-contribs-mit for optimizer fix too * synchronize training after checkpoint save * dir spelling * use latest accelerate main * fix to not use partial state parallelism_config * more fixeS * use most recent accelerate fix * fix cpu_ram_efficient_loading to meta devices from rank 0 to prevent CPU RAM oom * improve handling of broadcasting fsdp2 state dict * support for openai chat template with thinking key as the reasoning trace * address PR feedback * refactor to remove dependency on PartialState for parallelism config * bump accelerate, gptoss fixes * limit meta fixes to fsdp2 for now * fixes for gpt oss * fixup examples, don't use cpu-ram-efficient-loading for now * remove problematic barrier * patch parallelism config * reorder comparison * device mesh fixes * make pure CP work * lint	2025-08-07 21:22:15 -04:00
NanoCode012	ca796fb56e	feat(doc): update gpt-oss readme (#3029 ) [skip ci] * feat(doc): update gpt-oss readme * fix: caps * feat: add toolcalling section * feat: add example tool dataset to docs * chore: update	2025-08-07 09:26:42 -04:00
VED	597953bef0	clear cache before clean up (#3031 ) [skip ci] * clear chahe before save_model * chore: lint --------- Co-authored-by: Wing Lian <wing@axolotl.ai>	2025-08-07 09:25:58 -04:00
NanoCode012	39fbd3b2b5	fix: lora kernels for mistral3 (#3027 ) [skip ci]	2025-08-07 09:25:37 -04:00

2380 Commits