axolotl

Author	SHA1	Message	Date
NanoCode012	631268a0ca	revert renaming of deepspeed stage3 args that use auto (#2964 ) [skip ci] * Revert "fix deprecate deepspeed stage3_gather_16bit_weights_on_model_save arg…" This reverts commit `e207762928`. * don't revert the values that don't use 'auto' --------- Co-authored-by: Wing Lian <wing@axolotl.ai>	2025-07-22 09:59:47 -04:00
Wing Lian	3a208cfd84	Autocomplete axolotl CLI (#2955 ) * static autocomplete script for axolotl cli * use list of commands that should autocomplete yaml files * make sure to chmod the autocomplete script as executable * shellcheck and fix autocompletion of directory/sub-dirs * more shellcheck fixes	2025-07-22 08:30:31 -04:00
github-actions[bot]	7267edc168	chore: update pre-commit hooks (#2954 ) [skip ci] Co-authored-by: djsaunde <1245942+djsaunde@users.noreply.github.com>	2025-07-22 08:30:00 -04:00
NanoCode012	dfba881e99	Feat: add gemma3n support (#2852 ) * feat: add gemma3n cce * feat: add sample config * feat: add gemma3n multimodal mode * feat: add audio example * feat: support audio and return pixel values in collator * feat: support unmask only assistant region (gemma3n for now) * feat(doc): add notes for audio loading * feat: add audio support for gemma3n * feat: update examples * feat: add gemma3n to the docs * fix: add link at top * feat(doc): clarify additional requirements * fix: mllama missing aspect ratio * fix: mllama need attention fixes for fa2 * Partially Revert "fix: mllama need attention fixes for fa2" This reverts commit `a0bfdd1777`. * fix: disable FA2 for mllama in vision mode * feat: update configs to use proper attention * fix: support other vision features * feat(doc): clarify requirements for gemma3n	2025-07-22 16:52:15 +07:00
Wing Lian	d32058e149	include torchvision in build for upstream changes requiring it now (#2953 ) [skip ci]	2025-07-22 04:19:16 -04:00
NanoCode012	bc1076d8a2	fix: suppress warning if we enabled skip prepare (#2958 )	2025-07-21 11:42:04 -04:00
Wing Lian	b7e8f66e5a	upstream fixes in cce for dora and tensor parallel support (#2960 ) [skip ci]	2025-07-21 11:41:53 -04:00
Wing Lian	e207762928	fix deprecate deepspeed stage3_gather_16bit_weights_on_model_save arg (#2956 ) [skip ci] * fix deprecate deepspeed stage3_gather_16bit_weights_on_model_save arg * replace the rest of the migrated deepspeed params	2025-07-21 11:41:31 -04:00
Wing Lian	fefb0797ee	better handling for reward function checks for GRPO (#2933 ) [skip ci] * better handling for reward function checks for GRPO * consolidate msg copy	2025-07-21 11:41:15 -04:00
Wing Lian	af8d257aa2	make pad_to_sequence_len default to the same value as sample_packing (#2941 ) [skip ci] * make pad_to_sequence_len default to the same value as sample_packing * remove duplicate validation * fix test * update description meta Co-authored-by: NanoCode012 <nano@axolotl.ai> --------- Co-authored-by: NanoCode012 <nano@axolotl.ai>	2025-07-21 11:40:56 -04:00
Wing Lian	db5f6f4693	limit num_proc when saving datasets to disk (#2948 ) [skip ci] * limit num_proc when saving datasets to disk * enforce at least 1 in case it rounds down to 0, and sane divisor is at least 8 rows per worker to save * update fixtures with dataset processes since that should never be NoneType * improve reusability for tests	2025-07-21 11:39:38 -04:00
Wing Lian	8e5f146701	Fix cloud docker image build and remove apt files for optim (#2961 ) * make sure to apt update to install sudo and tmux * remove apt archives too	2025-07-21 11:05:00 -04:00
Wing Lian	31a15a49b6	add additional packages via apt for better multi-node support (#2949 ) * cleanup in Dockerfile and add infiniband packages * fixes for ci * fix nightly too	2025-07-20 21:19:23 -04:00
NanoCode012	b986f7c7cb	fix: return proper attention for llama4 lora kernel and fsdp2 llama4 example fix (#2943 ) * fix: return proper attention for llama4 lora optim * fix: update fsdp2 llama4 config	2025-07-19 13:54:43 -04:00
salman	e5734e5cf0	adding torchtitan link (#2945 ) [skip ci]	2025-07-19 13:54:14 -04:00
Wing Lian	109d9c7442	make the initial call to tokenizer.pad not spam the console (#2946 ) [skip ci] * make the initial call to tokenizer.pad not spam the console * add guard from feedback * make another common console output less verbose * more logging fixes	2025-07-19 13:53:35 -04:00
Wing Lian	170322a1f0	make sure log level is upper (#2934 )	2025-07-17 15:32:55 -04:00
Wing Lian	5f5ae76213	add validation around cce + chunked_ce (#2932 ) [skip ci] * add validation around cce + chunked_ce * return on end of validation method	2025-07-17 15:32:38 -04:00
Wing Lian	a798975b7c	coderabbit manual settings (#2940 ) [skip ci]	2025-07-17 15:32:16 -04:00
Wing Lian	d23f972602	use state for wandb in callbacks (#2930 ) [skip ci]	2025-07-17 15:31:56 -04:00
Wing Lian	8e41317250	don't use include_tokens_per_second for GRPO (#2931 ) [skip ci] * don't use include_tokens_per_second for GRPO * use blocklist instead	2025-07-17 15:31:21 -04:00
Varun Gumma	9f2bb188a4	Improve Dataset Processing Multiprocessing, Sharding, and Qwen Tokenizer Bug Fix. (#2918 ) * Added a feature to save prepared dataset in specified shards, removed limiter on multiprocessing during tokenization, and a bug fix of qwen tokenizer * removed limiters and fixed config variable name * black lint * chore: lint * feat: update handling of dataset_processes --------- Co-authored-by: NanoCode012 <nano@axolotl.ai>	2025-07-17 09:47:58 -04:00
Wing Lian	9dde9e1b71	misc fixes 202507 (#2937 ) [skip ci] * misc fixes 202507 * manually handle attn class for llama4	2025-07-17 09:47:45 -04:00
Wing Lian	f2474ef941	bump accelerate to 1.9.0 (#2936 ) [skip ci]	2025-07-17 09:46:43 -04:00
Wing Lian	8a4bcacdb2	cu126-torch271 for cloud docker image should be tagged with main-latest (#2935 )	2025-07-17 00:01:23 -04:00
Wing Lian	d2c3d5a954	run nightly-vs-upstream-main on 2.7.1 and multi-gpu also (#2929 ) [skip ci]	2025-07-16 21:45:42 -04:00
Wing Lian	36cbe13d18	activation offloading with cuda streams doesn't work with LoRA (#2927 )	2025-07-16 11:59:20 -04:00
Wing Lian	2c408b5c5e	Apply generic fused liger ce, cce, and tiledmlp for arbitrary models (#2908 ) * Apply generic fused liger ce for unknown models * fix deepseek liger modeling * generic cce and config tiled mlp to use original mlp and auto detect compute params * fix weight and lint * update warnings * address PR feedback * use lookup for model class prefixes * revert inadvertent change to flash attn version * remove un-needed pylint annotations * fix import	2025-07-15 22:40:41 -04:00
Wing Lian	942005f526	use modal==1.0.2 for nightlies and for cli (#2925 ) [skip ci] * use modal==1.0.2 for nightlies and for cli * use latest cce fork for upstream changes * increase timeout	2025-07-15 20:31:23 -04:00
Dan Saunders	10ba1622f7	checkpoint model on first step callback (#2906 ) * checkpoint model on first step callback * remove debug * add test cases; update existing tests not to save on first step * move test out of solo * delete * default to False * typo	2025-07-15 15:00:48 -04:00
Wing Lian	d320ef6199	fix for upstream refactor of KwargsForCausalLM (#2911 )	2025-07-15 11:28:41 -04:00
NanoCode012	354eaaf0d3	feat: add call method to mistral tokenizer wrapper (#2898 )	2025-07-14 22:33:35 -04:00
greenhestu	a061446540	Fix: Prevents merging of tool arguments during preprocessing (#2909 )	2025-07-14 22:33:10 -04:00
Wing Lian	cd079b5536	Tensor parallel w DeepSpeed AutoTP (#2574 ) * support for deepspeed autotp * bump to latest deepspeed that supports deepcompile too * add deepcompile support too * fix total steps calculation for TP * setup fixture for tp * update ds config to ensure weights are gathered for checkpoint * fix duplicate validation names * chore: lint	2025-07-14 21:33:48 -04:00
Wing Lian	5cc16040a8	move the plugin post trainer create to the setup trainer (#2907 ) * move the plugin post trainer create to the setup trainer * move post-train plugins to execute-training fn	2025-07-14 20:11:33 -04:00
Wing Lian	38359a8997	allow profiling in mid-training rather than from the start (#2899 ) [skip ci] * allow profiling in mid-training rather than from the start * simplify based on PR feedback * fix logic, improve saving at end, add tests	2025-07-14 20:11:11 -04:00
Wing Lian	7dc3ac6cb3	update nightlies builds (#2921 ) [skip ci]	2025-07-14 20:10:43 -04:00
Wing Lian	99187cd208	Activation Offloading w CUDA Streams (#2900 ) [skip ci] * use cuda streams for activation offloading * use torch native ops * update cfg schema for streams * fix literal constructor for set * use context for training step so it doesn't affect evals * disable streams * auto gc on eval steps * use activation_offloading config arg * add docs for gradient checkpointing * handle validation for gc/ao * use cuda streams for act offloading * add more validation for AC w/o GC * fix docs * move activation_offloading lower in definition so it doesn't break args/kwargs * fix kd due to import order	2025-07-14 20:10:20 -04:00
Wing Lian	aa684122f1	upgrade peft==0.16.0 and datasets==4.0.0 (#2917 ) [skip ci] * upgrade peft to 0.16.0 * upgrade datasets to 4.0.0 * refactor dupes from merge/rebase * fix check for fsdp1 + sharded_state_dict * use full state dict for ci	2025-07-14 20:09:26 -04:00
Wing Lian	ca4d4ef793	don't init distributed for deepspeed if preprocessing (#2920 ) * don't init distributed for deepspeed if preprocessing * add e2e test to validate preprocess cli with deepspeed * ignore duplicate code for cfg	2025-07-14 14:19:19 -04:00
Dan Saunders	37edbe4999	Remove extra torch.compile call (#2904 ) * debug * debug * debug * moving validation code to transformers * revert unneeded change * add accelerator config to base trainer builder * add back accumulated_cache_size_limit setting * lint	2025-07-14 12:32:45 -04:00
Wing Lian	e581c15d40	refactor dupes from merge/rebase (#2919 ) [skip ci]	2025-07-14 10:05:26 -04:00
Wing Lian	af92151a7b	FSDP2 fix validation and add tests (#2910 ) * fix validation and add tests * remove debugging and add more tests * remove migrate_fsdp	2025-07-14 09:25:44 -04:00
Wing Lian	80dc4c261a	fix xformers version for python 2.6 (#2916 ) [skip ci]	2025-07-14 09:24:29 -04:00
Wing Lian	7ccbbd8e77	upgrade liger to 0.6.0 (#2893 ) [skip ci]	2025-07-14 09:24:07 -04:00
Wing Lian	5081db7f8a	upgrade trl==0.19.1 (#2892 ) [skip ci] * upgrade trl==0.19.1 * add vllm for tests for grpo * fixes to work with latest trl * need data_parallel_size config too * support for vllm_mode for server / colocate * vllm settings for colocate * relax vllm version * bump min hf hub for latest vllm support * add hints on string literal for vllm mode * use latest transformers 4.53.2 * tweak acceptable loss on flaky test_ds_zero3_packed test * don't run flaky vllm/grpo tests for now	2025-07-14 09:23:42 -04:00
Wing Lian	41664c7c4c	fix ddp for incorrect steps (#2915 ) * fix ddp for incorrect steps * add test	2025-07-14 07:51:16 -04:00
Wing Lian	9a8073e73d	Liquid Foundation Model 2 support (#2905 ) * LFM2 support * docs * packing seems to work * update install to force install in case already on dev version * default to use chunked cross entropy	2025-07-12 11:41:34 -04:00
Jiawei Liu	7fb8441e0e	fix: customized dataset with simpo (#2894 ) [skip ci]	2025-07-12 11:40:30 -04:00
NanoCode012	4dc5910e1c	feat(doc): re-add docker 2.7.0 tag back (#2902 ) [skip ci]	2025-07-12 11:40:01 -04:00

2278 Commits