Commit Graph

1244 Commits

Author SHA1 Message Date
Wing Lian
28804b82e4 don't create a reference model if grpo beta is 0.0 (#2983) [skip ci] 2025-07-27 17:04:42 -04:00
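The change above rests on a property of the GRPO loss: the reference model only feeds the KL penalty, which is scaled by `beta`. A minimal sketch of that reasoning (function names are illustrative, not axolotl's API):

```python
# Hedged sketch: why beta == 0.0 lets GRPO skip the reference model.
# The KL penalty in the GRPO objective is beta * KL(pi_theta || pi_ref);
# when beta is 0.0 that term contributes nothing, so instantiating a
# second frozen copy of the model only wastes VRAM.

def needs_reference_model(beta: float) -> bool:
    """A reference model is only required when the KL term is active."""
    return beta != 0.0

def grpo_loss_terms(policy_term: float, kl_div: float, beta: float) -> float:
    # With beta == 0.0 the kl_div value is never consumed.
    return policy_term + beta * kl_div

assert not needs_reference_model(0.0)
assert needs_reference_model(0.04)
assert grpo_loss_terms(1.5, 10.0, 0.0) == 1.5
```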
Wing Lian
f7ea140838 TiledMLP support for FSDP2 (#2950)
* make TiledMLP work with FSDP

* cleanup/gc at start of train to prevent large VRAM spike

* chore: lint

* generic function for non-deepspeed training

* unify patch to fix imports

* update readme for ALST and add examples

* make deepspeed attribute on params check more robust

* update with new info from PR review
2025-07-25 07:15:03 -04:00
Wing Lian
460e0f9ed9 improve handling of file lock when content is empty (#2959) 2025-07-24 16:10:38 -04:00
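The empty-content case this commit guards against typically arises when a reader acquires the lock on a file another process created but has not yet written. A self-contained illustration of the pattern, not axolotl's implementation:

```python
# Hedged sketch: treat a zero-byte file as "not yet written" rather than
# as valid content, so the reader can retry instead of failing to parse.
import json
import os
import tempfile

def read_locked_json(path: str):
    """Return parsed JSON, or None if the file is absent or still empty."""
    if not os.path.exists(path) or os.path.getsize(path) == 0:
        return None  # writer hasn't produced content yet; caller may retry
    with open(path, encoding="utf-8") as f:
        return json.load(f)

with tempfile.TemporaryDirectory() as d:
    p = os.path.join(d, "state.json")
    open(p, "w").close()                # empty file: created but unwritten
    assert read_locked_json(p) is None  # not a JSONDecodeError
    with open(p, "w") as f:
        json.dump({"step": 1}, f)
    assert read_locked_json(p) == {"step": 1}
```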
Wing Lian
e80faea0db garbage collect on the end of the step if we're going to save a checkpoint (#2971) [skip ci] 2025-07-24 16:10:23 -04:00
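Checkpoint saves briefly materialize extra buffers, so collecting garbage only on steps that are about to save trims the memory peak without paying the gc cost every step. A minimal sketch of that gating logic (the hook shape is illustrative):

```python
# Hedged sketch: run gc at step end only when this step will checkpoint.
import gc

def on_step_end(step: int, save_steps: int) -> bool:
    """Collect garbage just before a checkpoint write; return whether we saved."""
    will_save = save_steps > 0 and step % save_steps == 0
    if will_save:
        gc.collect()
        # torch.cuda.empty_cache() would follow here on a CUDA build
    return will_save

assert on_step_end(100, 50)      # checkpoint step: gc ran
assert not on_step_end(101, 50)  # ordinary step: skipped
```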
Wing Lian
0ff2f172ef Act offload lora fix (#2928) [skip ci]
* fix activation offloading with lora

* update w e2e test

* add docs for error
2025-07-24 16:10:04 -04:00
Wing Lian
5f1a4306b0 don't check dataset labels during preprocess for GRPO (#2952) [skip ci]
* don't check dataset labels during preprocess for GRPO

* use enum check per PR feedback
2025-07-22 20:40:44 -04:00
Wing Lian
93709eb5ce handle refactor upstream for flash attention (#2966) 2025-07-22 20:40:04 -04:00
Dan Saunders
208fb7b8e7 basic torchao fp8 mixed precision training (#2926)
* debug

* debug

* debug

* revert unneeded change

* add accelerator config to base trainer builder

* add back accumulated_cache_size_limit setting

* lint

* accelerator constructor patch for single-GPU torch fp8

* lint

* re-using existing fp8 code

* lint

* remove accelerate patch now fixed in latest release

* fix

* docs

* add fp8 + fsdp2 example

* remove unused config

* update config

* smoke tests

* add validator

* add 2.7.0 guard for fsdp2

* fix

* add config descriptions

* add FSDP doc link

* nit

* set force_recompute_fp8_weight_in_bwd with enable_fsdp_float8_all_gather

* better cfg for smoke tests

* add test for accelerate patching

* update fp8 validator
2025-07-22 16:27:47 -04:00
Wing Lian
b86a1d47b0 we don't need to call check_dataset_labels when skip_prepare_dataset is set (#2962)
* we don't need to call check_dataset_labels when skip_prepare_dataset is set

* Fix actual bug and revert prior fix

* warn and early return instead of raising an error

* use error
2025-07-22 10:00:53 -04:00
NanoCode012
dfba881e99 Feat: add gemma3n support (#2852)
* feat: add gemma3n cce

* feat: add sample config

* feat: add gemma3n multimodal mode

* feat: add audio example

* feat: support audio and return pixel values in collator

* feat: support unmask only assistant region (gemma3n for now)

* feat(doc): add notes for audio loading

* feat: add audio support for gemma3n

* feat: update examples

* feat: add gemma3n to the docs

* fix: add link at top

* feat(doc): clarify additional requirements

* fix: mllama missing aspect ratio

* fix: mllama need attention fixes for fa2

* Partially Revert "fix: mllama need attention fixes for fa2"

This reverts commit a0bfdd1777.

* fix: disable FA2 for mllama in vision mode

* feat: update configs to use proper attention

* fix: support other vision features

* feat(doc): clarify requirements for gemma3n
2025-07-22 16:52:15 +07:00
NanoCode012
bc1076d8a2 fix: suppress warning if we enabled skip prepare (#2958) 2025-07-21 11:42:04 -04:00
Wing Lian
b7e8f66e5a upstream fixes in cce for dora and tensor parallel support (#2960) [skip ci] 2025-07-21 11:41:53 -04:00
Wing Lian
fefb0797ee better handling for reward function checks for GRPO (#2933) [skip ci]
* better handling for reward function checks for GRPO

* consolidate msg copy
2025-07-21 11:41:15 -04:00
Wing Lian
af8d257aa2 make pad_to_sequence_len default to the same value as sample_packing (#2941) [skip ci]
* make pad_to_sequence_len default to the same value as sample_packing

* remove duplicate validation

* fix test

* update description meta

Co-authored-by: NanoCode012 <nano@axolotl.ai>

---------

Co-authored-by: NanoCode012 <nano@axolotl.ai>
2025-07-21 11:40:56 -04:00
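The defaulting rule in the commit above can be sketched with a small stand-in for axolotl's pydantic config schema: an unset `pad_to_sequence_len` inherits whatever `sample_packing` is, while an explicit value wins.

```python
# Hedged sketch of the defaulting behavior; the Cfg class is a stand-in,
# not axolotl's actual schema.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Cfg:
    sample_packing: bool = False
    pad_to_sequence_len: Optional[bool] = None

    def __post_init__(self):
        if self.pad_to_sequence_len is None:
            # unset -> follow sample_packing so packed batches stay rectangular
            self.pad_to_sequence_len = self.sample_packing

assert Cfg(sample_packing=True).pad_to_sequence_len is True
assert Cfg(sample_packing=False).pad_to_sequence_len is False
# an explicit user setting is never overridden:
assert Cfg(sample_packing=True, pad_to_sequence_len=False).pad_to_sequence_len is False
```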
Wing Lian
db5f6f4693 limit num_proc when saving datasets to disk (#2948) [skip ci]
* limit num_proc when saving datasets to disk

* enforce at least 1 in case it rounds down to 0, with a sane divisor giving at least 8 rows per worker to save

* update fixtures with dataset processes since that should never be NoneType

* improve reusability for tests
2025-07-21 11:39:38 -04:00
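The bullets above pin down the clamp exactly: one worker per at least 8 rows, never fewer than 1, capped by the configured process count. A sketch of that arithmetic (the function name is illustrative):

```python
# Hedged sketch of the num_proc clamp for saving datasets to disk.
def save_num_proc(num_rows: int, configured_procs: int) -> int:
    by_rows = num_rows // 8          # at least ~8 rows of work per worker
    return max(1, min(configured_procs, by_rows))

assert save_num_proc(4, 16) == 1     # would round down to 0 -> clamped to 1
assert save_num_proc(100, 16) == 12  # row count is the binding limit
assert save_num_proc(10_000, 16) == 16
```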
Wing Lian
109d9c7442 make the initial call to tokenizer.pad not spam the console (#2946) [skip ci]
* make the initial call to tokenizer.pad not spam the console

* add guard from feedback

* make another common console output less verbose

* more logging fixes
2025-07-19 13:53:35 -04:00
Wing Lian
170322a1f0 make sure log level is upper (#2934) 2025-07-17 15:32:55 -04:00
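The stdlib `logging` module only recognizes uppercase level names, so a user-supplied `info` or `debug` must be normalized before lookup. A one-liner sketch of the fix:

```python
# Hedged sketch: uppercase the configured level before resolving it.
import logging

def normalize_level(level: str) -> int:
    return getattr(logging, level.upper())

assert normalize_level("debug") == logging.DEBUG
assert normalize_level("INFO") == logging.INFO
```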
Wing Lian
5f5ae76213 add validation around cce + chunked_ce (#2932) [skip ci]
* add validation around cce + chunked_ce

* return on end of validation method
2025-07-17 15:32:38 -04:00
Wing Lian
d23f972602 use state for wandb in callbacks (#2930) [skip ci] 2025-07-17 15:31:56 -04:00
Wing Lian
8e41317250 don't use include_tokens_per_second for GRPO (#2931) [skip ci]
* don't use include_tokens_per_second for GRPO

* use blocklist instead
2025-07-17 15:31:21 -04:00
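The "use blocklist instead" bullet suggests the check was inverted: rather than enumerating trainers that support tokens/sec, the option is dropped only for trainer types known not to report it. A sketch of that shape (set contents and names are illustrative):

```python
# Hedged sketch of a blocklist for include_tokens_per_second.
RL_TRAINERS_WITHOUT_TOKEN_THROUGHPUT = {"grpo"}

def resolve_include_tokens_per_second(rl_type, requested: bool) -> bool:
    if rl_type in RL_TRAINERS_WITHOUT_TOKEN_THROUGHPUT:
        return False  # GRPO steps don't map to a meaningful tokens/sec
    return requested

assert resolve_include_tokens_per_second("grpo", True) is False
assert resolve_include_tokens_per_second(None, True) is True
```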
Varun Gumma
9f2bb188a4 Improve Dataset Processing Multiprocessing, Sharding, and Qwen Tokenizer Bug Fix. (#2918)
* Added a feature to save the prepared dataset in a specified number of shards, removed the limiter on multiprocessing during tokenization, and fixed a bug in the qwen tokenizer

* removed limiters and fixed config variable name

* black lint

* chore: lint

* feat: update handling of dataset_processes

---------

Co-authored-by: NanoCode012 <nano@axolotl.ai>
2025-07-17 09:47:58 -04:00
Wing Lian
9dde9e1b71 misc fixes 202507 (#2937) [skip ci]
* misc fixes 202507

* manually handle attn class for llama4
2025-07-17 09:47:45 -04:00
Wing Lian
f2474ef941 bump accelerate to 1.9.0 (#2936) [skip ci] 2025-07-17 09:46:43 -04:00
Wing Lian
36cbe13d18 activation offloading with cuda streams doesn't work with LoRA (#2927) 2025-07-16 11:59:20 -04:00
Wing Lian
2c408b5c5e Apply generic fused liger ce, cce, and tiledmlp for arbitrary models (#2908)
* Apply generic fused liger ce for unknown models

* fix deepseek liger modeling

* generic cce and config tiled mlp to use original mlp and auto detect compute params

* fix weight and lint

* update warnings

* address PR feedback

* use lookup for model class prefixes

* revert inadvertent change to flash attn version

* remove un-needed pylint annotations

* fix import
2025-07-15 22:40:41 -04:00
Wing Lian
942005f526 use modal==1.0.2 for nightlies and for cli (#2925) [skip ci]
* use modal==1.0.2 for nightlies and for cli

* use latest cce fork for upstream changes

* increase timeout
2025-07-15 20:31:23 -04:00
Dan Saunders
10ba1622f7 checkpoint model on first step callback (#2906)
* checkpoint model on first step callback

* remove debug

* add test cases; update existing tests not to save on first step

* move test out of solo

* delete

* default to False

* typo
2025-07-15 15:00:48 -04:00
Wing Lian
d320ef6199 fix for upstream refactor of KwargsForCausalLM (#2911) 2025-07-15 11:28:41 -04:00
NanoCode012
354eaaf0d3 feat: add call method to mistral tokenizer wrapper (#2898) 2025-07-14 22:33:35 -04:00
greenhestu
a061446540 Fix: Prevents merging of tool arguments during preprocessing (#2909) 2025-07-14 22:33:10 -04:00
Wing Lian
cd079b5536 Tensor parallel w DeepSpeed AutoTP (#2574)
* support for deepspeed autotp

* bump to latest deepspeed that supports deepcompile too

* add deepcompile support too

* fix total steps calculation for TP

* setup fixture for tp

* update ds config to ensure weights are gathered for checkpoint

* fix duplicate validation names

* chore: lint
2025-07-14 21:33:48 -04:00
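The "fix total steps calculation for TP" bullet points at a common pitfall: tensor-parallel ranks all see the same batch, so only `world_size // tp_size` ranks contribute distinct data when deriving steps per epoch. Illustrative arithmetic, not axolotl's trainer code:

```python
# Hedged sketch: step count must divide by the data-parallel degree,
# which shrinks when some ranks are tensor-parallel replicas.
import math

def total_steps(num_samples, micro_bs, grad_accum, world_size, tp_size=1):
    dp_size = world_size // tp_size  # TP ranks replicate the same batch
    effective_bs = micro_bs * grad_accum * dp_size
    return math.ceil(num_samples / effective_bs)

# 8 GPUs with tp_size=2 behave like 4 data-parallel workers:
assert total_steps(1024, 4, 2, 8, tp_size=2) == 32
assert total_steps(1024, 4, 2, 8, tp_size=1) == 16
```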
Wing Lian
5cc16040a8 move the plugin post trainer create to the setup trainer (#2907)
* move the plugin post trainer create to the setup trainer

* move post-train plugins to execute-training fn
2025-07-14 20:11:33 -04:00
Wing Lian
38359a8997 allow profiling in mid-training rather than from the start (#2899) [skip ci]
* allow profiling in mid-training rather than from the start

* simplify based on PR feedback

* fix logic, improve saving at end, add tests
2025-07-14 20:11:11 -04:00
Wing Lian
99187cd208 Activation Offloading w CUDA Streams (#2900) [skip ci]
* use cuda streams for activation offloading

* use torch native ops

* update cfg schema for streams

* fix literal constructor for set

* use context for training step so it doesn't affect evals

* disable streams

* auto gc on eval steps

* use activation_offloading config arg

* add docs for gradient checkpointing

* handle validation for gc/ao

* use cuda streams for act offloading

* add more validation for AC w/o GC

* fix docs

* move activation_offloading lower in definition so it doesn't break args/kwargs

* fix kd due to import order
2025-07-14 20:10:20 -04:00
Wing Lian
aa684122f1 upgrade peft==0.16.0 and datasets==4.0.0 (#2917) [skip ci]
* upgrade peft to 0.16.0

* upgrade datasets to 4.0.0

* refactor dupes from merge/rebase

* fix check for fsdp1 + sharded_state_dict

* use full state dict for ci
2025-07-14 20:09:26 -04:00
Wing Lian
ca4d4ef793 don't init distributed for deepspeed if preprocessing (#2920)
* don't init distributed for deepspeed if preprocessing

* add e2e test to validate preprocess cli with deepspeed

* ignore duplicate code for cfg
2025-07-14 14:19:19 -04:00
Dan Saunders
37edbe4999 Remove extra torch.compile call (#2904)
* debug

* debug

* debug

* moving validation code to transformers

* revert unneeded change

* add accelerator config to base trainer builder

* add back accumulated_cache_size_limit setting

* lint
2025-07-14 12:32:45 -04:00
Wing Lian
af92151a7b FSDP2 fix validation and add tests (#2910)
* fix validation and add tests

* remove debugging and add more tests

* remove migrate_fsdp
2025-07-14 09:25:44 -04:00
Wing Lian
5081db7f8a upgrade trl==0.19.1 (#2892) [skip ci]
* upgrade trl==0.19.1

* add vllm for tests for grpo

* fixes to work with latest trl

* need data_parallel_size config too

* support for vllm_mode for server / colocate

* vllm settings for colocate

* relax vllm version

* bump min hf hub for latest vllm support

* add hints on string literal for vllm mode

* use latest transformers 4.53.2

* tweak acceptable loss on flaky test_ds_zero3_packed test

* don't run flaky vllm/grpo tests for now
2025-07-14 09:23:42 -04:00
Wing Lian
41664c7c4c fix ddp for incorrect steps (#2915)
* fix ddp for incorrect steps

* add test
2025-07-14 07:51:16 -04:00
Jiawei Liu
7fb8441e0e fix: customized dataset with simpo (#2894) [skip ci] 2025-07-12 11:40:30 -04:00
salman
d6e4a611e5 FSDP1 -> FSDP2 (#2760)
* FSDP2 args migration implementation

This commit implements the migration to FSDP2 arguments including:
- FSDP2 support with LoRA training
- DPO integration with FSDP2
- Model loading fixes and refactoring
- CPU offloading and PEFT handling
- Test updates and CI improvements
- Bug fixes for dtype errors and various edge cases
2025-07-12 15:18:01 +01:00
Ed Sealing
eb662557a7 Register Plugins in Ray Workers (#2901) [skip ci]
* Access plugins in ray cluster

* Add comment

* chore: lint

---------

Co-authored-by: Ed Sealing <ed.sealing@patapsco.ai>
Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-07-11 16:59:59 -04:00
Wing Lian
c370d0795c [doc] Fix docs for text field mapping for completion datasets (#2890)
* Fix docs for text field mapping for completion datasets

* update another reference
2025-07-09 14:52:44 -04:00
Wing Lian
76aeb16156 tiled_mlp supports single gpu (#2891)
* tiled_mlp supports single gpu

* use checkpoint offloading for arctic training

* patch torch checkpoint too

* support for single gpu zero3

* add linkback to where it was copied from
2025-07-09 12:48:22 -04:00
Wing Lian
7c5ea0010f bump dev version (#2889) [skip ci] 2025-07-09 09:43:42 -04:00
Wing Lian
c6d69d5c1b release v0.11.0 (#2875)
* release v0.11.0

* don't build vllm into release for now

* remove 2.5.1 references

* smollm3 multipack support

* fix ordering of e2e tests
2025-07-09 09:22:35 -04:00
NanoCode012
8c6a6ea6eb Feat: add devstral model support (#2880) [skip ci]
* fix: do not add training and training_detail block by default

* fixed: magistral docs

* fix: address pad adding new fields and use built-in from_openai

* feat: try enable multiprocessing

* fix: check for keys before deleting attn_mask

* feat: add mistral pad test

* feat: add tool calling test

* feat: add devstral tokenizer tests

* fix: comma format

* chore: remove unused support_preprocessing as tokenizer is picklable now

* chore: update magistral doc

* feat: add devstral readme and example

* chore: refactor error handling
2025-07-08 11:01:19 -04:00
NanoCode012
78bff4925e fix: set add_generation_prompt to False when apply chat template (#2859) [skip ci] 2025-07-08 11:00:44 -04:00
NanoCode012
b237c8a3f3 chore: update cce commit to include gemma3n fixes (#2881) [skip ci] 2025-07-08 10:59:35 -04:00