* ignore generation/endgeneration tags
Axolotl computes the mask for assistant turns on its own, so these tags are not needed; currently, however, the analyzer does not recognize them at all and throws an error (see the sketch after this block).
* feat: add phi4 tokenizer test and unblock gemma2
* fix: improve template
* chore: refactor
* chore: lint
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai>
Co-authored-by: Wing Lian <wing@axolotl.ai>
* kd fixes
* fix collator setup
* fix input args
* better handling to drop string fields for kd with raw dataset
* kd trainer has kd temp as part of the init
* drop top_k before softmax
* simplify and remove zscore
* WIP chunked KD loss with autograd wrapper
* more fixes and liger-type chunked loss
* collator cls for plugins
* remove debugging
* additional plugin collator kwargs, don't scale up kd loss by t^2
* don't need temp arg to distill method
* online kd wip
* add close to comment block
* support sampling params/max new tokens
* handle when no custom collator is used in plugins
* logsumexp trick for numerical stability (see the sketch after this block)
* fix check
* shift off the first empty token
* fix length of padding
* use max not min
* temp scale kd loss at end
* support for dynamic plugin training args mixins and symmetric kl
* chore: lint
* fix trainer callback base class
* Fix decay
* accept compressed responses for smaller wire payload
* post-rebase lint
* more KD updates
* increase hyperparams_count for gradients for added normalize_topk
* fix to remove attention_mask
* rename vars for consistency
* fix rebase issues
* default to dropping last batch in multipack batch sampler
* improve handling of train len
* init collator_cls_and_kwargs
* explicit drop_last=False when checking for multipack completeness
* use separate v2 loader for kd
* fix kd tests to use subprocess so it picks up kd training args
* default value for kd_beta arg
* use updated dataset for ci
* longer timeout for e2e
* fix: do not pre-patch self attention if lora dropout non-zero
* fix: add test to check patch not applied
* fix: test
* fix: test config check
* fix where we check so that tests don't break
* fix: test
---------
Co-authored-by: Wing Lian <wing@axolotl.ai>
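Several of the commits above adjust the numerics of the KD loss (dropping top_k before softmax, the logsumexp trick, temperature scaling applied at the end). A minimal sketch of the logsumexp trick as it might apply here, assuming temperature-scaled top-k logits (the function name and shapes are assumptions, not the actual Axolotl KD code):

```python
import torch

def stable_log_softmax(logits: torch.Tensor, temperature: float = 1.0) -> torch.Tensor:
    """Log-softmax computed via logsumexp so large logits don't overflow exp()."""
    scaled = logits / temperature
    # torch.logsumexp subtracts the per-row max internally, which is the trick:
    # log(sum(exp(x))) == m + log(sum(exp(x - m))) for m = max(x).
    return scaled - torch.logsumexp(scaled, dim=-1, keepdim=True)
```

A KL-style distillation term can then be built from the teacher and student log-distributions without materializing any unstable exp() values.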
* feat: add fsdp config for magistral
* fix: add mllama self attention handling for lora kernels
* fix: no eval if val_set_size 0 despite having test_datasets
* fix: add note for cce for vlm in newer model
* build base images for torch 2.7.1
* fix: update base docker to use torch 2.7.1
* fix: update doc for main base to use 2.7.1
* make sure to install fa2 in base uv too
* use no build isolation for uv+flashattn
* install psutil also for fa2
* longer timeout for flash attn build
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai>
* Update batching.py: fix position_ids padding bug
If position_ids is padded with a long run of zeros, flash attention crashes on the packed batch.
* use an alternate calculation that pads position_ids with a range (see the sketch after this block)
---------
Co-authored-by: Wing Lian <wing@axolotl.ai>
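A minimal sketch of the range-based padding described above, assuming packed position_ids with the sequence dimension last (the helper name is illustrative, not the actual batching.py code):

```python
import torch

def pad_position_ids(position_ids: torch.Tensor, target_len: int) -> torch.Tensor:
    """Pad position_ids with a continuing range instead of zeros.

    A long run of zeros at the tail of packed position_ids can be misread as a
    string of new-sequence starts and crash flash attention; a monotonically
    increasing tail is treated as a single padded segment instead.
    """
    pad_len = target_len - position_ids.size(-1)
    if pad_len <= 0:
        return position_ids
    tail = position_ids[..., -1:] + torch.arange(
        1, pad_len + 1, device=position_ids.device
    )
    return torch.cat([position_ids, tail], dim=-1)
```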
* add uv tooling for e2e gpu tests
* fixes from PR feedback
* simplify check
* fix env var
* make sure to use uv for other install
* use raw_dockerfile_image
* Fix import
* fix args to experimental dockerfile image call
* use updated modal versions
* remove unused field for chat_template.default
"messages" field present in final dataset causes issues with DPO
training otherwise
* lint and fix tests for new return value
* remove unused field for chat_template.default
Leaving the "messages" field in the final dataset otherwise causes issues with DPO training.
lint and fix tests for new return value
fix for updated expected fields for dpo
fix test still expecting "messages" field
* chore: lint
---------
Co-authored-by: Wing Lian <wing@axolotl.ai>
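A minimal sketch of dropping the leftover column, assuming a Hugging Face datasets.Dataset that still carries the raw "messages" field after templating (the helper name is illustrative, not the actual Axolotl code):

```python
from datasets import Dataset

def drop_messages_column(dataset: Dataset) -> Dataset:
    """Remove the raw "messages" column so it does not leak into DPO training,
    which expects only the prompt/chosen/rejected style fields."""
    if "messages" in dataset.column_names:
        dataset = dataset.remove_columns(["messages"])
    return dataset
```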
* bump hf deps
* upgrade liger-kernel too
* install cce from fork for transformers fix
* fix reference to vocab size in gemma3 patch
* use padding_idx instead of pad_token_id
* remove fixed gemma3 patch
* use updated cce fork
* fix local mllama cce patches w docstring
* add test for multipack with trainer setup and fix trainer for the upstream trainer refactor
* bump modal version
* guard for iterable datasets
* mllama model arch layout changed in latest transformers
* fix batch sampler with drop_last
* fix: address upstream vlm changes for lora
* fix: update references to old lora target path
* fix: remove mllama fa2 patch due to upstream fix
* fix: lora kernel patch path for multimodal models
* fix: removed mllama from quarto
* run test for came optim on 2.6.0+
* fix fsdp2 patch and remove deprecated patch
* make sure to set sequence_parallel_degree for grpo
* Add SP test for GRPO
* add sp to grpo config for trainer
* use reward_funcs as kwarg to grpo trainer (see the sketch after this block)
* fix the comprehension for reward funcs
* reward funcs already passed in as args
* init sp_group right before training
* fix check for adding models to SP context
* make sure to pass args to super
* upgrade deepspeed
* use updated trl and add reasoning flags for vllm
* patch the worker
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai>
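One illustrative reading of the reward_funcs change above, assuming TRL's GRPOTrainer and its documented reward-function signature (the toy reward, model id, and dataset are placeholders, not Axolotl's trainer builder):

```python
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

def format_reward(completions, **kwargs):
    # Toy reward: prefer completions that end with terminal punctuation.
    return [1.0 if str(c).strip().endswith((".", "!", "?")) else 0.0 for c in completions]

train_dataset = load_dataset("trl-lib/tldr", split="train")  # dataset used in TRL's own GRPO example

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",   # placeholder model id
    reward_funcs=[format_reward],          # passed explicitly as a kwarg
    args=GRPOConfig(output_dir="outputs"),
    train_dataset=train_dataset,
)
```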