axolotl

Author	SHA1	Message	Date
NanoCode012	1d54518990	fix: remove unnecessary gc and empty cache	2026-02-26 19:18:29 +07:00
NanoCode012	e0eed7542d	fix: fsdp2 init_sharded_param load int8/uint4 dtensor as require_grad=true on init	2026-02-26 18:21:32 +07:00
NanoCode012	5f6fcd1f7e	feat: upgrade cce	2026-02-26 16:41:10 +07:00
NanoCode012	21b2dfef2d	feat: attempt to release bf16 experts from vram	2026-02-26 16:19:07 +07:00
NanoCode012	f68d9f839d	feat: attempt to quant experts in 8bit mode too	2026-02-26 16:11:54 +07:00
NanoCode012	88a48eff8a	fix: handle fsdp2 for paramwrapper dtensor	2026-02-26 15:53:59 +07:00
NanoCode012	c329b43fdd	fix(doc): clarify support	2026-02-26 13:52:00 +07:00
NanoCode012	c58eaaae51	chore: add extra empty cache	2026-02-25 18:17:11 +07:00
NanoCode012	731d5dd193	chore: remove leftover logs	2026-02-25 18:05:15 +07:00
NanoCode012	6ad4b4ecbe	fix: remove cuda alloc for moe and enable async load	2026-02-25 18:01:58 +07:00
NanoCode012	ca822cd24c	chore: adjust log	2026-02-25 17:49:57 +07:00
NanoCode012	4b2f568ee0	chore: add log	2026-02-25 17:41:14 +07:00
NanoCode012	1558436c69	fix: attempt disable async load	2026-02-25 17:30:32 +07:00
NanoCode012	d3d6cb6b67	fix: attempt on-load quantize experts instead of post-load	2026-02-25 17:20:40 +07:00
NanoCode012	593599a217	fix: clear cache per param quant	2026-02-25 16:38:42 +07:00
NanoCode012	ad4e1a5a91	fix: try match target param properly end with	2026-02-25 16:17:28 +07:00
NanoCode012	2fc60b9021	feat: add moe quant to test by ved	2026-02-25 15:41:41 +07:00
NanoCode012	91dae42737	fix: re-add patch from transformers PR #39866	2026-02-25 15:04:28 +07:00
NanoCode012	4657cb7177	fix: add dropout check when using lora target param	2026-02-25 14:51:42 +07:00
NanoCode012	eb13054672	fix: apply fix for only CP mode	2026-02-25 14:49:46 +07:00
NanoCode012	0d0122cabe	fix: saving clones state dict	2026-02-25 14:27:51 +07:00
Robert Ronan	2b6f4a6c9b	Fix: excess_length_strategy truncation method (#3401 ) * Add test cases to verify that the problem exists in the underlying * Update the handle_long_sequences function to correctly use Map instead of filter for the truncation strategy. Also remove the minimal length filtering from the truncate_long_samples function, and run it separately and before. * fix: refactor and add test truncate for non-input id fields * fix: refactor long seq handling fn * fix: refactor duplicate fn and simplify route * add additional tests and make them work on mac * handle logging exception on empty datasets --------- Co-authored-by: 2ndset bot <bot@2ndset.ai> Co-authored-by: NanoCode012 <nano@axolotl.ai> Co-authored-by: Wing Lian <wing@axolotl.ai>	2026-02-25 11:31:11 +07:00
madScientist10	8f54b4eb25	fix: pass revision parameter to tokenizer and processor loaders (#3388 ) [skip ci] * fix: pass revision parameter to tokenizer and processor loaders * fix: address revision=None passed to .from_pretrained * add tests and address review feedback for revision parameter - Reformat modify_tokenizer_files signature and from_pretrained call - Use kwargs pattern for modify_tokenizer_files call to avoid passing None revision - Add 6 unit tests for revision parameter in tokenizer/processor loaders --------- Co-authored-by: NanoCode012 <nano@axolotl.ai>	2026-02-25 11:11:20 +07:00
VED	a131e4d0e5	sample gen support sft (#3240 ) [skip ci] * add:parameters + callback * sft core + logging * indentation fix * logger fix * loger fix in sft * gen sample on eval * lint * deprecation	2026-02-25 11:10:57 +07:00
Wing Lian	1791d87b6f	build axolotl images with torch 2.10.0 (#3430 )	2026-02-24 22:35:25 -05:00
Wing Lian	b40803da51	build base images for torch 2.10.0 (#3429 )	2026-02-24 20:32:34 -05:00
Wing Lian	68f1b7004c	ScatterMoE LoRA support (#3410 ) * scattermoe lora support * fsdp, bf16, dim fixes * expert weights aren't needed in save for bwd since they are frozen * use sonicmoe optim options * update save model from upstream * fixes per code review feedback and add tests * revert removal of CP fix * misc fixes	2026-02-24 14:59:55 -05:00
NanoCode012	08441fed17	fix: set allowed values for `adapter` config (#3415 )	2026-02-23 11:39:53 -05:00
NanoCode012	86ca1e27c0	fix: update MistralProcessor to be v5 compat (#3423 ) * fix: update MistralProcessor to be v5 compat * feat: add test for mistral3 processor * chore: comment	2026-02-23 11:39:13 -05:00
Manas Vardhan	5ed455715e	feat: support dot-notation CLI args for nested config options (#3419 ) * feat: support dot-notation CLI args for nested config options Add support for overriding nested config fields (like TRL config) via CLI using dot-notation, e.g.: axolotl train grpo.yaml --trl.vllm-server-host=10.0.0.1 --trl.beta=0.1 Changes: - args.py: Detect BaseModel subclass fields and generate dot-notation CLI options (--parent.child) that map to double-underscore kwargs (parent__child). Also fix _strip_optional_type for Python 3.10+ union syntax (X | None). - config.py: Handle double-underscore kwargs in load_cfg by setting nested dict values on the config. - Add tests for nested option handling. Fixes #2702 * Address CodeRabbit review: fix string parent bug, add type hints and docstring Signed-off-by: Manas Vardhan <manasvardhan@gmail.com> * Add type coercion for CLI kwargs and fix pre-commit issues - Add _coerce_value() for YAML-style type inference on string CLI args - When existing config value has a type (int/float/bool), cast to match - When no existing value, infer type from string (true/false, ints, floats, null) - Apply coercion to both flat and nested (dot-notation) kwargs - Fix unused pytest import (pre-commit/ruff) - Update tests to pass string values (matching real CLI behavior) - Add dedicated TestCoerceValue test class Addresses maintainer feedback on type casting for nested kwargs. --------- Signed-off-by: Manas Vardhan <manasvardhan@gmail.com>	2026-02-23 10:10:06 -05:00
Lorenzo Baraldi	3f30572d4a	Fix typo in dataset_processes field (#3426 ) * Fix typo in dataset_processes field * fix: use updated config name --------- Co-authored-by: NanoCode012 <nano@axolotl.ai>	2026-02-23 14:18:37 +07:00
NanoCode012	43d60c7439	bump cut-cross-entropy to 58d6572 (#3424 )	2026-02-20 14:24:51 -05:00
Wing Lian	0ea252d392	update to trackio 0.16.1 (#3425 ) [skip ci]	2026-02-20 14:24:33 -05:00
Wing Lian	29722dec60	use bunnycdn for CI assets (#3422 ) [skip ci]	2026-02-20 00:09:25 -05:00
NanoCode012	7fbedbd300	fix(doc): add limitation for unfrozen_parameters (#3416 )	2026-02-19 18:32:26 -05:00
Wing Lian	145ffc9be1	upgrade transformers to 5.2.0 and torchao to 0.16.0 (#3407 ) * upgrade transformers to 5.1.0 and torchao to 0.16.0 * upgrade trl for parity * handle trl api changes * orpo doesn't have max_prompt_len to check anymore * cpoconfig doesn't take max_prompt_length and fix cpu offload * slow fsdp1 test * triton min 3.4.0 and liger to 0.7.0 * use transformers main for now for zero3 fix * handle group_by_length change * fix changes upstream * mark skip flaky test * use transformers latest release 5.2.0	2026-02-19 18:27:27 -05:00
NanoCode012	4f1b5ad29f	fix: clarify how to use lm_eval plugin (#3404 ) [skip ci]	2026-02-15 07:52:30 -05:00
NanoCode012	d6a2532dd7	feat(doc): clarify how to use scattermoe (#3408 ) [skip ci] * feat(doc): clarify how to use scattermoe * chore: fix wording	2026-02-15 07:51:28 -05:00
Wing Lian	5eb265513c	fix generic patch for cce (#3405 )	2026-02-12 08:58:04 -05:00
NanoCode012	06ac407b92	feat: improve telemetry log (#3398 ) * fix: redact trackio and data_files * fix: add new orgs to whitelist * feat: add run id to logs for users to easily share * fix: update to add more metrics * fix: add missed experiment tracker * chore: formatting in main	2026-02-10 23:01:34 +07:00
NanoCode012	4e22cf0651	fix: remove telemetry warning (#3397 ) [skip ci]	2026-02-10 23:01:16 +07:00
VED	a4ee56c315	fix: set rollout in GRPO training_kwargs (#3392 )	2026-02-10 18:06:15 +07:00
NanoCode012	c67cbcb0f5	fix: ignore add_special_tokens and use test mode for generation for mistral tokenizer (#3396 ) [skip ci] * fix: ignore add_special_tokens and use test mode for generation * fix: incorrectly setting kwarg	2026-02-10 18:03:26 +07:00
NanoCode012	a2da852576	fix: improve lora kernels failure message and handle trust_remote_code (#3378 ) [skip ci] * fix: improve lora kernels failure message and handle trust_remote_code * chore: re-order model guides	2026-02-10 17:58:40 +07:00
madScientist10	37e9da7a53	add hub_revision support for specifying branch when pushing checkpoints (#3387 ) [skip ci]	2026-02-10 17:53:09 +07:00
NanoCode012	ed7105dba7	fix: GRPO config not accept max_prompt_length (#3390 ) [skip ci]	2026-02-10 17:52:09 +07:00
NanoCode012	b6d3653f74	feat: add step3p5 for cce (#3384 ) [skip ci] * feat: add step3p5 for cce * chore: reorder model	2026-02-10 17:51:43 +07:00
NanoCode012	fcc4cfdb63	feat: add sageattention (#2823 ) [skip ci] * feat: add sageattention * feat: call path on pre model load * fix: patch to use register to correct var * fix: add strict check import at start * chore: fix comments * chore: refactor * feat: add capability check * fix: missed underscore * fix: let sageattention use FA backend in transformers * feat: update sage attention for attention mask and position ids * feat: allow sample packing but add warning without packing * fix: loss hitting 0 with packing and attention mask note * feat: downcast embeds if sage attention too * feat: add config validation * feat: add attention docs * chore: docs	2026-02-10 17:49:21 +07:00
VED	97a4f28511	fix: saving state dict and eval for Context Parallel (#3382 ) [skip ci] * clone state_dict if none * patch calculating eval loss for cp	2026-02-10 17:47:26 +07:00
VED	86a5803212	train_per_sec_per_gpu metric (#3364 ) [skip ci] * fix token count * guard for none n zero	2026-02-10 17:44:55 +07:00

2600 Commits