axolotl

Author	SHA1	Message	Date
NanoCode012	39fbd3b2b5	fix: lora kernels for mistral3 (#3027 ) [skip ci]	2025-08-07 09:25:37 -04:00
salman	46dfacf255	ND Parallel Doc Nits (#3032 )	2025-08-07 10:34:26 +01:00
Wing Lian	4bce713b39	allow custom trainer_cls to be defined as a module reference in the YAML (#3024 ) [skip ci] * allow custom trainer_cls to be defined as a module reference in the YAML * address PR feedback and add test * add tests	2025-08-06 22:49:19 -04:00
Dan Saunders	d09290f2f4	Lora kernels bias support (#3025 ) * lora kernels bias support * revert rename * nit * lint, tests * satisfying the rabbit	2025-08-06 20:20:08 -04:00
Wing Lian	e442ff22aa	fix keyerror on load_in_8bit/load_in_4bit access in _set_quantization_config (#3023 ) * set load_in_8bit/load_in_4bit in _set_quantization_config to prevent keyerror * use dict.get instead	2025-08-06 14:28:52 -04:00
Wing Lian	ba3dba3e4f	add kernels for gpt oss models (#3020 ) * add kernels for gpt oss models * add support for gpt-oss * typo incorrect package * fix: layout for configs and added wandb/epochs * add gptoss example w offload and set moe leaf for z3 * add support for Mxfp4Config from yaml * update yaml to use official model * fix lora and don't allow triton to go above 3.3.1 * fix lr and tweak vram use * fix range for triton since pinned wasn't compatible with torch 2.6.0 * update cce with gpt oss patches --------- Co-authored-by: NanoCode012 <nano@axolotl.ai>	2025-08-06 09:47:55 -04:00
Wing Lian	97e86c6d47	drop old patches and code that are no longer needed (#3007 ) [skip ci]	2025-08-06 08:02:39 -04:00
VED	784f8c0e95	fix:kd_distillation key_error logprobs (#2990 ) * fix:kd_distillation key_error logprobs * style * fix: leave handling of pop logprobs to parent --------- Co-authored-by: NanoCode012 <nano@axolotl.ai>	2025-08-06 08:02:07 -04:00
NanoCode012	e3177c3210	feat: add complete optimizer docs (#3017 ) [skip ci] * feat: add complete optimizer docs * fix: deprecate old torchao adamw low bit	2025-08-06 08:01:51 -04:00
Wing Lian	70faea331f	add support for connecting via prime-intellect (#3021 )	2025-08-06 01:06:52 -04:00
Wing Lian	8021c718ce	use skip_move_to_device for all cases (#3015 ) * use skip_move_to_device for all cases * use experimental option for skip move	2025-08-06 00:13:12 -04:00
Wing Lian	42f5e6f9e9	upgrade transformers==4.55.0 (#3018 )	2025-08-05 16:29:12 -04:00
Wing Lian	ab49d16e34	Dion optimizer support (#3014 ) * Add support for Dion optimizer * dion training kwargs * fix var names * no dion 8bit for now * use updated axolotl-contribs-mit for dion optimizer * add smoke test for dion optimizer * add docs * fix typo during edits * fix test to not remove load in 8bit	2025-08-04 16:33:30 -04:00
Carsten Kragelund Jørgensen	33d094721c	fix: deepcopy lr in RexLR scheduler. (#3012 ) * fix: deepcopy lr in RexLR scheduler. This fixes a problem where when the lr is a scalar tensor, the base_lrs in the get_lr function end up being references to the current learning rate, rather than the correct initial learning rate. See also related pytorch PR https://github.com/pytorch/pytorch/pull/127190/ * fix: add missing torch.Tensor import	2025-08-04 10:23:49 -04:00
NanoCode012	a54c1be972	Fix: shorten mem logs to 2 decimal places and renamed nd docs (#3011 ) [skip ci] * fix: shorten memory logs * fix: title name	2025-08-04 10:23:36 -04:00
github-actions[bot]	5691992d34	chore: update pre-commit hooks (#3009 ) [skip ci] Co-authored-by: djsaunde <1245942+djsaunde@users.noreply.github.com>	2025-08-04 10:23:19 -04:00
Dan Saunders	e758343cac	FSDP2 + LoRA kernels (#2992 ) * impl fix * smoke tests * patches for fsdp2 + qlora compat * nit * working fix * working fix * fix merge * minifying patches; update bnb dep * renaming; adding tests * remove duplicate test, add dora guard * generalize __torch_function__ * revert generalization * update comments	2025-08-03 20:05:17 -04:00
Wing Lian	deac7b18a1	upgrade peft v0.17.0 and support for lora target_parameters (#3006 )	2025-08-02 20:24:04 -04:00
Wing Lian	10946afae7	fixes for spinning up vllm service for grpo (#3001 )	2025-08-02 11:19:24 -04:00
Wing Lian	5639552064	prevent usage of low bit ao optimizers with configurations that use parameter groups (#3003 ) * prevent usage of low bit ao optimizers with configurations that use parameter groups * use optimizer enum value * fix validation	2025-08-01 17:54:04 -04:00
Wing Lian	cda3c82351	move ib/rdma libs into base image (#3002 ) * move ib/rdma libs into base image * use --no-install-recommends	2025-08-01 16:10:37 -04:00
Wing Lian	7c3b428f23	Add validation for TP with models with tied embeddings (#2999 ) * add validation for tp + tied embeddings models * fix logic and messaging * add additional guard for null tp size	2025-08-01 13:58:16 -04:00
Wing Lian	01a6bd1a0e	use CCE fix for TP using vocab parallel for CEL (#3000 )	2025-08-01 13:21:58 -04:00
NanoCode012	41709822a7	fix: move memory usage log to trainer.log (#2996 ) [skip ci]	2025-08-01 13:21:43 -04:00
Wing Lian	02a37199ee	prevent empty value for vllm_mode (#2998 )	2025-08-01 09:59:45 -04:00
NanoCode012	7026cd5e9e	Feat: Add N-D parallelism docs (#2989 ) * fix: remove non-existent file * feat: add n-d parallel docs * fix: comments --------- Co-authored-by: salman <salman.mohammadi@outlook.com>	2025-08-01 13:18:31 +07:00
NanoCode012	eb0a8a7775	feat: upgrade cce commit to include smollm3, granite, granitemoe (#2993 )	2025-07-31 18:18:44 -04:00
salman	294c7fe7a6	Distributed/ND-Parallel (#2977 )	2025-07-31 15:25:02 -04:00
Wing Lian	7b68dfafd7	jagged lr restart scheduler (#1680 ) [skip ci] * jagged lr restart scheduler var name fix make sure to create scheduler first * wire things together * more fixes * fix for nesting scheduler and first anneal phase * no need for relora trainer anymore since we've generalized the relora scheduler * remove redundant relora scheduler and lint * update relora e2e test for updated params * need restart steps for relora test * update quarto docs for dropped relora trainer * update example yaml * drop verbose arg * min lr scale support for jagged lr * don't let min_lr be nonetype * cleanup args	2025-07-31 13:50:03 -04:00
salman	32a7890231	Revert test update to index.qmd (#2995 ) [skip ci]	2025-07-31 11:46:31 -04:00
Wing Lian	563f5eed7a	update dependencies - liger + trl (#2987 ) * update dependencies * set dataset processes for tests * add support for GSPO	2025-07-31 11:17:17 -04:00
Wing Lian	6ec282094d	actually call the register method on plugins (#2991 ) [skip ci]	2025-07-31 11:13:15 -04:00
salman	09dda462ab	Fix don't preview docs for contributors (#2994 ) [skip ci] * checking against fork vs. main repo * force doc preview	2025-07-31 11:12:41 -04:00
Dan Saunders	bb1cae1a20	CLI: add --launcher option, support launcher args, cleanup, refactor (#2924 ) * add --launcher option; explicit True/False bool args; small cleanup * refactor * add torchrun, accelerate cli args * add rdzv arg default + tests * update _quarto * coderabbit * fix * we can't set rdzv_id independently across nodes * coderabbit * fix tests	2025-07-30 15:46:56 -04:00
Wing Lian	22810c97b7	use warmup_ratio as a better default than warmup steps since it's data dependent (#2897 ) [skip ci] * use warmup_ratio as a better default than warmup steps since it's data dependent * replace remainder of warmup_steps	2025-07-30 06:44:06 -04:00
Vincenzo di Cicco	2eb7ff95af	Use '<|finetune_right_pad|>' as padding token for LLama4 (#2988 ) [skip ci]	2025-07-30 06:38:13 -04:00
NanoCode012	90e5598930	Feat: Add voxtral, magistral small 1.1, and misc gemma3n fixes (#2979 ) * fix: lock version in gemma3n docs * feat: add sample configs and docs * chore: move mistraltokenizer into mistral folder * feat: update instructions * feat: add dynamic load voxtral * fix: remove incorrect vision config, add audio * fix: support voxtral processing strategy and address none in data * feat: patch mistraltokenizer subclass upstream and add missing * feat: update cce commit to include voxtral * fix: remove old comment * fix: gemma3 patch not needed anymore * fix: voxtral modeling code * fix: remove incorrect ds path * fix: adjust apply chat template parsing * feat: enable voxtral patch * fix: patch * feat: update example datasets * fix: target layer * feat: update gemma3n docs * feat: update voxtral docs * feat: revert assistant parsing to rely on new upstream changes * chore: skip test till next PR fix * fix: override upstream decode due to missing handling * feat: update readme * fix: update * feat: add magistral small think support * feat: update mistral-common dep * fix: lint * fix: remove optional dep * chore: typing * chore: simply import * feat(doc): update differences for 2507 * fix: coderabbit comments * feat: update clarify docs on new transformers	2025-07-30 15:57:05 +07:00
Wing Lian	1d2aa1e467	upgrade to support latest transformers release (#2984 ) * upgrade to support latest transformers release * bump mistral common too * Fix dependencies	2025-07-27 17:05:12 -04:00
NICOLAS BZRD	430be216d8	add shuffle_before_merging_datasets option to allow independent shuffling of datasets before merging (#2981 ) [skip ci]	2025-07-27 17:04:56 -04:00
Wing Lian	28804b82e4	don't create a reference model if grpo beta is 0.0 (#2983 ) [skip ci]	2025-07-27 17:04:42 -04:00
Wing Lian	add3e5076b	don't publish to netlify on contributor submissions since it requires auth tokens (#2985 ) [skip ci] * don't publish to netlify on contributor submissions since it requires auth tokens * fix no-tmux build and add contact to motd	2025-07-27 17:04:27 -04:00
NanoCode012	41434f0c28	feat(doc): add all providers to readme (#2972 ) [skip ci] * feat(doc): add vastai link * feat: add cloud providers to readme for more visibility * add prime intellect, remove Modal as sponsor --------- Co-authored-by: Wing Lian <wing@axolotl.ai>	2025-07-27 17:03:50 -04:00
Wing Lian	f7ea140838	TiledMLP support for FSDP2 (#2950 ) * make TiledMLP work with FSDP * cleanup/gc at start of train to prevent large VRAM spike * chore: lint * generic function for non-deepspeed training * unify patch to fix imports * update readme for ALST and add examples * make deepspeed attribute on params check more robust * update with new info from PR review	2025-07-25 07:15:03 -04:00
Wing Lian	460e0f9ed9	improve handling of file lock when content is empty (#2959 )	2025-07-24 16:10:38 -04:00
Wing Lian	e80faea0db	garbage collect on the end of the step if we're going to save a checkpoint (#2971 ) [skip ci]	2025-07-24 16:10:23 -04:00
Wing Lian	0ff2f172ef	Act offload lora fix (#2928 ) [skip ci] * fix activation offloading with lora * update w e2e test * add docs for error	2025-07-24 16:10:04 -04:00
salman	1407aac779	Skip CI for draft PRs (#2970 )	2025-07-24 09:11:46 +01:00
Dan Saunders	b34c3371ed	upgrade torchao (#2968 )	2025-07-23 10:27:28 -04:00
Wing Lian	5f1a4306b0	don't check dataset labels during preprocess for GRPO (#2952 ) [skip ci] * don't check dataset labels during preprocess for GRPO * use enum check per PR feedback	2025-07-22 20:40:44 -04:00
Wing Lian	93709eb5ce	handle refactor upstream for flash attention (#2966 )	2025-07-22 20:40:04 -04:00


2331 Commits