Commit Graph

2328 Commits

Author SHA1 Message Date
NanoCode012
8b60aa00de fix: use latest commit of PR in case rebased/pushed 2025-08-05 20:45:14 +07:00
NanoCode012
c10c738b2c fix: remove duplicate cce install 2025-08-05 20:42:46 +07:00
NanoCode012
c20b3cc0e8 feat: add vram usage 2025-08-05 17:23:41 +07:00
NanoCode012
df91241535 fix: chat template log appearing despite tokenizer already having template 2025-08-05 17:21:12 +07:00
NanoCode012
c575be59b2 feat: update readme instructions to include CCE installation 2025-08-05 17:18:20 +07:00
NanoCode012
737315b614 feat: add hunyuan docs and example config 2025-08-05 13:34:18 +07:00
NanoCode012
409c7768b4 feat: add multipack support for granite and hunyuan 2025-08-05 13:16:39 +07:00
NanoCode012
c3c2ede467 feat: update cce docs 2025-08-05 13:16:28 +07:00
NanoCode012
95d1725849 feat: add hunyuan cce support 2025-08-05 13:13:26 +07:00
Wing Lian
ab49d16e34 Dion optimizer support (#3014)
* Add support for Dion optimizer

* dion training kwargs

* fix var names

* no dion 8bit for now

* use updated axolotl-contribs-mit for dion optimizer

* add smoke test for dion optimizer

* add docs

* fix typo during edits

* fix test to not remove load in 8bit
2025-08-04 16:33:30 -04:00
Carsten Kragelund Jørgensen
33d094721c fix: deepcopy lr in RexLR scheduler. (#3012)
* fix: deepcopy lr in RexLR scheduler.

This fixes a problem where, when the lr is a scalar tensor, the base_lrs in the get_lr function end up as references to the current learning rate rather than the correct initial learning rate.

See also related pytorch PR https://github.com/pytorch/pytorch/pull/127190/

* fix: add missing torch.Tensor import
2025-08-04 10:23:49 -04:00
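A minimal repro of the aliasing issue this commit describes (illustrative only, not the scheduler's actual code): when the lr is a scalar tensor, keeping a reference instead of a deep copy means the recorded "initial" lr silently drifts with every in-place update.

```python
import copy

import torch

lr = torch.tensor(1e-3)          # scalar-tensor learning rate
param_group = {"lr": lr}

base_lr_ref = param_group["lr"]                  # aliases the same tensor
base_lr_copy = copy.deepcopy(param_group["lr"])  # independent snapshot

param_group["lr"].mul_(0.5)      # scheduler updates the lr in place

print(base_lr_ref)   # tensor(0.0005) -- drifted with the current lr
print(base_lr_copy)  # tensor(0.0010) -- the true initial lr
```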
NanoCode012
a54c1be972 Fix: shorten mem logs to 2 decimal places and rename ND docs (#3011) [skip ci]
* fix: shorten memory logs

* fix: title name
2025-08-04 10:23:36 -04:00
github-actions[bot]
5691992d34 chore: update pre-commit hooks (#3009) [skip ci]
Co-authored-by: djsaunde <1245942+djsaunde@users.noreply.github.com>
2025-08-04 10:23:19 -04:00
Dan Saunders
e758343cac FSDP2 + LoRA kernels (#2992)
* impl fix

* smoke tests

* patches for fsdp2 + qlora compat

* nit

* working fix

* working fix

* fix merge

* minifying patches; update bnb dep

* renaming; adding tests

* remove duplicate test, add dora guard

* generalize __torch_function__

* revert generalization

* update comments
2025-08-03 20:05:17 -04:00
Wing Lian
deac7b18a1 upgrade peft v0.17.0 and support for lora target_parameters (#3006) 2025-08-02 20:24:04 -04:00
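A hedged sketch of the target_parameters feature this upgrade pulls in, per the peft v0.17.0 release notes; the parameter path below is a placeholder, not a real model path.

```python
from peft import LoraConfig

# peft v0.17.0 lets LoRA target nn.Parameter entries directly, which is
# useful for MoE expert weights that are not nn.Linear modules.
config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_parameters=["feed_forward.experts.gate_up_proj"],  # illustrative path
)
```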
Wing Lian
10946afae7 fixes for spinning up vllm service for grpo (#3001) 2025-08-02 11:19:24 -04:00
Wing Lian
5639552064 prevent usage of low bit ao optimizers with configurations that use parameter groups (#3003)
* prevent usage of low bit ao optimizers with configurations that use parameter groups

* use optimizer enum value

* fix validation
2025-08-01 17:54:04 -04:00
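A minimal sketch of the kind of guard this PR describes; the config keys and optimizer names here are assumptions for illustration, not Axolotl's exact implementation.

```python
# Illustrative names; low-bit torchao optimizers can't handle
# per-parameter-group options, so reject such configs up front.
LOW_BIT_AO_OPTIMIZERS = {"adamw_torch_4bit", "adamw_torch_8bit"}

def validate_optimizer(cfg: dict) -> None:
    # Settings like these typically split params into groups with distinct lrs.
    uses_param_groups = bool(cfg.get("loraplus_lr_ratio") or cfg.get("embedding_lr"))
    if cfg.get("optimizer") in LOW_BIT_AO_OPTIMIZERS and uses_param_groups:
        raise ValueError(
            "low-bit torchao optimizers do not support parameter groups; "
            "drop the per-group settings or choose another optimizer"
        )
```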
Wing Lian
cda3c82351 move ib/rdma libs into base image (#3002)
* move ib/rdma libs into base image

* use --no-install-recommends
2025-08-01 16:10:37 -04:00
Wing Lian
7c3b428f23 Add validation for TP with models with tied embeddings (#2999)
* add validation for tp + tied embeddings models

* fix logic and messaging

* add additional guard for null tp size
2025-08-01 13:58:16 -04:00
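A rough sketch of the validation this PR describes, with assumed config keys and attribute names: tied input/output embeddings cannot be sharded independently under tensor parallelism, so such configs are rejected early.

```python
def validate_tensor_parallel(cfg: dict, model_config) -> None:
    tp_size = cfg.get("tensor_parallel_size")  # guard against a null tp size
    if tp_size and tp_size > 1 and getattr(model_config, "tie_word_embeddings", False):
        raise ValueError(
            "tensor parallelism is not supported for models with tied "
            "embeddings; set tensor_parallel_size to 1 or pick another model"
        )
```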
Wing Lian
01a6bd1a0e use CCE fix for TP using vocab parallel for CEL (#3000) 2025-08-01 13:21:58 -04:00
NanoCode012
41709822a7 fix: move memory usage log to trainer.log (#2996) [skip ci] 2025-08-01 13:21:43 -04:00
Wing Lian
02a37199ee prevent empty value for vllm_mode (#2998) 2025-08-01 09:59:45 -04:00
NanoCode012
7026cd5e9e Feat: Add N-D parallelism docs (#2989)
* fix: remove non-existent file

* feat: add n-d parallel docs

* fix: comments

---------

Co-authored-by: salman <salman.mohammadi@outlook.com>
2025-08-01 13:18:31 +07:00
NanoCode012
eb0a8a7775 feat: upgrade cce commit to include smollm3, granite, granitemoe (#2993) 2025-07-31 18:18:44 -04:00
salman
294c7fe7a6 Distributed/ND-Parallel (#2977) 2025-07-31 15:25:02 -04:00
Wing Lian
7b68dfafd7 jagged lr restart scheduler (#1680) [skip ci]
* jagged lr restart scheduler

var name fix
make sure to create scheduler first

* wire things together

* more fixes

* fix for nesting scheduler and first anneal phase

* no need for relora trainer anymore since we've generalized the relora scheduler

* remove redundant relora scheduler and lint

* update relora e2e test for updated params

* need restart steps for relora test

* update quarto docs for dropped relora trainer

* update example yaml

* drop verbose arg

* min lr scale support for jagged lr

* don't let min_lr be nonetype

* cleanup args
2025-07-31 13:50:03 -04:00
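A toy sketch of what a jagged restart schedule looks like (names and shape assumed, not the merged implementation): the lr decays within each restart interval and jumps back to the base value at every restart boundary.

```python
def jagged_lr(step: int, base_lr: float, restart_steps: int, min_lr: float) -> float:
    # linear decay from base_lr toward min_lr within each restart interval;
    # the modulo resets the decay at every restart, producing the jagged shape
    frac = 1.0 - (step % restart_steps) / restart_steps
    return min_lr + (base_lr - min_lr) * frac
```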
salman
32a7890231 Revert test update to index.qmd (#2995) [skip ci] 2025-07-31 11:46:31 -04:00
Wing Lian
563f5eed7a update dependencies - liger + trl (#2987)
* update dependencies

* set dataset processes for tests

* add support for GSPO
2025-07-31 11:17:17 -04:00
Wing Lian
6ec282094d actually call the register method on plugins (#2991) [skip ci] 2025-07-31 11:13:15 -04:00
salman
09dda462ab Fix: don't preview docs for contributors (#2994) [skip ci]
* checking against fork vs. main repo

* force doc preview
2025-07-31 11:12:41 -04:00
Dan Saunders
bb1cae1a20 CLI: add --launcher option, support launcher args, cleanup, refactor (#2924)
* add --launcher option; explicit True/False bool args; small cleanup

* refactor

* add torchrun, accelerate cli args

* add rdzv arg default + tests

* update _quarto

* coderabbit

* fix

* we can't set rdzv_id independently across nodes

* coderabbit

* fix tests
2025-07-30 15:46:56 -04:00
Wing Lian
22810c97b7 use warmup_ratio as a better default than warmup steps since it's data dependent (#2897) [skip ci]
* use warmup_ratio as a better default than warmup steps since it's data dependent

* replace remainder of warmup_steps
2025-07-30 06:44:06 -04:00
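A toy illustration of the reasoning above: a fixed warmup_steps value ignores run length, while the same warmup_ratio scales with it (the 0.03 default here is illustrative).

```python
def warmup_steps_from_ratio(total_steps: int, warmup_ratio: float = 0.03) -> int:
    # derive the warmup length from the run length instead of hardcoding it
    return max(1, int(total_steps * warmup_ratio))

print(warmup_steps_from_ratio(1_000))    # 30 steps for a short run
print(warmup_steps_from_ratio(100_000))  # 3000 steps for a long run
```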
Vincenzo di Cicco
2eb7ff95af Use '<|finetune_right_pad|>' as padding token for Llama 4 (#2988) [skip ci] 2025-07-30 06:38:13 -04:00
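At the tokenizer level this amounts to something like the sketch below; the model id is a placeholder for a gated checkpoint, and the token must already exist in the vocabulary.

```python
from transformers import AutoTokenizer

# Placeholder model id; requires gated access on the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-4-Scout-17B-16E")
tokenizer.pad_token = "<|finetune_right_pad|>"  # dedicated pad token, per the commit
print(tokenizer.pad_token_id)
```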
NanoCode012
90e5598930 Feat: Add voxtral, magistral small 1.1, and misc gemma3n fixes (#2979)
* fix: lock version in gemma3n docs

* feat: add sample configs and docs

* chore: move mistraltokenizer into mistral folder

* feat: update instructions

* feat: add dynamic load voxtral

* fix: remove incorrect vision config, add audio

* fix: support voxtral processing strategy and address None in data

* feat: patch mistraltokenizer subclass upstream and add missing

* feat: update cce commit to include voxtral

* fix: remove old comment

* fix: gemma3 patch not needed anymore

* fix: voxtral modeling code

* fix: remove incorrect ds path

* fix: adjust apply chat template parsing

* feat: enable voxtral patch

* fix: patch

* feat: update example datasets

* fix: target layer

* feat: update gemma3n docs

* feat: update voxtral docs

* feat: revert assistant parsing to rely on new upstream changes

* chore: skip test till next PR fix

* fix: override upstream decode due to missing handling

* feat: update readme

* fix: update

* feat: add magistral small think support

* feat: update mistral-common dep

* fix: lint

* fix: remove optional dep

* chore: typing

* chore: simplify import

* feat(doc): update differences for 2507

* fix: coderabbit comments

* feat: update and clarify docs on new transformers
2025-07-30 15:57:05 +07:00
Wing Lian
1d2aa1e467 upgrade to support latest transformers release (#2984)
* upgrade to support latest transformers release

* bump mistral common too

* Fix dependencies
2025-07-27 17:05:12 -04:00
NICOLAS BZRD
430be216d8 add shuffle_before_merging_datasets option to allow independent shuffling of datasets before merging (#2981) [skip ci] 2025-07-27 17:04:56 -04:00
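A toy sketch of the difference this option makes, using the Hugging Face datasets API with placeholder data: each source is shuffled independently and then concatenated, rather than shuffling the merged result.

```python
from datasets import Dataset, concatenate_datasets

ds_a = Dataset.from_dict({"text": [f"a{i}" for i in range(4)]})
ds_b = Dataset.from_dict({"text": [f"b{i}" for i in range(4)]})

# shuffle each dataset on its own, then merge; the source order is preserved
merged = concatenate_datasets([d.shuffle(seed=42) for d in (ds_a, ds_b)])
print(merged["text"])
```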
Wing Lian
28804b82e4 don't create a reference model if grpo beta is 0.0 (#2983) [skip ci] 2025-07-27 17:04:42 -04:00
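The reasoning in one line (a sketch, not TRL's actual code): in GRPO the reference model only feeds the KL penalty, which is scaled by beta, so at beta == 0.0 the penalty is identically zero and the reference model is pure overhead.

```python
def needs_reference_model(beta: float) -> bool:
    # the KL term is beta * KL(pi || pi_ref); at beta == 0.0 it contributes nothing
    return beta != 0.0
```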
Wing Lian
add3e5076b don't publish to netlify on contributor submissions since it requires auth tokens (#2985) [skip ci]
* don't publish to netlify on contributor submissions since it requires auth tokens

* fix no-tmux build and add contact to motd
2025-07-27 17:04:27 -04:00
NanoCode012
41434f0c28 feat(doc): add all providers to readme (#2972) [skip ci]
* feat(doc): add vastai link

* feat: add cloud providers to readme for more visibility

* add prime intellect, remove Modal as sponsor

---------

Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-07-27 17:03:50 -04:00
Wing Lian
f7ea140838 TiledMLP support for FSDP2 (#2950)
* make TiledMLP work with FSDP

* cleanup/gc at start of train to prevent large VRAM spike

* chore: lint

* generic function for non-deepspeed training

* unify patch to fix imports

* update readme for ALST and add examples

* make deepspeed attribute on params check more robust

* update with new info from PR review
2025-07-25 07:15:03 -04:00
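A toy sketch of the TiledMLP idea from ALST (not the merged implementation): run the MLP over sequence tiles under activation checkpointing so only one tile's intermediates are live at a time.

```python
import torch
from torch.utils.checkpoint import checkpoint

def tiled_mlp(mlp, x: torch.Tensor, num_tiles: int = 4) -> torch.Tensor:
    # split along the sequence dimension and recompute each tile's activations
    # in the backward pass, bounding peak activation memory to one tile
    tiles = x.chunk(num_tiles, dim=1)
    return torch.cat(
        [checkpoint(mlp, tile, use_reentrant=False) for tile in tiles], dim=1
    )
```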
Wing Lian
460e0f9ed9 improve handling of file lock when content is empty (#2959) 2025-07-24 16:10:38 -04:00
Wing Lian
e80faea0db garbage collect on the end of the step if we're going to save a checkpoint (#2971) [skip ci] 2025-07-24 16:10:23 -04:00
Wing Lian
0ff2f172ef Act offload lora fix (#2928) [skip ci]
* fix activation offloading with lora

* update w e2e test

* add docs for error
2025-07-24 16:10:04 -04:00
salman
1407aac779 Skip CI for draft PRs (#2970) 2025-07-24 09:11:46 +01:00
Dan Saunders
b34c3371ed upgrade torchao (#2968) 2025-07-23 10:27:28 -04:00
Wing Lian
5f1a4306b0 don't check dataset labels during preprocess for GRPO (#2952) [skip ci]
* don't check dataset labels during preprocess for GRPO

* use enum check per PR feedback
2025-07-22 20:40:44 -04:00
Wing Lian
93709eb5ce handle refactor upstream for flash attention (#2966) 2025-07-22 20:40:04 -04:00
Dan Saunders
208fb7b8e7 basic torchao fp8 mixed precision training (#2926)
* debug

* debug

* debug

* revert unneeded change

* add accelerator config to base trainer builder

* add back accumulated_cache_size_limit setting

* lint

* accelerator constructor patch for single-GPU torch fp8

* lint

* re-using existing fp8 code

* lint

* remove accelerate patch now fix in latest release

* fix

* docs

* add fp8 + fsdp2 example

* remove unused config

* update config

* smoke tests

* add validator

* add 2.7.0 guard for fsdp2

* fix

* add config descriptions

* add FSDP doc link

* nit

* set force_recompute_fp8_weight_in_bwd with enable_fsdp_float8_all_gather

* better cfg for smoke tests

* add test for accelerate patching

* update fp8 validator
2025-07-22 16:27:47 -04:00
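A hedged sketch of torchao float8 training at the library level, separate from Axolotl's integration (assumes a recent torchao release and Hopper-class hardware):

```python
import torch.nn as nn
from torchao.float8 import convert_to_float8_training

model = nn.Sequential(
    nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 1024)
).cuda()

# swaps eligible nn.Linear layers for float8 variants; training then
# proceeds with the usual optimizer/loss loop
convert_to_float8_training(model)
```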
Wing Lian
b86a1d47b0 we don't need to call check_dataset_labels when skip_prepare_dataset is set (#2962)
* we don't need to call check_dataset_labels when skip_prepare_dataset is set

* Fix actual bug and revert prior fix

* warn and early return instead of raising an error

* use error
2025-07-22 10:00:53 -04:00
NanoCode012
01d8175d48 fix: revert changing default optimizer to muon (#2965) [skip ci] 2025-07-22 10:00:30 -04:00