axolotl

Author	SHA1	Message	Date
Wing Lian	f9c7c3bb72	don't use is_main_process during config validation (#2569 )	2025-04-26 14:14:52 -04:00
Wing Lian	5dba5c82a8	fix support for wandb run_name for rl trainers (#2566 ) [skip ci] * fix support for wandb run_name for rl trainers * prefer to use wandb random names for run_name	2025-04-25 21:10:54 -04:00
Chiwan Park	e3c9d541a7	fix: crash when pretraining_dataset with dispatch_batches is false (#2558 )	2025-04-25 17:15:03 -04:00
Wing Lian	53dbf97d85	make cce default to true when using the plugin (#2562 ) [skip ci]	2025-04-25 17:14:26 -04:00
Eko Julianto Salim	2c2563bc34	fix: gradient checkpointing functools.partial object has no attribute __self__ (#2563 ) [skip ci] * fix: gradient checkpointing causing functools.partial error * lint * chore: lint --------- Co-authored-by: Wing Lian <wing@axolotl.ai>	2025-04-25 17:02:37 -04:00
Dan Saunders	ae1c7ace63	Sequence parallel training context manager (#2553 ) * ctx manager for SP * updates * update * further simplifying * accommodate both training context managers * simplifying * simplifying * nit * reorg * tweak codecov yaml * add gather post hook, simplify, fixes * pytest * pytest fix	2025-04-25 10:33:54 -04:00
Wing Lian	a4d5112ae1	builds for torch 2.7.0 (#2552 ) * builds for torch==2.7.0 * use xformers==0.0.29.post3 * no vllm support with torch 2.7 * update default, fix conditional * no xformers for 270 * no vllm on 2.7.0 for multigpu test too * remove deprecated verbose arg from scheduler * 2.7.0 tests on cpu	2025-04-24 00:39:31 -04:00
NanoCode012	a6d28d19b1	feat: add glm and glm4 multipack and cce (#2546 ) * feat: add glm and glm4 multipack * feat: add glm4 example * feat: add cce for glm	2025-04-23 10:27:51 -04:00
Wing Lian	32e335dd51	fix missing host/port for vllm (#2543 ) * fix missing host/port for vllm * set tensor parallel size so it doesn't always default to cli override	2025-04-22 10:16:48 -04:00
Wing Lian	341e95aac9	prevent rate limiting to hf when using dispatch batches (#2536 ) [skip ci]	2025-04-21 10:31:35 -04:00
Catgat	b882dfb63f	Fixed Rex Scheduler Warm Up (#2535 ) [skip ci] * Fixed Rex Scheduler Warm Up * chore: lint --------- Co-authored-by: Wing Lian <wing@axolotl.ai>	2025-04-21 10:30:55 -04:00
Chiwan Park	4ce469d32e	fix: upgrade liger to 0.5.8 and use native Gemma3 patches (#2527 ) * fix: upgrade liger to 0.5.8 and use native Gemma3 patches * fix: make lint happy * doc: update Liger Kernel FLCE support for Gemma 3	2025-04-18 09:57:40 -07:00
Wing Lian	60a8f0958d	zero val fix for beta (#2538 )	2025-04-17 17:27:19 -07:00
NanoCode012	9da730d6a4	fix(doc): cut cross entropy installation instructions broken in qmd (#2532 )	2025-04-16 15:02:51 -07:00
NanoCode012	32637fad00	fix: preprocess yielding whole dataset to each worker (#2503 ) [skip ci]	2025-04-16 15:02:35 -07:00
Dan Saunders	b8c633aa97	batch api HF adapter for ring-flash-attn; cleanup and improvements (#2520 ) * batch api HF adapter for ring-flash-attn; cleanup and improvements * update * adding all batch ring-flash-attn methods via single adapter * removing pad_to_sequence_len=False for now * fix * updating docs to include batch SP * review comments * fixes for batch API funcs, simplify * fixes * fix * updates * add batch_zigzag smoke test	2025-04-16 13:50:48 -04:00
NanoCode012	682a9cf79b	Fix: add delinearization and make qlora work with fsdp2 (#2515 ) * fixes for delinearization, and make qlora work with fsdp2 * Add back mistakenly removed lm_eval * typo [skip ci] * patch evals for torch.compile + fsdp2 * also check torch_compile w fsdp2 * lots of fixes for flex attn with llama4 * fix patch check and patch llama4 too * attempt to make the patches stick * use transformers 4.51.2 * update configs and README for llama4 * remove torch.compile for CI test * cleanup any existing singletons * set singleton cache to None instead of deleting * use importlib reload with monkeypatch * don't worry about transformers version, mark inputs with grads, fix regex * make sure embeds aren't on cpu * logging and mem improvements * vllm version and add to docker, make sure to save processor on conversion * fix ambiguous tensor bool check * fix vllm to not use v1, upgrade hf transformers * fix tests * make flex_attn_compile_kwargs configurable, since this depends on model params --------- Co-authored-by: Wing Lian <wing@axolotl.ai> Co-authored-by: Salman Mohammadi <salman.mohammadi@outlook.com>	2025-04-15 23:31:39 -07:00
NanoCode012	271b24cccc	feat: update cce to latest (#2521 )	2025-04-15 22:17:10 -07:00
NanoCode012	e0420b3528	fix: allow merge lora on pre-quantized model (#2511 ) * fix: allow merge lora on pre-quantized model * fix: remove unused sections per comment	2025-04-09 14:01:42 -04:00
NanoCode012	f85861a0b2	fix: liger swiglu for llama4 (#2504 ) * fix: liger swiglu for llama4 * feat: add liger to deepseek v3 * fix: unpack not found * fix: spelling * fix: comment out deepseek v3 * fix: retest deepseek * fix: map glu * fix: patch model forward * chore: add temp code to save * fix: remove deepseek to move into separate PR	2025-04-09 02:53:17 -04:00
Wing Lian	0dac2ddeac	Llama4 linearized (#2502 ) * llama4 support for linearized experts * clean up fsdp2 sharding to prevent hang * add yaml config * cleanup example [skip ci]	2025-04-07 20:47:00 -04:00
NanoCode012	a6c03217f5	feat: add llama4 CCE (#2498 ) * feat: add llama4 CCE * fix: update model support list doc * feat: include llama4_text	2025-04-07 17:12:28 -04:00
Dan Saunders	59cd472504	SP cu_seqlens fix, refactor (#2495 ) * working on masking fix * refactor and fix multipack seqlens * pre-commit fix * adding smoke test * using existing packed seqlens util * log warning re: logged losses / gradient scaling per rank	2025-04-07 14:47:57 -04:00
NanoCode012	9b89591ead	Feat: Add doc on loading datasets and support for Azure/OCI (#2482 ) * fix: remove unused config * feat: add doc on dataset loading * feat: enable azure and oci remote file system * feat: add adlfs and ocifs to requirements * fix: add links between dataset formats and dataset loading * fix: remove unused condition * Revert "fix: remove unused condition" This reverts commit `5fe13be73e`.	2025-04-07 12:41:13 -04:00
NanoCode012	d25daebea9	fix: duplicate llama4 chattemplate enum (#2500 ) * fix: duplicate llama4 chattemplate enum * fix: duplicate chat_template string	2025-04-07 12:39:19 -04:00
NanoCode012	e0e5d9b1d6	feat: add llama4 multimodal (#2499 ) * feat: add llama4 multimodal * feat: add torchvision to base docker * just use latest torchvision --------- Co-authored-by: Wing Lian <wing@axolotl.ai>	2025-04-07 10:49:29 -04:00
Wing Lian	8bbad21bfd	llama4 support (#2493 ) * llama4 support * add xet support [skip ci] * be flexible on transformers version and skip test on version * don't use deepspeed for the fix_untrained_tokens test * reordering to trigger torch 2.6.0 tests first * slightly smaller train set * use 4.51.0 for now * remove stray print, add llama4 chat template to schema, bump peft to 0.15.1 * patches to make llama4 performant * add preliminary fp8 support	2025-04-07 10:49:15 -04:00
Wing Lian	5f4af3665d	FSDP2 support (#2469 ) * fsdp2 support * use accelerate release 1.6.0 * allow 8bit optims with fsdp2 * liger + torch compile fix * add fsdp2 e2e tests * use transformers commit with fsdp2 support * skip zero3 tests for this PR for now * fix fsdp2 config for ci * make sure both flex and flash attn work with fsdp2, skip fix untrained tokens * okay, actually use fdsp2... * more fixes to flex for fsdp2 * make sure to patch all the loaded models * additional validation for fsdp2, bump dep versions	2025-04-06 17:08:01 -04:00
Sung Ching Liu	a8f38c367c	Flex Attention + Packing with BlockMask support (#2363 )	2025-04-05 18:02:57 -04:00
Wing Lian	949471039f	fix tokenizer overrides w gemma3 (#2488 ) * fix tokenizer overrides w gemma3 * fix offline wrapping	2025-04-05 01:25:44 -04:00
NanoCode012	de451f99a5	fix: cohere cce scaling wrong tensor (#2483 )	2025-04-04 13:47:44 -04:00
Dan Saunders	e0cc4f1a87	removing deepspeed guard for LoRA Triton kernels (#2480 )	2025-04-03 14:50:56 -04:00
Wing Lian	5249e98058	add additional tf32 opt for cudnn (#2477 ) [skip ci]	2025-04-03 08:47:52 -04:00
Wing Lian	3877c5c69d	set release version 0.8.0 (#2476 ) * set release version 0.8.0 * make sure to include ring-flash-attn in docker image build	2025-04-02 09:50:56 -04:00
NanoCode012	e6cfb093d2	fix: disable SP during merge (#2470 ) [skip ci]	2025-04-02 09:35:00 -04:00
NanoCode012	7abc71dc0b	fix: gemma3 loss in forward pass (#2473 ) [skip ci] * fix: gemma3 loss in forward pass * fix: lint * fix: move patch before plugins * Update src/axolotl/monkeypatch/gemma3.py Co-authored-by: salman <salman.mohammadi@outlook.com> --------- Co-authored-by: Wing Lian <wing.lian@gmail.com> Co-authored-by: salman <salman.mohammadi@outlook.com>	2025-04-02 09:34:41 -04:00
NanoCode012	45bf634d17	feat: add support for multimodal in lora kernels (#2472 ) [skip ci] * feat: add support for multimodal in lora kernels * fix: improve multimodal checks * fix: add fallback for model config * chor: add gemma3 to docs	2025-04-02 09:33:46 -04:00
NanoCode012	80ba4b69f1	fix: pydantic warning validator not returning self (#2474 )	2025-04-02 07:40:49 -04:00
NanoCode012	9e22c4ca6a	fix: set rl=None during inference (#2463 )	2025-04-01 12:25:53 -04:00
Dan Saunders	7d0eb66b54	fixing eval for SP (#2468 )	2025-04-01 11:59:08 -04:00
Wing Lian	df119e3724	Validation for Muon optimizer with DS/FSDP (#2464 )	2025-04-01 09:39:12 -04:00
NanoCode012	f4ae8816bb	Fix: remove the numerous sequential log (#2461 ) * fix: remove sequential logs * feat(doc): add for sample pack sequentially and curriculum sampling	2025-04-01 09:20:00 -04:00
Wing Lian	e0aba74dd0	Release update 20250331 (#2460 ) [skip ci] * make torch 2.6.0 the default image * fix tests against upstream main * fix attribute access * use fixture dataset * fix dataset load * correct the fixtures + tests * more fixtures * add accidentally removed shakespeare fixture * fix conversion from unittest to pytest class * nightly main ci caches * build 12.6.3 cuda base image * override for fix from huggingface/transformers#37162 * address PR feedback	2025-04-01 08:47:50 -04:00
Wing Lian	328d598114	gemma3 packing fixes (#2449 ) * make gemma3 work with packing * multi-gpu e2e for ci * update gemma3 model namespace to use mirror * add gradient checkpointing to multigpu e2e ci * update gemma3 examples for use_reentrant and fix ddp find unused params * fix tests for gemma3 * fix import for test utils * set correct train loss for gemma3 e2e	2025-03-31 17:15:23 -04:00
DreamGenX	4d36ecc724	Sequential sample packing (#2404 ) [skip ci] * add sequential sample packing * chore: lint --------- Co-authored-by: Wing Lian <wing@axolotl.ai>	2025-03-31 15:48:20 -04:00
NanoCode012	7acf93b59f	Fix(doc): Clarify doc on attention configs and missing pad_token (#2455 ) [skip ci] * fix: clarify input type * fix: handling of error message if data_files not available * fix: clarify attention handling * fix: add doc on missing pad token	2025-03-31 15:47:28 -04:00
Wing Lian	b6fc46ada8	Updates for trl 0.16.0 - mostly for GRPO (#2437 ) [skip ci] * add grpo scale_rewards config for trl#3135 * options to connect to vllm server directly w grpo trl#3094 * temperature support trl#3029 * sampling/generation kwargs for grpo trl#2989 * make vllm_enable_prefix_caching a config param trl#2900 * grpo multi-step optimizeations trl#2899 * remove overrides for grpo trainer * bump trl to 0.16.0 * add cli to start vllm-serve via trl * call the python module directly * update to use vllm with 2.6.0 too now and call trl vllm serve from module * vllm 0.8.1 * use python3 * use sys.executable * remove context and wait for start * fixes to make it actually work * fixes so the grpo tests pass with new vllm paradigm * explicit host/port and check in start vllm * make sure that vllm doesn't hang by setting quiet so outouts go to dev null * also bump bnb to latest release * add option for wait from cli and nccl debugging for ci * grpo + vllm test on separate devices for now * make sure grpo + vllm tests runs single worker since pynccl comms would conflict * fix cli * remove wait and add caching for argilla dataset * refactoring configs * chore: lint * add vllm config * fixup vllm grpo args * fix one more incorrect schema/config path * fix another vlllm reference and increase timeout * make the tests run a bit faster * change mbsz back so it is correct for grpo * another change mbsz back so it is correct for grpo * fixing cli args * nits * adding docs * docs * include tensor parallel size for vllm in pydantic schema * moving start_vllm, more docs * limit output len for grpo vllm * vllm enable_prefix_caching isn't a bool cli arg * fix env ordering in tests and also use pid check when looking for vllm --------- Co-authored-by: Salman Mohammadi <salman.mohammadi@outlook.com>	2025-03-31 15:47:11 -04:00
Dan Saunders	b35992262e	Ray train bugfix (#2458 ) * fix nccl pg destroy warning * update * ray bugfix	2025-03-31 15:17:43 -04:00
Dan Saunders	ef6eb77cc8	destroy process group on Ctrl+C / training or eval run (#2457 ) * fix nccl pg destroy warning * update	2025-03-31 12:36:47 -04:00
Dan Saunders	5410195e0b	Sequence parallelism quick follow-ups; remove ModelCallback (#2450 ) * guard return if ring attn alrady registered * add docs link, bits in multi-gpu docs, remove save model callback (subsumed by HF trainers) * configurable heads_k_stride from ring-flash-attn hf adapter	2025-03-31 09:13:42 -04:00

1057 Commits