NanoCode012
20fda75917
feat(doc): add google analytics to docs (#2708)
2025-05-28 15:51:21 +07:00
NanoCode012
6b6370f4e3
feat(doc): add info on how to use dapo / dr grpo and misc doc fixes (#2673) [skip ci]
* feat(doc): add info on how to use dapo / dr grpo
* chore: add missing config to docs
* fix: missing comment
* fix: add missing scheduler from schema
* chore: refactor lr scheduler docs
* fix: remove log_sweep
2025-05-28 15:51:04 +07:00
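For context on what the DAPO / Dr. GRPO doc covers: the practical difference from vanilla GRPO is the advantage step. A minimal sketch, illustrative only and not axolotl's implementation (the function name and the 1e-4 epsilon are assumptions):

```python
import torch

def group_advantages(rewards: torch.Tensor, use_dr_grpo: bool) -> torch.Tensor:
    """rewards: (num_groups, group_size) scores for sampled completions."""
    centered = rewards - rewards.mean(dim=1, keepdim=True)
    if use_dr_grpo:
        # Dr. GRPO: skip per-group std normalization to avoid difficulty bias.
        return centered
    # Vanilla GRPO: normalize by the group's reward std.
    return centered / (rewards.std(dim=1, keepdim=True) + 1e-4)

print(group_advantages(torch.tensor([[0.1, 0.9, 0.5, 0.5]]), use_dr_grpo=True))
```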
mashdragon
add2025253
Fix Mistral chat template (mistral_v7_tekken) (#2710) [skip ci]
Per 4b8dd8aae7 (d2h-482763)
2025-05-28 15:50:47 +07:00
artem
a703560a10
add two checks to handle legacy format interleaved multimodal ds (#2721) [skip ci]
* add two checks to handle legacy format interleaved ds
* fix: add warning about multiple images using legacy format
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai>
2025-05-28 15:49:43 +07:00
NOHHYEOB, BAE
4a80d309e8
Add chat templates for command-a and aya-23-8B models (#2731) [skip ci]
* Add chat templates for command-a and aya model
* Fix: isolate for-loop update and remove unintended changes
2025-05-28 15:49:16 +07:00
NanoCode012
e33f225434
feat(doc): note lora kernel incompat with RLHF (#2706) [skip ci]
* feat(doc): note lora kernel incompat with RLHF
* fix: add validation following comments
* chore: fix typo following suggestion
2025-05-28 15:48:40 +07:00
NanoCode012
3e6948be97
Fix(doc): clarify data loading for local datasets and splitting samples (#2726) [skip ci]
* fix(doc): remove incorrect json dataset loading method
* fix(doc): clarify splitting only happens in completion mode
* fix: update local file loading on config doc
* fix: typo
2025-05-28 15:48:22 +07:00
github-actions[bot]
4a8af60d34
chore: update pre-commit hooks (#2729)
Co-authored-by: djsaunde <1245942+djsaunde@users.noreply.github.com>
2025-05-27 11:45:31 -04:00
Dan Saunders
a0941a9271
no need to generate diff file (#2728)
2025-05-27 11:44:06 -04:00
Dan Saunders
5eb01f3df1
Fix quarto (#2717)
* missing modules
* fix quarto complaints
2025-05-23 21:16:51 -04:00
xzuyn
d27c35ac44
Liger GraniteMoE (#2715)
2025-05-23 18:40:43 -04:00
Dan Saunders
a535b68043
update quarto for model loading refactor (#2716)
* update quarto for model loading refactor
* fix desc
2025-05-23 16:28:31 -04:00
Dan Saunders
b5f1e53a0f
models.py -> loaders/ module refactor (#2680)
* models.py -> loaders/ module refactor
* refactor ModelLoader class
* plugin manager changes
* circular import fix
* pytest
* pytest
* minor improvements
* fix
* minor changes
* fix test
* remove dead code
* coderabbit comments
* lint
* fix
* coderabbit suggestion I liked
* more coderabbit
* review comments, yak shaving
* lint
* updating in light of SP ctx manager changes
* review comment
* review comment 2
2025-05-23 15:51:11 -04:00
Dan Saunders
8cde256db2
Remove unused const (#2714)
* remove unused const
* accidentally committed benchmark plot
2025-05-23 12:27:38 -04:00
Dan Saunders
5f8f817200
SP context manager update (#2699)
* utilize accelerate prepare_data_loader with patching
* lint
* cleanup, fix
* update to support DPO quirk
* coderabbit comments, cleanup, remove dead code
* fix
* move ring attn patching to sp ctx manager
* lint
* lint
* test fix
* test fix
2025-05-22 11:18:32 -04:00
NanoCode012
aa0492c366
feat: do not find turn indices if turn is not trainable (#2696)
* feat: do not find turn indices if turn is not trainable
* fix: handle edge case where train on eos/eot is all
* fix: improve warning message
2025-05-22 19:19:59 +07:00
NanoCode012
798b5f5cfd
fix(RL): address plugin rl overwriting trainer_cls (#2697) [skip ci]
* fix: plugin rl overwrite trainer_cls
* feat(test): add test to catch trainer_cls is not None
2025-05-22 19:19:12 +07:00
NanoCode012
1c83a1a020
feat(doc): clarify minimum pytorch and cuda to use blackwell (#2704) [skip ci]
2025-05-22 19:18:27 +07:00
Dan Saunders
6aa41740df
SP dataloader patching + removing custom sampler / dataloader logic (#2686)
* utilize accelerate prepare_data_loader with patching
* lint
* cleanup, fix
* update to support DPO quirk
* small change
* coderabbit comments, cleanup, remove dead code
* quarto fix
* patch fix
* review comments
* moving monkeypatch up one level
* fix
2025-05-21 11:20:20 -04:00
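This and the SP context manager PR above both "utilize accelerate prepare_data_loader with patching". That accelerate utility can be exercised standalone; a minimal sketch, assuming accelerate's prepare_data_loader (the patching the PRs add on top is not shown):

```python
from torch.utils.data import DataLoader

from accelerate.data_loader import prepare_data_loader

loader = DataLoader(list(range(64)), batch_size=8)
# Shard one dataloader across processes instead of a custom sampler.
sharded = prepare_data_loader(
    loader,
    num_processes=2,  # e.g. the sequence-parallel world size
    process_index=0,  # this rank's shard
)
for batch in sharded:
    print(batch)  # rank 0 sees its share of the global batches
```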
Wing Lian
a27b909c5c
GRPO fixes (peft) (#2676)
* don't set peft_config on grpo to prevent double peft wrap
* remove overrides needed to support bug
* fix grpo tests
* require more CPU for multigpu to help with torch compile for vllm
2025-05-16 15:47:03 -04:00
xzuyn
6cb07b9d12
Fix for setting adam_beta3 and adam_epsilon2 for CAME Optimizer (#2654) [skip ci]
* make setting `adam_beta3` and `adam_epsilon2` work correctly
* update config docs so users know args are specific to CAME optim
---------
Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-05-16 15:46:50 -04:00
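As background for the adam_beta3 / adam_epsilon2 fix above: CAME takes three betas and two epsilons, which is what those extra config keys feed. A minimal sketch of the plausible mapping, assuming the came-pytorch package (the cfg dict and wiring are illustrative, not axolotl's code):

```python
import torch.nn as nn

from came_pytorch import CAME

cfg = {"adam_beta1": 0.9, "adam_beta2": 0.999, "adam_beta3": 0.9999,
       "adam_epsilon": 1e-30, "adam_epsilon2": 1e-16}
model = nn.Linear(16, 16)

optimizer = CAME(
    model.parameters(),
    lr=2e-4,
    # CAME's third beta drives its confidence-guided update...
    betas=(cfg["adam_beta1"], cfg["adam_beta2"], cfg["adam_beta3"]),
    # ...and its second epsilon stabilizes that confidence term.
    eps=(cfg["adam_epsilon"], cfg["adam_epsilon2"]),
)
```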
C080
288653adb6
Fix: Make MLflow config artifact logging respect hf_mlflow_log_artifa… (#2675) [skip ci]
* Fix: Make MLflow config artifact logging respect hf_mlflow_log_artifacts setting
* cleanup and lint
---------
Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-05-16 15:46:31 -04:00
NanoCode012
3a5b495a74
Fix: improve doc on merge/inference cli visibility (#2674)
* feat: improve visibility for merge doc
* feat: add tip on reuse config between modes
2025-05-16 13:07:40 -04:00
xzuyn
f661858fc4
Print dataset name (#2668) [skip ci]
2025-05-16 13:06:58 -04:00
Eric Meier
c837c4a424
Add missing init file to liger plugin (#2670) [skip ci]
2025-05-16 13:06:46 -04:00
michelyang
c9797de6bb
Add num_proc to fix slow dataset processing issue (#2681) [skip ci]
2025-05-16 13:06:20 -04:00
Wing Lian
8f8a7afb05
Add ci and images for CUDA 12.8 for B200s (#2683) [skip ci]
* Add ci and images for CUDA 12.8 for B200s
* add comments explaining CI [skip e2e]
2025-05-16 13:06:08 -04:00
NanoCode012
86472715da
fix: remove doc string imports in monkeypatches (#2671) [skip ci]
2025-05-16 13:05:55 -04:00
Wing Lian
c0a0c7534c
Activation checkpointing with offloading to disk with prefetch (#2663)
* offload activations to disk instead of CPU RAM
* add prefetch
* Disco :dance:
* include offload_disk in e2e test for AC
* document and make sure to cleanup
* fix annotation to match docs
* fix docs build
* address PR feedback
2025-05-13 16:39:39 -04:00
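The mechanism behind offloading activations to disk can be shown with PyTorch's saved-tensor hooks. A minimal sketch; the prefetch from the PR is omitted, and the class and file naming are illustrative:

```python
import os
import tempfile
import uuid

import torch

class DiskOffload(torch.autograd.graph.saved_tensors_hooks):
    """Spill activations saved for backward to disk, reload on demand."""

    def __init__(self, offload_dir: str):
        def pack(tensor: torch.Tensor) -> str:
            path = os.path.join(offload_dir, f"{uuid.uuid4().hex}.pt")
            torch.save(tensor, path)  # write the activation to disk
            return path               # autograd holds only the path

        def unpack(path: str) -> torch.Tensor:
            tensor = torch.load(path)  # read it back during backward
            os.remove(path)
            return tensor

        super().__init__(pack, unpack)

x = torch.randn(8, 8, requires_grad=True)
with DiskOffload(tempfile.mkdtemp()):
    loss = (x @ x).relu().sum()
loss.backward()  # saved activations stream back from disk here
```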
Wing Lian
7fa1089cea
Atropos support (#2666) [skip ci]
* allow peft+liger+grpo and custom vllm serve for atropos support
* set trainer class for RL
2025-05-13 08:30:58 -04:00
Dan Saunders
80304c26a7
SP GRPO support + batch SP fixes (#2643)
* ctx manager for SP
* updates
* update
* further simplifying
* simplifying
* simplifying
* reorg
* batch api HF adapter for ring-flash-attn; cleanup and improvements
* update
* adding all batch ring-flash-attn methods via single adapter
* fix
* fixes for batch API funcs, simplify
* fix
* grpo sp support
* progress
* stronger subclassing of TRL GRPO trainer; custom distributed sampler
* subclassing constructor
* progress
* finalizing SP + GRPO trainer
* minimize diffs to GRPO trainer
* remove (most of) the custom GRPO trainer logic
* debug
* debug
* update
* update
* update
* progress
* cleanup
* cleanup
* minor changes
* update
* update
* update
* small changes
* updates
* cleanup; torch.compile ring_flash_attn functions to prevent numerical instability; lint
* spacing
* cleanup; log in pydantic model config only on main process
* remove comment
* fix sp sampler, update to latest upstream code, doc
* add docs
* update quartodoc autodoc contents
* fix, simplifications
* fixes + simplifications
* review comments
* lint
* removing main process only logs in favor of #2608
* fixes, additional smoke test
* updates
* more tests
* update
* fix grad accum bug (sort of)
* lint, tests
* todo
2025-05-12 17:52:40 -04:00
NanoCode012
67c4ea9c7c
fix: disable auto lora kernel if dropout nonzero (#2655) [skip ci]
* fix: disable auto lora kernel if dropout nonzero
* Add comment from PR feedback
---------
Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-05-12 16:23:53 -04:00
Wing Lian
526ddb886d
guard on deleting secrets from env (#2653) [skip ci]
2025-05-12 14:18:42 -04:00
Wing Lian
f34eef546a
update doc and use P2P=LOC for brittle grpo test (#2649)
* update doc and skip brittle grpo test
* fix the path to run the multigpu tests
* increase timeout, use LOC instead of NVL
* typo
* use hf cache from s3 backed cloudfront
* mark grpo as flaky test due to vllm start
2025-05-12 14:17:25 -04:00
Wing Lian
c7b6790614
Various fixes for CI, save_only_model for RL, prevent packing multiprocessing deadlocks (#2661)
* lean mistral ft tests, remove e2e torch 2.4.1 test
* make sure to pass save_only_model for RL
* more tests to make ci leaner, add cleanup to modal ci
* fix module for import in e2e tests
* use mp spawn to prevent deadlocks with packing
* make sure cleanup shell script is executable when cloned out
2025-05-12 10:51:18 -04:00
Dan Saunders
47e0e71bc8
don't sort multipack sampler (#2657)
* don't sort multipack sampler
* increased packing efficiency increases loss
---------
Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-05-09 20:28:58 -04:00
Wing Lian
0f3587174d
swap tinymodels that have safetensors for some ci tests (#2641)
2025-05-07 15:06:07 -04:00
xzuyn
25e6c5f9bd
Add CAME Optimizer (#2385)
2025-05-07 10:31:46 -04:00
NanoCode012
32f51bca35
fix(doc): clarify instruction to delinearize llama4 similar to cli doc (#2644) [skip ci]
2025-05-07 10:29:47 -04:00
NanoCode012
9daa04da90
Fix: improve error message on failed dataset load (#2637) [skip ci]
* fix(log): clarify error on dataset loading failed
* fix: add path for easy tracking of broken config
* fix: improve error message based on pr feedback
2025-05-07 10:29:05 -04:00
Wing Lian
0d71b0aa5f
Configurable embeddings upcast (#2621)
* fsdp embeddings should be float32 per comment
* patch peft to not upcast everything
* add tabs back to code check
* fix import
* add configurable option and fix check
* add check for dtypes
* move embeddings test to patch dir
* fix test
* fix comment and logic
2025-05-06 23:40:44 -04:00
Eric Meier
63aaccf85b
Fix cut_cross_entropy plugin install (#2642) [skip ci]
2025-05-06 22:56:00 -04:00
Wing Lian
ff0fe767c8
xformers attention with packing (#2619)
* xformers attention with packing
* wire up the patch
* fix xformers + packing validation
* fix warning
* reorder the packing check
* fix fp16 / bf16 reset when using fp16 with bf16 auto
* fix seq lens calc to drop hanging sequences
* handle xformers patch for inference too
* fix batch size setter
* fix xformers inference
* add colab callback to fix inference post train
* PR feedback
2025-05-06 22:49:22 -04:00
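The core of xformers attention with packing is a block-diagonal attention bias: packed samples attend only within their own segment. A minimal usage sketch of the xformers API, illustrative rather than the patch itself (assumes a CUDA device and fp16):

```python
import torch

from xformers.ops import memory_efficient_attention
from xformers.ops.fmha.attn_bias import BlockDiagonalMask

seqlens = [5, 3, 7]  # three samples packed into one sequence
q = torch.randn(1, sum(seqlens), 8, 64, device="cuda", dtype=torch.float16)
k, v = torch.randn_like(q), torch.randn_like(q)

# Cross-sample positions get -inf bias, so attention never leaks
# across packed boundaries.
bias = BlockDiagonalMask.from_seqlens(seqlens)
out = memory_efficient_attention(q, k, v, attn_bias=bias)
print(out.shape)  # torch.Size([1, 15, 8, 64])
```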
Wing Lian
8e4158cc0b
Multipack parallel bin packing (#2631)
* improve readability of multipack sampler
* parallel bin packing
fix error with lambda and pickling
make sure things are in float instead of np.float
* annotations and comments update
* support for configurable group and bin size for sample packing
* fix missing map back to original indices
2025-05-06 20:08:08 -04:00
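For readers unfamiliar with the sampler: sample packing reduces to bin packing, fitting variable-length sequences into fixed token budgets and then mapping bins back to the original indices (the last bullet above). A toy first-fit-decreasing sketch; the PR's grouped, parallel version is more involved:

```python
def pack_bins(lengths, bin_capacity):
    """Greedily pack sequence lengths into bins of bin_capacity tokens."""
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    bins, space = [], []
    for idx in order:
        for b, free in enumerate(space):
            if lengths[idx] <= free:  # first existing bin it fits in
                bins[b].append(idx)
                space[b] -= lengths[idx]
                break
        else:                         # fits nowhere: open a new bin
            bins.append([idx])
            space.append(bin_capacity - lengths[idx])
    return bins

print(pack_bins([1800, 900, 700, 400, 300], bin_capacity=2048))
# [[0], [1, 2, 3], [4]] -- entries are original sample indices
```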
Wing Lian
cd84325253
allow plugins to return their own dataset (#2617) [skip ci]
* allow plugins to return their own dataset
* add post_trainer_create and wire up
* add hook check
* address PR feedback
* remove annotation causing circular import
2025-05-06 20:05:51 -04:00
NanoCode012
0b140fef83
feat(doc): add split_thinking docs (#2613) [skip ci]
* feat(doc): add split_thinking docs
* fix: link config.qmd to conversation.qmd for split_thinking example
* update thinking => reasoning_content in messages format
---------
Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-05-06 20:05:32 -04:00
Wing Lian
e4cfebe995
bump liger dep to 0.5.9 (#2640) [skip ci]
* bump liger dep to 0.5.9
* also upgrade vllm to post1, and datasets to 3.5.1
2025-05-06 20:05:19 -04:00
mhenrichsen
a6cac5dd32
Update lr_scheduler options in config.qmd to include additional scheduling strategies for improved training flexibility. (#2636) [skip ci]
2025-05-06 11:24:07 -04:00
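Most lr_scheduler values in the config map onto the Hugging Face scheduler factory. A minimal sketch using standard transformers scheduler names; the set shown is illustrative, not necessarily the full list the doc update covers:

```python
import torch
from transformers import get_scheduler

model = torch.nn.Linear(4, 4)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Each name yields a ready-to-step scheduler over the same optimizer.
for name in ("linear", "cosine", "cosine_with_restarts", "constant_with_warmup"):
    scheduler = get_scheduler(
        name, optimizer, num_warmup_steps=10, num_training_steps=100
    )
    print(name, type(scheduler).__name__)
```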
Wing Lian
b71c0e3447
Print axolotl art if train is called outside of cli (#2627) [skip ci]
2025-05-06 11:18:45 -04:00
Wing Lian
ddaebf8309
fix dpo eval override to call grandparent instead of the broken super (#2628) [skip ci]
2025-05-06 11:18:25 -04:00