NanoCode012
39fbd3b2b5
fix: lora kernels for mistral3 ( #3027 ) [skip ci]
2025-08-07 09:25:37 -04:00
salman
46dfacf255
ND Parallel Doc Nits ( #3032 )
2025-08-07 10:34:26 +01:00
Wing Lian
4bce713b39
allow custom trainer_cls to be defined as a module reference in the YAML ( #3024 ) [skip ci]
* allow custom trainer_cls to be defined as a module reference in the YAML
* address PR feedback and add test
* add tests
2025-08-06 22:49:19 -04:00
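A minimal sketch of what such a config might look like — the key name comes from the commit title, but the dotted module path and class name below are hypothetical:

```yaml
# Hypothetical example: trainer_cls given as an importable module reference
# (the path and class name are made up for illustration).
trainer_cls: my_project.trainers.MyCustomTrainer
```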
Dan Saunders
d09290f2f4
Lora kernels bias support ( #3025 )
* lora kernels bias support
* revert rename
* nit
* lint, tests
* satisfying the rabbit
2025-08-06 20:20:08 -04:00
Wing Lian
e442ff22aa
fix keyerror on load_in_8bit/load_in_4bit access in _set_quantization_config ( #3023 )
* set load_in_8bit/load_in_4bit in _set_quantization_config to prevent keyerror
* use dict.get instead
2025-08-06 14:28:52 -04:00
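The "use dict.get instead" fix follows a standard Python pattern; a minimal sketch, with a made-up config dict, of why direct indexing raised the KeyError and why `dict.get` does not:

```python
# Hypothetical config dict: neither quantization key was set,
# which is the case that raised KeyError before the fix.
cfg = {"base_model": "some/model"}

# Direct indexing raises KeyError for a missing key:
try:
    load_in_8bit = cfg["load_in_8bit"]
except KeyError:
    load_in_8bit = None  # caller has to handle the exception

# dict.get returns a default instead of raising:
load_in_8bit = cfg.get("load_in_8bit", False)
load_in_4bit = cfg.get("load_in_4bit", False)
print(load_in_8bit, load_in_4bit)  # False False
```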
Wing Lian
ba3dba3e4f
add kernels for gpt oss models ( #3020 )
* add kernels for gpt oss models
* add support for gpt-oss
* typo incorrect package
* fix: layout for configs and added wandb/epochs
* add gptoss example w offload and set moe leaf for z3
* add support for Mxfp4Config from yaml
* update yaml to use official model
* fix lora and don't allow triton to go above 3.3.1
* fix lr and tweak vram use
* fix range for triton since pinned wasn't compatible with torch 2.6.0
* update cce with gpt oss patches
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai>
2025-08-06 09:47:55 -04:00
Wing Lian
97e86c6d47
drop old patches and code that are no longer needed ( #3007 ) [skip ci]
2025-08-06 08:02:39 -04:00
VED
784f8c0e95
fix: kd_distillation key_error logprobs ( #2990 )
* fix: kd_distillation key_error logprobs
* style
* fix: leave handling of pop logprobs to parent
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai>
2025-08-06 08:02:07 -04:00
NanoCode012
e3177c3210
feat: add complete optimizer docs ( #3017 ) [skip ci]
* feat: add complete optimizer docs
* fix: deprecate old torchao adamw low bit
2025-08-06 08:01:51 -04:00
Wing Lian
70faea331f
add support for connecting via prime-intellect ( #3021 )
2025-08-06 01:06:52 -04:00
Wing Lian
8021c718ce
use skip_move_to_device for all cases ( #3015 )
* use skip_move_to_device for all cases
* use experimental option for skip move
2025-08-06 00:13:12 -04:00
Wing Lian
42f5e6f9e9
upgrade transformers==4.55.0 ( #3018 )
2025-08-05 16:29:12 -04:00
Wing Lian
ab49d16e34
Dion optimizer support ( #3014 )
* Add support for Dion optimizer
* dion training kwargs
* fix var names
* no dion 8bit for now
* use updated axolotl-contribs-mit for dion optimizer
* add smoke test for dion optimizer
* add docs
* fix typo during edits
* fix test to not remove load in 8bit
2025-08-04 16:33:30 -04:00
Carsten Kragelund Jørgensen
33d094721c
fix: deepcopy lr in RexLR scheduler. ( #3012 )
* fix: deepcopy lr in RexLR scheduler.
This fixes a problem where, when the lr is a scalar tensor, the base_lrs in the get_lr function end up as references to the current learning rate rather than the correct initial learning rate.
See also related pytorch PR https://github.com/pytorch/pytorch/pull/127190/
* fix: add missing torch.Tensor import
2025-08-04 10:23:49 -04:00
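The aliasing bug described in that commit message can be reproduced without torch; a minimal sketch using a mutable stand-in for a scalar tensor lr (the class and method names below are illustrative, not the scheduler's actual code):

```python
from copy import deepcopy

class TensorLike:
    """Illustrative stand-in for a scalar tensor: updated in place."""
    def __init__(self, value):
        self.value = value
    def mul_(self, factor):  # mimics an in-place Tensor.mul_
        self.value *= factor
        return self

lr = TensorLike(1e-3)

base_lrs_buggy = [lr]            # stores a reference to the live lr
base_lrs_fixed = [deepcopy(lr)]  # the fix: snapshot the initial lr

lr.mul_(0.5)  # a scheduler step mutates the current lr in place

print(base_lrs_buggy[0].value)  # 0.0005 -- drifted with the current lr
print(base_lrs_fixed[0].value)  # 0.001  -- still the initial lr
```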
NanoCode012
a54c1be972
Fix: shorten mem logs to 2 decimal places and renamed nd docs ( #3011 ) [skip ci]
* fix: shorten memory logs
* fix: title name
2025-08-04 10:23:36 -04:00
github-actions[bot]
5691992d34
chore: update pre-commit hooks ( #3009 ) [skip ci]
Co-authored-by: djsaunde <1245942+djsaunde@users.noreply.github.com>
2025-08-04 10:23:19 -04:00
Dan Saunders
e758343cac
FSDP2 + LoRA kernels ( #2992 )
* impl fix
* smoke tests
* patches for fsdp2 + qlora compat
* nit
* working fix
* working fix
* fix merge
* minifying patches; update bnb dep
* renaming; adding tests
* remove duplicate test, add dora guard
* generalize __torch_function__
* revert generalization
* update comments
2025-08-03 20:05:17 -04:00
Wing Lian
deac7b18a1
upgrade peft v0.17.0 and support for lora target_parameters ( #3006 )
2025-08-02 20:24:04 -04:00
Wing Lian
10946afae7
fixes for spinning up vllm service for grpo ( #3001 )
2025-08-02 11:19:24 -04:00
Wing Lian
5639552064
prevent usage of low bit ao optimizers with configurations that use parameter groups ( #3003 )
* prevent usage of low bit ao optimizers with configurations that use parameter groups
* use optimizer enum value
* fix validation
2025-08-01 17:54:04 -04:00
Wing Lian
cda3c82351
move ib/rdma libs into base image ( #3002 )
* move ib/rdma libs into base image
* use --no-install-recommends
2025-08-01 16:10:37 -04:00
Wing Lian
7c3b428f23
Add validation for TP with models with tied embeddings ( #2999 )
* add validation for tp + tied embeddings models
* fix logic and messaging
* add additional guard for null tp size
2025-08-01 13:58:16 -04:00
Wing Lian
01a6bd1a0e
use CCE fix for TP using vocab parallel for CEL ( #3000 )
2025-08-01 13:21:58 -04:00
NanoCode012
41709822a7
fix: move memory usage log to trainer.log ( #2996 ) [skip ci]
2025-08-01 13:21:43 -04:00
Wing Lian
02a37199ee
prevent empty value for vllm_mode ( #2998 )
2025-08-01 09:59:45 -04:00
NanoCode012
7026cd5e9e
Feat: Add N-D parallelism docs ( #2989 )
* fix: remove non-existent file
* feat: add n-d parallel docs
* fix: comments
---------
Co-authored-by: salman <salman.mohammadi@outlook.com>
2025-08-01 13:18:31 +07:00
NanoCode012
eb0a8a7775
feat: upgrade cce commit to include smollm3, granite, granitemoe ( #2993 )
2025-07-31 18:18:44 -04:00
salman
294c7fe7a6
Distributed/ND-Parallel ( #2977 )
2025-07-31 15:25:02 -04:00
Wing Lian
7b68dfafd7
jagged lr restart scheduler ( #1680 ) [skip ci]
* jagged lr restart scheduler
* var name fix
* make sure to create scheduler first
* wire things together
* more fixes
* fix for nesting scheduler and first anneal phase
* no need for relora trainer anymore since we've generalized the relora scheduler
* remove redundant relora scheduler and lint
* update relora e2e test for updated params
* need restart steps for relora test
* update quarto docs for dropped relora trainer
* update example yaml
* drop verbose arg
* min lr scale support for jagged lr
* don't let min_lr be nonetype
* cleanup args
2025-07-31 13:50:03 -04:00
salman
32a7890231
Revert test update to index.qmd ( #2995 ) [skip ci]
2025-07-31 11:46:31 -04:00
Wing Lian
563f5eed7a
update dependencies - liger + trl ( #2987 )
* update dependencies
* set dataset processes for tests
* add support for GSPO
2025-07-31 11:17:17 -04:00
Wing Lian
6ec282094d
actually call the register method on plugins ( #2991 ) [skip ci]
2025-07-31 11:13:15 -04:00
salman
09dda462ab
Fix don't preview docs for contributors ( #2994 ) [skip ci]
* checking against fork vs. main repo
* force doc preview
2025-07-31 11:12:41 -04:00
Dan Saunders
bb1cae1a20
CLI: add --launcher option, support launcher args, cleanup, refactor ( #2924 )
* add --launcher option; explicit True/False bool args; small cleanup
* refactor
* add torchrun, accelerate cli args
* add rdzv arg default + tests
* update _quarto
* coderabbit
* fix
* we can't set rdzv_id independently across nodes
* coderabbit
* fix tests
2025-07-30 15:46:56 -04:00
Wing Lian
22810c97b7
use warmup_ratio as a better default than warmup steps since it's data dependent ( #2897 ) [skip ci]
* use warmup_ratio as a better default than warmup steps since it's data dependent
* replace remainder of warmup_steps
2025-07-30 06:44:06 -04:00
Vincenzo di Cicco
2eb7ff95af
Use '<|finetune_right_pad|>' as padding token for Llama4 ( #2988 ) [skip ci]
2025-07-30 06:38:13 -04:00
NanoCode012
90e5598930
Feat: Add voxtral, magistral small 1.1, and misc gemma3n fixes ( #2979 )
* fix: lock version in gemma3n docs
* feat: add sample configs and docs
* chore: move mistraltokenizer into mistral folder
* feat: update instructions
* feat: add dynamic load voxtral
* fix: remove incorrect vision config, add audio
* fix: support voxtral processing strategy and address none in data
* feat: patch mistraltokenizer subclass upstream and add missing
* feat: update cce commit to include voxtral
* fix: remove old comment
* fix: gemma3 patch not needed anymore
* fix: voxtral modeling code
* fix: remove incorrect ds path
* fix: adjust apply chat template parsing
* feat: enable voxtral patch
* fix: patch
* feat: update example datasets
* fix: target layer
* feat: update gemma3n docs
* feat: update voxtral docs
* feat: revert assistant parsing to rely on new upstream changes
* chore: skip test till next PR fix
* fix: override upstream decode due to missing handling
* feat: update readme
* fix: update
* feat: add magistral small think support
* feat: update mistral-common dep
* fix: lint
* fix: remove optional dep
* chore: typing
* chore: simplify import
* feat(doc): update differences for 2507
* fix: coderrabbit comments
* feat: update clarify docs on new transformers
2025-07-30 15:57:05 +07:00
Wing Lian
1d2aa1e467
upgrade to support latest transformers release ( #2984 )
* upgrade to support latest transformers release
* bump mistral common too
* Fix dependencies
2025-07-27 17:05:12 -04:00
NICOLAS BZRD
430be216d8
add shuffle_before_merging_datasets option to allow independent shuffling of datasets before merging ( #2981 ) [skip ci]
2025-07-27 17:04:56 -04:00
Wing Lian
28804b82e4
don't create a reference model if grpo beta is 0.0 ( #2983 ) [skip ci]
2025-07-27 17:04:42 -04:00
Wing Lian
add3e5076b
don't publish to netlify on contributor submissions since it requires auth tokens ( #2985 ) [skip ci]
* don't publish to netlify on contributor submissions since it requires auth tokens
* fix no-tmux build and add contact to motd
2025-07-27 17:04:27 -04:00
NanoCode012
41434f0c28
feat(doc): add all providers to readme ( #2972 ) [skip ci]
* feat(doc): add vastai link
* feat: add cloud providers to readme for more visibility
* add prime intellect, remove Modal as sponsor
---------
Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-07-27 17:03:50 -04:00
Wing Lian
f7ea140838
TiledMLP support for FSDP2 ( #2950 )
* make TiledMLP work with FSDP
* cleanup/gc at start of train to prevent large VRAM spike
* chore: lint
* generic function for non-deepspeed training
* unify patch to fix imports
* update readme for ALST and add examples
* make deepspeed attribute on params check more robust
* update with new info from PR review
2025-07-25 07:15:03 -04:00
Wing Lian
460e0f9ed9
improve handling of file lock when content is empty ( #2959 )
2025-07-24 16:10:38 -04:00
Wing Lian
e80faea0db
garbage collect on the end of the step if we're going to save a checkpoint ( #2971 ) [skip ci]
2025-07-24 16:10:23 -04:00
Wing Lian
0ff2f172ef
Act offload lora fix ( #2928 ) [skip ci]
* fix activation offloading with lora
* update w e2e test
* add docs for error
2025-07-24 16:10:04 -04:00
salman
1407aac779
Skip CI for draft PRs ( #2970 )
2025-07-24 09:11:46 +01:00
Dan Saunders
b34c3371ed
upgrade torchao ( #2968 )
2025-07-23 10:27:28 -04:00
Wing Lian
5f1a4306b0
don't check dataset labels during preprocess for GRPO ( #2952 ) [skip ci]
* don't check dataset labels during preprocess for GRPO
* use enum check per PR feedback
2025-07-22 20:40:44 -04:00
Wing Lian
93709eb5ce
handle refactor upstream for flash attention ( #2966 )
2025-07-22 20:40:04 -04:00