VED
784f8c0e95
fix:kd_distillation key_error logprobs ( #2990 )
...
* fix:kd_distillation key_error logprobs
* style
* fix: leave handling of pop logprobs to parent
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai>
2025-08-06 08:02:07 -04:00
NanoCode012
e3177c3210
feat: add complete optimizer docs ( #3017 ) [skip ci]
...
* feat: add complete optimizer docs
* fix: deprecate old torchao adamw low bit
2025-08-06 08:01:51 -04:00
Wing Lian
70faea331f
add support for connecting via prime-intellect ( #3021 )
2025-08-06 01:06:52 -04:00
Wing Lian
8021c718ce
use skip_move_to_device for all cases ( #3015 )
...
* use skip_move_to_device for all cases
* use experimental option for skip move
2025-08-06 00:13:12 -04:00
Wing Lian
42f5e6f9e9
upgrade transformers==4.55.0 ( #3018 )
2025-08-05 16:29:12 -04:00
Wing Lian
ab49d16e34
Dion optimizer support ( #3014 )
...
* Add support for Dion optimizer
* dion training kwargs
* fix var names
* no dion 8bit for now
* use updated axolotl-contribs-mit for dion optimizer
* add smoke test for dion optimizer
* add docs
* fix typo during edits
* fix test to not remove load in 8bit
2025-08-04 16:33:30 -04:00
Carsten Kragelund Jørgensen
33d094721c
fix: deepcopy lr in RexLR scheduler. ( #3012 )
...
* fix: deepcopy lr in RexLR scheduler.
This fixes a problem where, when the lr is a scalar tensor, the base_lrs in the get_lr function end up as references to the current learning rate rather than the correct initial learning rate (see the sketch below).
See also related pytorch PR https://github.com/pytorch/pytorch/pull/127190/
* fix: add missing torch.Tensor import
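A minimal sketch of the aliasing problem this fix addresses (illustrative only, not the RexLR implementation): storing a scalar-tensor lr directly in base_lrs keeps a live reference, so in-place updates to the current lr also change the "initial" lr; copy.deepcopy breaks the aliasing.

```python
import copy
import torch

lr = torch.tensor(1e-3)                 # scalar-tensor learning rate, as in the issue
base_lrs_aliased = [lr]                 # stores a reference, not a snapshot
base_lrs_copied = [copy.deepcopy(lr)]   # independent copy of the initial lr

lr.mul_(0.1)                            # scheduler updates the current lr in place

print(base_lrs_aliased[0].item())       # 1e-4: the "initial" lr drifted with the current lr
print(base_lrs_copied[0].item())        # 1e-3: deepcopy preserved the true initial lr
```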
2025-08-04 10:23:49 -04:00
NanoCode012
a54c1be972
Fix: shorten mem logs to 2 decimal places and rename nd docs ( #3011 ) [skip ci]
...
* fix: shorten memory logs
* fix: title name
2025-08-04 10:23:36 -04:00
github-actions[bot]
5691992d34
chore: update pre-commit hooks ( #3009 ) [skip ci]
...
Co-authored-by: djsaunde <1245942+djsaunde@users.noreply.github.com>
2025-08-04 10:23:19 -04:00
Dan Saunders
e758343cac
FSDP2 + LoRA kernels ( #2992 )
...
* impl fix
* smoke tests
* patches for fsdp2 + qlora compat
* nit
* working fix
* working fix
* fix merge
* minifying patches; update bnb dep
* renaming; adding tests
* remove duplicate test, add dora guard
* generalize __torch_function__
* revert generalization
* update comments
2025-08-03 20:05:17 -04:00
Wing Lian
deac7b18a1
upgrade peft v0.17.0 and support for lora target_parameters ( #3006 )
2025-08-02 20:24:04 -04:00
Wing Lian
10946afae7
fixes for spinning up vllm service for grpo ( #3001 )
2025-08-02 11:19:24 -04:00
Wing Lian
5639552064
prevent usage of low bit ao optimizers with configurations that use parameter groups ( #3003 )
...
* prevent usage of low bit ao optimizers with configurations that use parameter groups
* use optimizer enum value
* fix validation
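The low-bit torchao optimizers keep quantized state per parameter and, at the time of this change, don't play well with multiple parameter groups (e.g. per-group learning rates or weight decay), so the config validation rejects the combination up front. A hedged sketch of such a check; names are illustrative, not Axolotl's actual validator:

```python
LOW_BIT_AO_OPTIMIZERS = {"adamw_torchao_4bit", "adamw_torchao_8bit"}  # illustrative names

def validate_optimizer(optimizer: str, uses_parameter_groups: bool) -> None:
    """Fail fast instead of letting a low-bit optimizer break mid-training."""
    if optimizer in LOW_BIT_AO_OPTIMIZERS and uses_parameter_groups:
        raise ValueError(
            f"{optimizer} does not support configurations that create optimizer "
            "parameter groups; use a non-quantized optimizer instead."
        )
```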
2025-08-01 17:54:04 -04:00
Wing Lian
cda3c82351
move ib/rdma libs into base image ( #3002 )
...
* move ib/rdma libs into base image
* use --no-install-recommends
2025-08-01 16:10:37 -04:00
Wing Lian
7c3b428f23
Add validation for TP with models with tied embeddings ( #2999 )
...
* add validation for tp + tied embeddings models
* fix logic and messaging
* add additional guard for null tp size
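A hedged sketch of the validation: models that tie the input embeddings to the LM head can't currently be split with tensor parallelism here, so the config check fails fast, and a null tp size is treated as "no TP". Field names are illustrative.

```python
def validate_tensor_parallel(cfg, model_config) -> None:
    """Reject TP for tied-embedding models before training starts."""
    tp_size = getattr(cfg, "tensor_parallel_size", None)
    if tp_size and tp_size > 1 and getattr(model_config, "tie_word_embeddings", False):
        raise ValueError(
            "tensor parallelism is not supported for models with tied embeddings; "
            "set tensor_parallel_size to 1 or choose an untied model"
        )
```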
2025-08-01 13:58:16 -04:00
Wing Lian
01a6bd1a0e
use CCE fix for TP using vocab parallel for CEL ( #3000 )
2025-08-01 13:21:58 -04:00
NanoCode012
41709822a7
fix: move memory usage log to trainer.log ( #2996 ) [skip ci]
2025-08-01 13:21:43 -04:00
Wing Lian
02a37199ee
prevent empty value for vllm_mode ( #2998 )
2025-08-01 09:59:45 -04:00
NanoCode012
7026cd5e9e
Feat: Add N-D parallelism docs ( #2989 )
...
* fix: remove non-existent file
* feat: add n-d parallel docs
* fix: comments
---------
Co-authored-by: salman <salman.mohammadi@outlook.com>
2025-08-01 13:18:31 +07:00
NanoCode012
eb0a8a7775
feat: upgrade cce commit to include smollm3, granite, granitemoe ( #2993 )
2025-07-31 18:18:44 -04:00
salman
294c7fe7a6
Distributed/ND-Parallel ( #2977 )
2025-07-31 15:25:02 -04:00
Wing Lian
7b68dfafd7
jagged lr restart scheduler ( #1680 ) [skip ci]
...
* jagged lr restart scheduler
var name fix
make sure to create scheduler first
* wire things together
* more fixes
* fix for nesting scheduler and first anneal phase
* no need for relora trainer anymore since we've generalized the relora scheduler
* remove redundant relora scheduler and lint
* update relora e2e test for updated params
* need restart steps for relora test
* update quarto docs for dropped relora trainer
* update example yaml
* drop verbose arg
* min lr scale support for jagged lr
* don't let min_lr be nonetype
* cleanup args
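A back-of-the-envelope sketch of a "jagged" restart schedule of the kind described here (illustrative only, not the ReLoRA scheduler's actual math): the learning rate anneals within each restart interval, snaps back up at every restart, and never drops below a min-lr floor.

```python
import math

def jagged_lr(step: int, base_lr: float, restart_every: int, min_lr_scale: float = 0.1) -> float:
    """Cosine anneal within each restart window, resetting at every restart."""
    pos = (step % restart_every) / restart_every   # progress inside the current window
    min_lr = base_lr * min_lr_scale                # floor so the lr never reaches zero
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * pos))

# lr starts at base_lr, decays toward min_lr, then snaps back up at each restart
print([round(jagged_lr(s, 1e-3, restart_every=100), 6) for s in (0, 50, 99, 100)])
```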
2025-07-31 13:50:03 -04:00
salman
32a7890231
Revert test update to index.qmd ( #2995 ) [skip ci]
2025-07-31 11:46:31 -04:00
Wing Lian
563f5eed7a
update dependencies - liger + trl ( #2987 )
...
* update dependencies
* set dataset processes for tests
* add support for GSPO
2025-07-31 11:17:17 -04:00
Wing Lian
6ec282094d
actually call the register method on plugins ( #2991 ) [skip ci]
2025-07-31 11:13:15 -04:00
salman
09dda462ab
Fix don't preview docs for contributors ( #2994 ) [skip ci]
...
* checking against fork vs. main repo
* force doc preview
2025-07-31 11:12:41 -04:00
Dan Saunders
bb1cae1a20
CLI: add --launcher option, support launcher args, cleanup, refactor ( #2924 )
...
* add --launcher option; explicit True/False bool args; small cleanup
* refactor
* add torchrun, accelerate cli args
* add rdzv arg default + tests
* update _quarto
* coderabbit
* fix
* we can't set rdvz_id independently across nodes
* coderabbit
* fix tests
2025-07-30 15:46:56 -04:00
Wing Lian
22810c97b7
use warmup_ratio as a better default than warmup steps since it's data dependent ( #2897 ) [skip ci]
...
* use warmup_ratio as a better default than warmup steps since it's data dependent
* replace remainder of warmup_steps
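A short sketch of the rationale: a fixed warmup_steps value means a different fraction of training for every dataset size, while warmup_ratio scales with the actual number of optimizer steps. The function below is illustrative, not Axolotl's internals.

```python
def resolve_warmup_steps(total_steps: int, warmup_ratio: float = 0.03) -> int:
    """Derive warmup steps from a ratio so warmup tracks the run length."""
    return max(1, int(total_steps * warmup_ratio))

# the same ratio adapts to short and long runs; a hard-coded step count would not
print(resolve_warmup_steps(1_000))    # 30
print(resolve_warmup_steps(50_000))   # 1500
```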
2025-07-30 06:44:06 -04:00
Vincenzo di Cicco
2eb7ff95af
Use '<|finetune_right_pad|>' as padding token for LLama4 ( #2988 ) [skip ci]
2025-07-30 06:38:13 -04:00
NanoCode012
90e5598930
Feat: Add voxtral, magistral small 1.1, and misc gemma3n fixes ( #2979 )
...
* fix: lock version in gemma3n docs
* feat: add sample configs and docs
* chore: move mistraltokenizer into mistral folder
* feat: update instructions
* feat: add dynamic load voxtral
* fix: remove incorrect vision config, add audio
* fix: support voxtral processing strategy and address none in data
* feat: patch mistraltokenizer subclass upstream and add missing
* feat: update cce commit to include voxtral
* fix: remove old comment
* fix: gemma3 patch not needed anymore
* fix: voxtral modeling code
* fix: remove incorrect ds path
* fix: adjust apply chat template parsing
* feat: enable voxtral patch
* fix: patch
* feat: update example datasets
* fix: target layer
* feat: update gemma3n docs
* feat: update voxtral docs
* feat: revert assistant parsing to rely on new upstream changes
* chore: skip test till next PR fix
* fix: override upstream decode due to missing handling
* feat: update readme
* fix: update
* feat: add magistral small think support
* feat: update mistral-common dep
* fix: lint
* fix: remove optional dep
* chore: typing
* chore: simplify import
* feat(doc): update differences for 2507
* fix: coderabbit comments
* feat: update and clarify docs on new transformers
2025-07-30 15:57:05 +07:00
Wing Lian
1d2aa1e467
upgrade to support latest transformers release ( #2984 )
...
* upgrade to support latest transformers release
* bump mistral common too
* Fix dependencies
2025-07-27 17:05:12 -04:00
NICOLAS BZRD
430be216d8
add shuffle_before_merging_datasets option to allow independent shuffling of datasets before merging ( #2981 ) [skip ci]
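A sketch of what the option enables, using Hugging Face datasets (the flag name comes from the commit; the surrounding code is illustrative): each source dataset is shuffled on its own before concatenation, rather than only shuffling the merged result.

```python
from datasets import concatenate_datasets

def merge(datasets, shuffle_before_merging_datasets=False, seed=42):
    if shuffle_before_merging_datasets:
        # shuffle each source independently before the datasets are merged
        datasets = [ds.shuffle(seed=seed) for ds in datasets]
    return concatenate_datasets(datasets)
```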
2025-07-27 17:04:56 -04:00
Wing Lian
28804b82e4
don't create a reference model if grpo beta is 0.0 ( #2983 ) [skip ci]
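In GRPO the reference policy only feeds the KL penalty, which is scaled by beta; with beta = 0.0 the term vanishes, so loading a second copy of the model just wastes memory. A hedged sketch of the idea (names are illustrative, not the actual trainer code):

```python
def maybe_load_reference_model(load_model, beta: float):
    """Skip the reference policy entirely when the KL penalty is disabled."""
    if beta == 0.0:
        # loss = advantage-weighted policy term + beta * KL(policy || ref);
        # with beta == 0 the KL term is zero, so no reference model is needed
        return None
    return load_model()
```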
2025-07-27 17:04:42 -04:00
Wing Lian
add3e5076b
don't publish to netlify on contributor submissions since it requires auth tokens ( #2985 ) [skip ci]
...
* don't publish to netlify on contributor submissions since it requires auth tokens
* fix no-tmux build and add contact to motd
2025-07-27 17:04:27 -04:00
NanoCode012
41434f0c28
feat(doc): add all providers to readme ( #2972 ) [skip ci]
...
* feat(doc): add vastai link
* feat: add cloud providers to readme for more visibility
* add prime intellect, remove Modal as sponsor
---------
Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-07-27 17:03:50 -04:00
Wing Lian
f7ea140838
TiledMLP support for FSDP2 ( #2950 )
...
* make TiledMLP work with FSDP
* cleanup/gc at start of train to prevent large VRAM spike
* chore: lint
* generic function for non-deepspeed training
* unify patch to fix imports
* update readme for ALST and add examples
* make deepspeed attribute on params check more robust
* update with new info from PR review
2025-07-25 07:15:03 -04:00
Wing Lian
460e0f9ed9
improve handling of file lock when content is empty ( #2959 )
2025-07-24 16:10:38 -04:00
Wing Lian
e80faea0db
garbage collect on the end of the step if we're going to save a checkpoint ( #2971 ) [skip ci]
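The idea is to free cached allocator memory right before a checkpoint is written, since serialization can briefly need extra host/GPU memory. A minimal sketch of the pattern (not the actual trainer hook):

```python
import gc
import torch

def on_step_end(should_save_checkpoint: bool) -> None:
    if should_save_checkpoint:
        # release Python-level garbage and return cached CUDA blocks
        # before checkpoint serialization allocates its own buffers
        gc.collect()
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
```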
2025-07-24 16:10:23 -04:00
Wing Lian
0ff2f172ef
Act offload lora fix ( #2928 ) [skip ci]
...
* fix activation offloading with lora
* update w e2e test
* add docs for error
2025-07-24 16:10:04 -04:00
salman
1407aac779
Skip CI for draft PRs ( #2970 )
2025-07-24 09:11:46 +01:00
Dan Saunders
b34c3371ed
upgrade torchao ( #2968 )
2025-07-23 10:27:28 -04:00
Wing Lian
5f1a4306b0
don't check dataset labels during preprocess for GRPO ( #2952 ) [skip ci]
...
* don't check dataset labels during preprocess for GRPO
* use enum check per PR feedback
2025-07-22 20:40:44 -04:00
Wing Lian
93709eb5ce
handle refactor upstream for flash attention ( #2966 )
2025-07-22 20:40:04 -04:00
Dan Saunders
208fb7b8e7
basic torchao fp8 mixed precision training ( #2926 )
...
* debug
* debug
* debug
* revert unneeded change
* add accelerator config to base trainer builder
* add back accumulated_cache_size_limit setting
* lint
* accelerator constructor patch for single-GPU torch fp8
* lint
* re-using existing fp8 code
* lint
* remove accelerate patch now fix in latest release
* fix
* docs
* add fp8 + fsdp2 example
* remove unused config
* update config
* smoke tests
* add validator
* add 2.7.0 guard for fsdp2
* fix
* add config descriptions
* add FSDP doc link
* nit
* set force_recompute_fp8_weight_in_bwd with enable_fsdp_float8_all_gather
* better cfg for smoke tests
* add test for accelerate patching
* update fp8 validator
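For context, torchao's float8 training converts nn.Linear layers in place; below is a minimal stand-alone sketch (API as documented for torchao's float8 module, as I understand it; Axolotl wires this through Accelerate and its own config instead). The commit also notes that force_recompute_fp8_weight_in_bwd should be set whenever enable_fsdp_float8_all_gather is used.

```python
import torch.nn as nn
from torchao.float8 import Float8LinearConfig, convert_to_float8_training

model = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU(), nn.Linear(4096, 4096))

# fp8 matmuls for large Linear layers; the all-gather/recompute flags matter for FSDP2
config = Float8LinearConfig(
    enable_fsdp_float8_all_gather=True,
    force_recompute_fp8_weight_in_bwd=True,
)
convert_to_float8_training(
    model,
    config=config,
    # skip layers too small to benefit from fp8 (illustrative filter)
    module_filter_fn=lambda mod, name: isinstance(mod, nn.Linear) and mod.in_features >= 1024,
)
```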
2025-07-22 16:27:47 -04:00
Wing Lian
b86a1d47b0
we don't need to call check_dataset_labels when skip_prepare_dataset is set ( #2962 )
...
* we don't need to call check_dataset_labels when skip_prepare_dataset is set
* Fix actual bug and revert prior fix
* warn and early return instead of raising an error
* use error
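A hedged sketch of the guard described above: when skip_prepare_dataset is set, the tokenized dataset never goes through label preparation, so label checking is skipped rather than raising on a dataset it never produced. Function and field names are illustrative.

```python
def maybe_check_dataset_labels(dataset, cfg, check_dataset_labels) -> None:
    """Only validate labels when the dataset actually went through preparation."""
    if getattr(cfg, "skip_prepare_dataset", False):
        # a plugin took over dataset prep, so there are no labels to check
        return
    check_dataset_labels(dataset)
```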
2025-07-22 10:00:53 -04:00
NanoCode012
01d8175d48
fix: revert changing default optimizer to muon ( #2965 ) [skip ci]
2025-07-22 10:00:30 -04:00
NanoCode012
631268a0ca
revert renaming of deepspeed stage3 args that use auto ( #2964 ) [skip ci]
...
* Revert "fix deprecate deepspeed stage3_gather_16bit_weights_on_model_save arg…"
This reverts commit e207762928.
* don't revert the values that don't use 'auto'
---------
Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-07-22 09:59:47 -04:00
Wing Lian
3a208cfd84
Autocomplete axolotl CLI ( #2955 )
...
* static autocomplete script for axolotl cli
* use list of commands that should autocomplete yaml files
* make sure to chmod the autocomplete script as executable
* shellcheck and fix autocompletion of directory/sub-dirs
* more shellcheck fixes
2025-07-22 08:30:31 -04:00
github-actions[bot]
7267edc168
chore: update pre-commit hooks ( #2954 ) [skip ci]
...
Co-authored-by: djsaunde <1245942+djsaunde@users.noreply.github.com>
2025-07-22 08:30:00 -04:00
NanoCode012
dfba881e99
Feat: add gemma3n support ( #2852 )
...
* feat: add gemma3n cce
* feat: add sample config
* feat: add gemma3n multimodal mode
* feat: add audio example
* feat: support audio and return pixel values in collator
* feat: support unmask only assistant region (gemma3n for now)
* feat(doc): add notes for audio loading
* feat: add audio support for gemma3n
* feat: update examples
* feat: add gemma3n to the docs
* fix: add link at top
* feat(doc): clarify additional requirements
* fix: mllama missing aspect ratio
* fix: mllama need attention fixes for fa2
* Partially Revert "fix: mllama need attention fixes for fa2"
This reverts commit a0bfdd1777.
* fix: disable FA2 for mllama in vision mode
* feat: update configs to use proper attention
* fix: support other vision features
* feat(doc): clarify requirements for gemma3n
2025-07-22 16:52:15 +07:00