Wing Lian
08aa74e418
fix llama modeling
2025-07-30 11:37:58 -04:00
Wing Lian
dfa14f87ab
fix residuals and add llama support
2025-07-30 10:22:38 -04:00
Wing Lian
fbe1b504da
add custom modeling for gemma3 using liger fused add rms
2025-07-30 08:21:03 -04:00
Wing Lian
5b8370969c
actually call the register method on plugins
2025-07-30 08:05:25 -04:00
Wing Lian
22810c97b7
use warmup_ratio as a better default than warmup_steps since it's data-dependent ( #2897 ) [skip ci]
...
* use warmup_ratio as a better default than warmup_steps since it's data-dependent
* replace remainder of warmup_steps
2025-07-30 06:44:06 -04:00
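A minimal config sketch of what the ratio-based default above implies (`warmup_ratio` is an existing axolotl option; the value is illustrative only):

```yaml
# fraction of total training steps spent warming up the LR;
# scales with dataset size, unlike a fixed warmup_steps count
warmup_ratio: 0.03
```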
Vincenzo di Cicco
2eb7ff95af
Use '<|finetune_right_pad|>' as padding token for Llama4 ( #2988 ) [skip ci]
2025-07-30 06:38:13 -04:00
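A sketch of how the pad token change above would look in a config, assuming axolotl's `special_tokens` section (the token string is taken from the commit title):

```yaml
# assumption: special_tokens is the section axolotl uses for this override
special_tokens:
  pad_token: "<|finetune_right_pad|>"
```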
NanoCode012
90e5598930
Feat: Add voxtral, magistral small 1.1, and misc gemma3n fixes ( #2979 )
...
* fix: lock version in gemma3n docs
* feat: add sample configs and docs
* chore: move mistraltokenizer into mistral folder
* feat: update instructions
* feat: add dynamic load voxtral
* fix: remove incorrect vision config, add audio
* fix: support voxtral processing strategy and address none in data
* feat: patch mistraltokenizer subclass upstream and add missing
* feat: update cce commit to include voxtral
* fix: remove old comment
* fix: gemma3 patch not needed anymore
* fix: voxtral modeling code
* fix: remove incorrect ds path
* fix: adjust apply chat template parsing
* feat: enable voxtral patch
* fix: patch
* feat: update example datasets
* fix: target layer
* feat: update gemma3n docs
* feat: update voxtral docs
* feat: revert assistant parsing to rely on new upstream changes
* chore: skip test till next PR fix
* fix: override upstream decode due to missing handling
* feat: update readme
* fix: update
* feat: add magistral small think support
* feat: update mistral-common dep
* fix: lint
* fix: remove optional dep
* chore: typing
* chore: simplify import
* feat(doc): update differences for 2507
* fix: coderabbit comments
* feat: update clarify docs on new transformers
2025-07-30 15:57:05 +07:00
Wing Lian
1d2aa1e467
upgrade to support latest transformers release ( #2984 )
...
* upgrade to support latest transformers release
* bump mistral common too
* Fix dependencies
2025-07-27 17:05:12 -04:00
NICOLAS BZRD
430be216d8
add shuffle_before_merging_datasets option to allow independent shuffling of datasets before merging ( #2981 ) [skip ci]
2025-07-27 17:04:56 -04:00
Wing Lian
28804b82e4
don't create a reference model if grpo beta is 0.0 ( #2983 ) [skip ci]
2025-07-27 17:04:42 -04:00
Wing Lian
add3e5076b
don't publish to netlify on contributor submissions since it requires auth tokens ( #2985 ) [skip ci]
...
* don't publish to netlify on contributor submissions since it requires auth tokens
* fix no-tmux build and add contact to motd
2025-07-27 17:04:27 -04:00
NanoCode012
41434f0c28
feat(doc): add all providers to readme ( #2972 ) [skip ci]
...
* feat(doc): add vastai link
* feat: add cloud providers to readme for more visibility
* add prime intellect, remove Modal as sponsor
---------
Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-07-27 17:03:50 -04:00
Wing Lian
f7ea140838
TiledMLP support for FSDP2 ( #2950 )
...
* make TiledMLP work with FSDP
* cleanup/gc at start of train to prevent large VRAM spike
* chore: lint
* generic function for non-deepspeed training
* unify patch to fix imports
* update readme for ALST and add examples
* make deepspeed attribute on params check more robust
* update with new info from PR review
2025-07-25 07:15:03 -04:00
Wing Lian
460e0f9ed9
improve handling of file lock when content is empty ( #2959 )
2025-07-24 16:10:38 -04:00
Wing Lian
e80faea0db
garbage collect on the end of the step if we're going to save a checkpoint ( #2971 ) [skip ci]
2025-07-24 16:10:23 -04:00
Wing Lian
0ff2f172ef
Act offload lora fix ( #2928 ) [skip ci]
...
* fix activation offloading with lora
* update w e2e test
* add docs for error
2025-07-24 16:10:04 -04:00
salman
1407aac779
Skip CI for draft PRs ( #2970 )
2025-07-24 09:11:46 +01:00
Dan Saunders
b34c3371ed
upgrade torchao ( #2968 )
2025-07-23 10:27:28 -04:00
Wing Lian
5f1a4306b0
don't check dataset labels during preprocess for GRPO ( #2952 ) [skip ci]
...
* don't check dataset labels during preprocess for GRPO
* use enum check per PR feedback
2025-07-22 20:40:44 -04:00
Wing Lian
93709eb5ce
handle refactor upstream for flash attention ( #2966 )
2025-07-22 20:40:04 -04:00
Dan Saunders
208fb7b8e7
basic torchao fp8 mixed precision training ( #2926 )
...
* debug
* debug
* debug
* revert unneeded change
* add accelerator config to base trainer builder
* add back accumulated_cache_size_limit setting
* lint
* accelerator constructor patch for single-GPU torch fp8
* lint
* re-using existing fp8 code
* lint
* remove accelerate patch now fix in latest release
* fix
* docs
* add fp8 + fsdp2 example
* remove unused config
* update config
* smoke tests
* add validator
* add 2.7.0 guard for fsdp2
* fix
* add config descriptions
* add FSDP doc link
* nit
* set force_recompute_fp8_weight_in_bwd with enable_fsdp_float8_all_gather
* better cfg for smoke tests
* add test for accelerate patching
* update fp8 validator
2025-07-22 16:27:47 -04:00
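A hedged config sketch of the fp8 mixed-precision setup this PR describes (field names are this sketch's assumptions; check the config descriptions and docs added in the PR):

```yaml
# hypothetical field names for illustration
fp8: true
# per the bullets above, enabling fsdp float8 all-gather also sets
# force_recompute_fp8_weight_in_bwd
fp8_enable_fsdp_float8_all_gather: true
fsdp_version: 2  # PR adds a torch 2.7.0 guard for fsdp2
```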
Wing Lian
b86a1d47b0
we don't need to call check_dataset_labels when skip_prepare_dataset is set ( #2962 )
...
* we don't need to call check_dataset_labels when skip_prepare_dataset is set
* Fix actual bug and revert prior fix
* warn and early return instead of raising an error
* use error
2025-07-22 10:00:53 -04:00
NanoCode012
01d8175d48
fix: revert changing default optimizer to muon ( #2965 ) [skip ci]
2025-07-22 10:00:30 -04:00
NanoCode012
631268a0ca
revert renaming of deepspeed stage3 args that use auto ( #2964 ) [skip ci]
...
* Revert "fix deprecate deepspeed stage3_gather_16bit_weights_on_model_save arg…"
This reverts commit e207762928.
* don't revert the values that don't use 'auto'
---------
Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-07-22 09:59:47 -04:00
Wing Lian
3a208cfd84
Autocomplete axolotl CLI ( #2955 )
...
* static autocomplete script for axolotl cli
* use list of commands that should autocomplete yaml files
* make sure to chmod the autocomplete script as executable
* shellcheck and fix autocompletion of directory/sub-dirs
* more shellcheck fixes
2025-07-22 08:30:31 -04:00
github-actions[bot]
7267edc168
chore: update pre-commit hooks ( #2954 ) [skip ci]
...
Co-authored-by: djsaunde <1245942+djsaunde@users.noreply.github.com>
2025-07-22 08:30:00 -04:00
NanoCode012
dfba881e99
Feat: add gemma3n support ( #2852 )
...
* feat: add gemma3n cce
* feat: add sample config
* feat: add gemma3n multimodal mode
* feat: add audio example
* feat: support audio and return pixel values in collator
* feat: support unmask only assistant region (gemma3n for now)
* feat(doc): add notes for audio loading
* feat: add audio support for gemma3n
* feat: update examples
* feat: add gemma3n to the docs
* fix: add link at top
* feat(doc): clarify additional requirements
* fix: mllama missing aspect ratio
* fix: mllama need attention fixes for fa2
* Partially Revert "fix: mllama need attention fixes for fa2"
This reverts commit a0bfdd1777.
* fix: disable FA2 for mllama in vision mode
* feat: update configs to use proper attention
* fix: support other vision features
* feat(doc): clarify requirements for gemma3n
2025-07-22 16:52:15 +07:00
Wing Lian
d32058e149
include torchvision in build for upstream changes requiring it now ( #2953 ) [skip ci]
2025-07-22 04:19:16 -04:00
NanoCode012
bc1076d8a2
fix: suppress warning if we enabled skip prepare ( #2958 )
2025-07-21 11:42:04 -04:00
Wing Lian
b7e8f66e5a
upstream fixes in cce for dora and tensor parallel support ( #2960 ) [skip ci]
2025-07-21 11:41:53 -04:00
Wing Lian
e207762928
fix deprecate deepspeed stage3_gather_16bit_weights_on_model_save arg ( #2956 ) [skip ci]
...
* fix deprecate deepspeed stage3_gather_16bit_weights_on_model_save arg
* replace the rest of the migrated deepspeed params
2025-07-21 11:41:31 -04:00
Wing Lian
fefb0797ee
better handling for reward function checks for GRPO ( #2933 ) [skip ci]
...
* better handling for reward function checks for GRPO
* consolidate msg copy
2025-07-21 11:41:15 -04:00
Wing Lian
af8d257aa2
make pad_to_sequence_len default to the same value as sample_packing ( #2941 ) [skip ci]
...
* make pad_to_sequence_len default to the same value as sample_packing
* remove duplicate validation
* fix test
* update description meta
Co-authored-by: NanoCode012 <nano@axolotl.ai>
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai>
2025-07-21 11:40:56 -04:00
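A config sketch of the new default coupling described above (`pad_to_sequence_len` and `sample_packing` are existing axolotl options; the comment reflects the commit title, not verified source):

```yaml
sample_packing: true
# pad_to_sequence_len now follows sample_packing when left unset;
# setting it explicitly should still override the default
# pad_to_sequence_len: true
```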
Wing Lian
db5f6f4693
limit num_proc when saving datasets to disk ( #2948 ) [skip ci]
...
* limit num_proc when saving datasets to disk
* enforce at least 1 worker in case the division rounds down to 0; a sane divisor is at least 8 rows per worker to save
* update fixtures with dataset processes since that should never be NoneType
* improve reusability for tests
2025-07-21 11:39:38 -04:00
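The clamping described in the bullets above can be sketched as follows (the function name and exact divisor are this sketch's assumptions; axolotl's actual implementation may differ):

```python
def save_num_proc(num_rows: int, dataset_processes: int,
                  min_rows_per_worker: int = 8) -> int:
    """Limit num_proc for saving a dataset to disk.

    Cap the worker count so each worker saves at least
    min_rows_per_worker rows, and never drop below 1 worker
    in case the division rounds down to 0.
    """
    return max(1, min(dataset_processes, num_rows // min_rows_per_worker))


# tiny dataset: division rounds to 0, clamped up to 1 worker
print(save_num_proc(4, 8))      # 1
# larger dataset: capped by rows-per-worker, not by dataset_processes
print(save_num_proc(100, 32))   # 12
```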
Wing Lian
8e5f146701
Fix cloud docker image build and remove apt files for optim ( #2961 )
...
* make sure to apt update to install sudo and tmux
* remove apt archives too
2025-07-21 11:05:00 -04:00
Wing Lian
31a15a49b6
add additional packages via apt for better multi-node support ( #2949 )
...
* cleanup in Dockerfile and add infiniband packages
* fixes for ci
* fix nightly too
2025-07-20 21:19:23 -04:00
NanoCode012
b986f7c7cb
fix: return proper attention for llama4 lora kernel and fsdp2 llama4 example fix ( #2943 )
...
* fix: return proper attention for llama4 lora optim
* fix: update fsdp2 llama4 config
2025-07-19 13:54:43 -04:00
salman
e5734e5cf0
adding torchtitan link ( #2945 ) [skip ci]
2025-07-19 13:54:14 -04:00
Wing Lian
109d9c7442
make the initial call to tokenizer.pad not spam the console ( #2946 ) [skip ci]
...
* make the initial call to tokenizer.pad not spam the console
* add guard from feedback
* make another common console output less verbose
* more logging fixes
2025-07-19 13:53:35 -04:00
Wing Lian
170322a1f0
make sure log level is uppercase ( #2934 )
2025-07-17 15:32:55 -04:00
Wing Lian
5f5ae76213
add validation around cce + chunked_ce ( #2932 ) [skip ci]
...
* add validation around cce + chunked_ce
* return on end of validation method
2025-07-17 15:32:38 -04:00
Wing Lian
a798975b7c
coderabbit manual settings ( #2940 ) [skip ci]
2025-07-17 15:32:16 -04:00
Wing Lian
d23f972602
use state for wandb in callbacks ( #2930 ) [skip ci]
2025-07-17 15:31:56 -04:00
Wing Lian
8e41317250
don't use include_tokens_per_second for GRPO ( #2931 ) [skip ci]
...
* don't use include_tokens_per_second for GRPO
* use blocklist instead
2025-07-17 15:31:21 -04:00
Varun Gumma
9f2bb188a4
Improve dataset processing multiprocessing and sharding, and fix Qwen tokenizer bug ( #2918 )
...
* Added a feature to save the prepared dataset in a specified number of shards, removed the limiter on multiprocessing during tokenization, and fixed a bug in the qwen tokenizer
* removed limiters and fixed config variable name
* black lint
* chore: lint
* feat: update handling of dataset_processes
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai>
2025-07-17 09:47:58 -04:00
Wing Lian
9dde9e1b71
misc fixes 202507 ( #2937 ) [skip ci]
...
* misc fixes 202507
* manually handle attn class for llama4
2025-07-17 09:47:45 -04:00
Wing Lian
f2474ef941
bump accelerate to 1.9.0 ( #2936 ) [skip ci]
2025-07-17 09:46:43 -04:00
Wing Lian
8a4bcacdb2
cu126-torch271 for cloud docker image should be tagged with main-latest ( #2935 )
2025-07-17 00:01:23 -04:00
Wing Lian
d2c3d5a954
run nightly-vs-upstream-main on 2.7.1 and multi-gpu also ( #2929 ) [skip ci]
2025-07-16 21:45:42 -04:00
Wing Lian
36cbe13d18
activation offloading with cuda streams doesn't work with LoRA ( #2927 )
2025-07-16 11:59:20 -04:00