Rahul Tuli
4371f3459e
Use: absolute import
2025-04-28 13:16:29 -04:00
Rahul Tuli
7e1e153831
Address review comments from @markurtz
2025-04-28 13:16:29 -04:00
Rahul Tuli
42de3096cf
Apply suggestions from @markurtz
...
Co-authored-by: Mark Kurtz <mark.j.kurtz@gmail.com>
2025-04-28 13:16:29 -04:00
Rahul Tuli
6411ca3fe1
Use: warning over warn
2025-04-28 13:16:29 -04:00
Rahul Tuli
813809c54d
pre commit hooks
2025-04-28 13:16:29 -04:00
Rahul Tuli
b76d2d1130
Update: review comments!
2025-04-28 13:16:29 -04:00
Rahul Tuli
7946f89df4
Add: SFTPlugin with llmcompressor
2025-04-28 13:16:29 -04:00
Dhruv Mullick
8b33ae1c4f
Fix bug in grpo reward module import (#2571)
2025-04-28 00:31:56 -04:00
Wing Lian
f9c7c3bb72
don't use is_main_process during config validation (#2569)
2025-04-26 14:14:52 -04:00
Wing Lian
5dba5c82a8
fix support for wandb run_name for rl trainers (#2566) [skip ci]
...
* fix support for wandb run_name for rl trainers
* prefer to use wandb random names for run_name
2025-04-25 21:10:54 -04:00
Chiwan Park
e3c9d541a7
fix: crash when pretraining_dataset with dispatch_batches is false (#2558)
2025-04-25 17:15:03 -04:00
Wing Lian
53dbf97d85
make cce default to true when using the plugin (#2562) [skip ci]
2025-04-25 17:14:26 -04:00
Eko Julianto Salim
2c2563bc34
fix: gradient checkpointing functools.partial object has no attribute __self__ (#2563) [skip ci]
...
* fix: gradient checkpointing causing functools.partial error
* lint
* chore: lint
---------
Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-04-25 17:02:37 -04:00
Dan Saunders
ae1c7ace63
Sequence parallel training context manager (#2553)
...
* ctx manager for SP
* updates
* update
* further simplifying
* accommodate both training context managers
* simplifying
* simplifying
* nit
* reorg
* tweak codecov yaml
* add gather post hook, simplify, fixes
* pytest
* pytest fix
2025-04-25 10:33:54 -04:00
Wing Lian
a4d5112ae1
builds for torch 2.7.0 (#2552)
...
* builds for torch==2.7.0
* use xformers==0.0.29.post3
* no vllm support with torch 2.7
* update default, fix conditional
* no xformers for 270
* no vllm on 2.7.0 for multigpu test too
* remove deprecated verbose arg from scheduler
* 2.7.0 tests on cpu
2025-04-24 00:39:31 -04:00
NanoCode012
a6d28d19b1
feat: add glm and glm4 multipack and cce (#2546)
...
* feat: add glm and glm4 multipack
* feat: add glm4 example
* feat: add cce for glm
2025-04-23 10:27:51 -04:00
Wing Lian
32e335dd51
fix missing host/port for vllm (#2543)
...
* fix missing host/port for vllm
* set tensor parallel size so it doesn't always default to cli override
2025-04-22 10:16:48 -04:00
Wing Lian
341e95aac9
prevent rate limiting to hf when using dispatch batches (#2536) [skip ci]
2025-04-21 10:31:35 -04:00
Catgat
b882dfb63f
Fixed Rex Scheduler Warm Up (#2535) [skip ci]
...
* Fixed Rex Scheduler Warm Up
* chore: lint
---------
Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-04-21 10:30:55 -04:00
Chiwan Park
4ce469d32e
fix: upgrade liger to 0.5.8 and use native Gemma3 patches (#2527)
...
* fix: upgrade liger to 0.5.8 and use native Gemma3 patches
* fix: make lint happy
* doc: update Liger Kernel FLCE support for Gemma 3
2025-04-18 09:57:40 -07:00
Wing Lian
60a8f0958d
zero val fix for beta (#2538)
2025-04-17 17:27:19 -07:00
NanoCode012
9da730d6a4
fix(doc): cut cross entropy installation instructions broken in qmd (#2532)
2025-04-16 15:02:51 -07:00
NanoCode012
32637fad00
fix: preprocess yielding whole dataset to each worker (#2503) [skip ci]
2025-04-16 15:02:35 -07:00
Dan Saunders
b8c633aa97
batch api HF adapter for ring-flash-attn; cleanup and improvements (#2520)
...
* batch api HF adapter for ring-flash-attn; cleanup and improvements
* update
* adding all batch ring-flash-attn methods via single adapter
* removing pad_to_sequence_len=False for now
* fix
* updating docs to include batch SP
* review comments
* fixes for batch API funcs, simplify
* fixes
* fix
* updates
* add batch_zigzag smoke test
2025-04-16 13:50:48 -04:00
NanoCode012
682a9cf79b
Fix: add delinearization and make qlora work with fsdp2 (#2515)
...
* fixes for delinearization, and make qlora work with fsdp2
* Add back mistakenly removed lm_eval
* typo [skip ci]
* patch evals for torch.compile + fsdp2
* also check torch_compile w fsdp2
* lots of fixes for flex attn with llama4
* fix patch check and patch llama4 too
* attempt to make the patches stick
* use transformers 4.51.2
* update configs and README for llama4
* remove torch.compile for CI test
* cleanup any existing singletons
* set singleton cache to None instead of deleting
* use importlib reload with monkeypatch
* don't worry about transformers version, mark inputs with grads, fix regex
* make sure embeds aren't on cpu
* logging and mem improvements
* vllm version and add to docker, make sure to save processor on conversion
* fix ambiguous tensor bool check
* fix vllm to not use v1, upgrade hf transformers
* fix tests
* make flex_attn_compile_kwargs configurable, since this depends on model params
---------
Co-authored-by: Wing Lian <wing@axolotl.ai>
Co-authored-by: Salman Mohammadi <salman.mohammadi@outlook.com>
2025-04-15 23:31:39 -07:00
NanoCode012
271b24cccc
feat: update cce to latest (#2521)
2025-04-15 22:17:10 -07:00
NanoCode012
e0420b3528
fix: allow merge lora on pre-quantized model (#2511)
...
* fix: allow merge lora on pre-quantized model
* fix: remove unused sections per comment
2025-04-09 14:01:42 -04:00
NanoCode012
f85861a0b2
fix: liger swiglu for llama4 (#2504)
...
* fix: liger swiglu for llama4
* feat: add liger to deepseek v3
* fix: unpack not found
* fix: spelling
* fix: comment out deepseek v3
* fix: retest deepseek
* fix: map glu
* fix: patch model forward
* chore: add temp code to save
* fix: remove deepseek to move into separate PR
2025-04-09 02:53:17 -04:00
Wing Lian
0dac2ddeac
Llama4 linearized (#2502)
...
* llama4 support for linearized experts
* clean up fsdp2 sharding to prevent hang
* add yaml config
* cleanup example [skip ci]
2025-04-07 20:47:00 -04:00
NanoCode012
a6c03217f5
feat: add llama4 CCE (#2498)
...
* feat: add llama4 CCE
* fix: update model support list doc
* feat: include llama4_text
2025-04-07 17:12:28 -04:00
Dan Saunders
59cd472504
SP cu_seqlens fix, refactor (#2495)
...
* working on masking fix
* refactor and fix multipack seqlens
* pre-commit fix
* adding smoke test
* using existing packed seqlens util
* log warning re: logged losses / gradient scaling per rank
2025-04-07 14:47:57 -04:00
NanoCode012
9b89591ead
Feat: Add doc on loading datasets and support for Azure/OCI (#2482)
...
* fix: remove unused config
* feat: add doc on dataset loading
* feat: enable azure and oci remote file system
* feat: add adlfs and ocifs to requirements
* fix: add links between dataset formats and dataset loading
* fix: remove unused condition
* Revert "fix: remove unused condition"
This reverts commit 5fe13be73e.
2025-04-07 12:41:13 -04:00
NanoCode012
d25daebea9
fix: duplicate llama4 chattemplate enum (#2500)
...
* fix: duplicate llama4 chattemplate enum
* fix: duplicate chat_template string
2025-04-07 12:39:19 -04:00
NanoCode012
e0e5d9b1d6
feat: add llama4 multimodal (#2499)
...
* feat: add llama4 multimodal
* feat: add torchvision to base docker
* just use latest torchvision
---------
Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-04-07 10:49:29 -04:00
Wing Lian
8bbad21bfd
llama4 support (#2493)
...
* llama4 support
* add xet support [skip ci]
* be flexible on transformers version and skip test on version
* don't use deepspeed for the fix_untrained_tokens test
* reordering to trigger torch 2.6.0 tests first
* slightly smaller train set
* use 4.51.0 for now
* remove stray print, add llama4 chat template to schema, bump peft to 0.15.1
* patches to make llama4 performant
* add preliminary fp8 support
2025-04-07 10:49:15 -04:00
Wing Lian
5f4af3665d
FSDP2 support (#2469)
...
* fsdp2 support
* use accelerate release 1.6.0
* allow 8bit optims with fsdp2
* liger + torch compile fix
* add fsdp2 e2e tests
* use transformers commit with fsdp2 support
* skip zero3 tests for this PR for now
* fix fsdp2 config for ci
* make sure both flex and flash attn work with fsdp2, skip fix untrained tokens
* okay, actually use fdsp2...
* more fixes to flex for fsdp2
* make sure to patch all the loaded models
* additional validation for fsdp2, bump dep versions
2025-04-06 17:08:01 -04:00
Sung Ching Liu
a8f38c367c
Flex Attention + Packing with BlockMask support (#2363)
2025-04-05 18:02:57 -04:00
Wing Lian
949471039f
fix tokenizer overrides w gemma3 (#2488)
...
* fix tokenizer overrides w gemma3
* fix offline wrapping
2025-04-05 01:25:44 -04:00
NanoCode012
de451f99a5
fix: cohere cce scaling wrong tensor (#2483)
2025-04-04 13:47:44 -04:00
Dan Saunders
e0cc4f1a87
removing deepspeed guard for LoRA Triton kernels (#2480)
2025-04-03 14:50:56 -04:00
Wing Lian
5249e98058
add additional tf32 opt for cudnn (#2477) [skip ci]
2025-04-03 08:47:52 -04:00
Wing Lian
3877c5c69d
set release version 0.8.0 (#2476)
...
* set release version 0.8.0
* make sure to include ring-flash-attn in docker image build
2025-04-02 09:50:56 -04:00
NanoCode012
e6cfb093d2
fix: disable SP during merge (#2470) [skip ci]
2025-04-02 09:35:00 -04:00
NanoCode012
7abc71dc0b
fix: gemma3 loss in forward pass (#2473) [skip ci]
...
* fix: gemma3 loss in forward pass
* fix: lint
* fix: move patch before plugins
* Update src/axolotl/monkeypatch/gemma3.py
Co-authored-by: salman <salman.mohammadi@outlook.com>
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
Co-authored-by: salman <salman.mohammadi@outlook.com>
2025-04-02 09:34:41 -04:00
NanoCode012
45bf634d17
feat: add support for multimodal in lora kernels (#2472) [skip ci]
...
* feat: add support for multimodal in lora kernels
* fix: improve multimodal checks
* fix: add fallback for model config
* chor: add gemma3 to docs
2025-04-02 09:33:46 -04:00
NanoCode012
80ba4b69f1
fix: pydantic warning validator not returning self (#2474)
2025-04-02 07:40:49 -04:00
NanoCode012
9e22c4ca6a
fix: set rl=None during inference (#2463)
2025-04-01 12:25:53 -04:00
Dan Saunders
7d0eb66b54
fixing eval for SP (#2468)
2025-04-01 11:59:08 -04:00
Wing Lian
df119e3724
Validation for Muon optimizer with DS/FSDP (#2464)
2025-04-01 09:39:12 -04:00
NanoCode012
f4ae8816bb
Fix: remove the numerous sequential log (#2461)
...
* fix: remove sequential logs
* feat(doc): add for sample pack sequentially and curriculum sampling
2025-04-01 09:20:00 -04:00