Wing Lian
cada93cee5
upgrade transformers==5.3.0 trl==0.29.0 kernels (#3459)
* upgrade transformers==5.3.0 trl==0.29.0 kernels
* use latest deepspeed fixes
* use correct image for cleanup
* fix test outputs for tokenizer fixes upstream
* fix import
* keep trl at 0.28.0
* handle updated API
* use latest trl since 0.28.0 doesn't work with latest transformers
* use trl experimental for pad to length
* monkeypatch trl with ORPOTrainer so liger doesn't croak
* upgrade accelerate
* more fixes
* move patch for orpotrainer
* load the imports later
* remove use_logits_to_keep
* fix loss_type arg as a list
* fetch hf cache from s3
* just manually download the missing model for now
* lint for pre-commit update
* a few more missing models on disk
* fix: loss_type is now a list internally
* fix: remove deprecated code and raise a deprecation warning
* fix: remove unneeded blocklist
* fix: remove reliance on the transformers API to check package availability
* chore: refactor shim for fewer side effects
* fix: silence trl experimental warning
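The shim/monkeypatch work above can be sketched roughly as follows. This is an illustrative stand-in only: `UpstreamTrainer`, `ORPOTrainerShim`, and `install_shim` are hypothetical names, not trl's or axolotl's real API. The idea is to expose an `ORPOTrainer` attribute in the shape a downstream patcher (here, the liger integration) expects, and to keep the swap idempotent so repeated imports have no side effects.

```python
class UpstreamTrainer:
    """Stand-in for the trainer class the upstream library actually provides."""
    def compute_loss(self, values):
        return sum(values) / len(values)

class ORPOTrainerShim(UpstreamTrainer):
    """Stand-in shim re-exporting the name the patcher looks up."""

def install_shim(namespace):
    # Only swap once, so calling this again changes nothing.
    if namespace.get("ORPOTrainer") is not ORPOTrainerShim:
        namespace["ORPOTrainer"] = ORPOTrainerShim
    return namespace

ns = install_shim({"ORPOTrainer": UpstreamTrainer})
print(ns["ORPOTrainer"].__name__)  # ORPOTrainerShim
```

Loading the real imports lazily (as the "load the imports later" bullet suggests) keeps the shim from pulling in heavy dependencies at module import time.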
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai>
2026-03-06 09:11:20 -05:00
NanoCode012
6a8baf8fa7
feat: add sonicmoe (#3411)
* feat: add sonicmoe
* feat: add torch compile for routing
* feat: add routing smoke test
* feat: add qwen3_5_moe, qwen3_vl_moe, qwen3_omni_moe
* fix: disable mlp kernel for sonicmoe too
* feat: update to sonicmoe release
* chore: update import following new sonicmoe changes
* feat: update handling for blackwell
* feat: add sonicmoe e2e test
* fix: installation for updated sonicmoe
* fix: git commit
* fix: ignore py req and fix metadata
* fix: increase min hidden size to match sonicmoe kernel min
* fix: attempt to properly interleave and handle unpatch mid-test
* chore: refactor teardown
* chore: refactor to re-use rearrange
* fix: add idempotency guard
* fix: address comments on CI memory and interleave
* fix: tests for grads and double-wrapped params
2026-03-05 13:43:31 -05:00
Wing Lian
68f1b7004c
ScatterMoE LoRA support (#3410)
* scattermoe lora support
* fsdp, bf16, dim fixes
* expert weights aren't needed in the save for backward since they are frozen
* use sonicmoe optim options
* update save model from upstream
* fixes per code review feedback and add tests
* revert removal of CP fix
* misc fixes
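The "frozen expert weights aren't needed in save" bullet boils down to filtering a state dict to trainable entries, since frozen base weights can be reloaded from the base checkpoint. A minimal sketch with illustrative names (`trainable_state_dict` and the parameter keys are hypothetical, not axolotl's actual save path):

```python
def trainable_state_dict(named_params):
    """Keep only entries whose parameter is marked trainable."""
    return {name: p for name, (p, trainable) in named_params.items() if trainable}

# Each value is (weights, trainable_flag); frozen experts are skipped.
params = {
    "experts.0.w1": ([0.1], False),   # frozen expert weight
    "lora_A.weight": ([0.2], True),   # trainable LoRA adapter
    "lora_B.weight": ([0.3], True),
}
saved = trainable_state_dict(params)
print(sorted(saved))  # ['lora_A.weight', 'lora_B.weight']
```

In real code the flag would come from `param.requires_grad`; the dict-of-tuples here just keeps the sketch dependency-free.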
2026-02-24 14:59:55 -05:00
PraMamba
8aab807e67
feat: Add SwanLab integration for experiment tracking (#3334)
* feat(swanlab): add SwanLab integration for experiment tracking
SwanLab integration provides comprehensive experiment tracking and monitoring for Axolotl training.
Features:
- Hyperparameter logging
- Training metrics tracking
- RLHF completion logging
- Performance profiling
- Configuration validation and conflict detection
Includes:
- Plugin in src/axolotl/integrations/swanlab/
- Callback in src/axolotl/utils/callbacks/swanlab.py
- Tests in tests/integrations/test_swanlab.py
- Examples in examples/swanlab/
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* fix(swanlab): address PR #3334 review feedback from winglian and CodeRabbit
- Change use_swanlab default to True (winglian)
- Clear buffer after periodic logging to prevent duplicates (CodeRabbit Major)
- Add safe exception handling in config fallback (CodeRabbit)
- Use context managers for file operations (CodeRabbit)
- Replace LOG.error with LOG.exception for better debugging (CodeRabbit)
- Sort __all__ alphabetically (CodeRabbit)
- Add language specifiers to README code blocks (CodeRabbit)
- Fix end-of-file newline in README (pre-commit)
Resolves actionable comments and nitpicks from CodeRabbit review.
Addresses reviewer feedback from @winglian.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
* only run swanlab integration tests if package is available
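The "clear buffer after periodic logging to prevent duplicates" fix generalizes to any buffered logging callback: if the buffer is not emptied after a flush, the same rows are re-sent on every subsequent flush. A minimal sketch with hypothetical names (`CompletionLogger` is not the actual SwanLab callback class):

```python
class CompletionLogger:
    """Buffers rows and flushes them periodically."""
    def __init__(self, flush_every=2):
        self.buffer = []
        self.flushed = []          # stand-in for the remote tracking sink
        self.flush_every = flush_every

    def log(self, row):
        self.buffer.append(row)
        if len(self.buffer) >= self.flush_every:
            self.flushed.extend(self.buffer)
            self.buffer.clear()    # the fix: no duplicate rows next flush

logger = CompletionLogger()
for step in range(4):
    logger.log({"step": step})
print(len(logger.flushed), len(logger.buffer))  # 4 0
```

Without the `clear()`, four calls would flush six rows (two at the first flush, those same two plus two more at the second).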
---------
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-authored-by: Wing Lian <wing@axolotl.ai>
2026-01-06 09:19:18 -05:00
Seung Hyun Cho
3e51a680c2
fix: Fix evaluation loss in KD trainer (#3271)
* fix: Fix evaluation loss in KD trainer
* Fix v2 strategy super() call
* fix: Add safety check for total_tokens in log method
* fix: simplified num items and outputs return handling
* fix: add missing model forward pass in compute_loss
* refactor: Use Template Method pattern for chat template strategies
* refactor: use pop(None) and remove v2 override
* chore: lint
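The "Template Method pattern for chat template strategies" refactor means the base class fixes the overall tokenization flow while subclasses override a single hook. A minimal sketch, with class and method names that are illustrative rather than axolotl's real strategy API:

```python
class ChatTemplateStrategy:
    def tokenize(self, messages):
        # Template method: the skeleton (render -> encode -> postprocess)
        # is fixed here; subclasses customize only the hooks.
        text = self.render(messages)
        ids = [ord(c) for c in text]   # stand-in for a real tokenizer
        return self.postprocess(ids)

    def render(self, messages):
        return "\n".join(m["content"] for m in messages)

    def postprocess(self, ids):
        return ids

class V2Strategy(ChatTemplateStrategy):
    def postprocess(self, ids):
        # Override only the hook; the skeleton in `tokenize` is reused,
        # so there is no need to re-call or duplicate super().tokenize().
        return ids + [0]               # stand-in for appending an EOS id

base = ChatTemplateStrategy().tokenize([{"content": "hi"}])
v2 = V2Strategy().tokenize([{"content": "hi"}])
print(base, v2)  # [104, 105] [104, 105, 0]
```

This is also why the "remove v2 override" bullet is possible: once shared steps live in the template method, a subclass that no longer differs can drop its override entirely.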
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai>
Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-12-17 13:40:36 -05:00
NanoCode012
11eb36585a
feat: add arg to enable dft in liger (#3125)
* feat: add arg to enable dft in liger
* feat: add tests use_token_scaling
* fix: test
* fix: move check to args
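Assuming `use_token_scaling` refers to DFT-style weighting, where each token's cross-entropy term is scaled by the (detached) probability assigned to the target token, the math can be sketched as below. This is a reading of the flag name, not a transcription of liger's kernel; the function and its signature are illustrative.

```python
import math

def scaled_token_losses(logits_rows, targets):
    """Per-token CE, each term weighted by the target's softmax probability."""
    losses = []
    for logits, target in zip(logits_rows, targets):
        z = max(logits)                               # stabilize the softmax
        exps = [math.exp(l - z) for l in logits]
        total = sum(exps)
        p_target = exps[target] / total
        ce = -math.log(p_target)
        losses.append(p_target * ce)                  # DFT-style scaling
    return losses

# Two tokens, vocab of 2: one confident prediction, one unconfident.
losses = scaled_token_losses([[2.0, 0.0], [0.0, 2.0]], [0, 0])
print(losses)
```

Note the weighting is not monotone: `p * (-log p)` peaks at `p = 1/e`, so both very confident and very unconfident tokens contribute less than mid-probability ones.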
2025-11-10 21:37:47 +07:00
Dan Saunders
1b53c49e1a
text diffusion training plugin (#3067)
* diffusion training plugin
* cleanup
* nits
* fixes + improvements
* add back in reinit_weights (clobbered?); masking / pretrain fixes
* nits
* cleanup; tests draft
* sample generation, tests fixes
* fixes
* nits
* add inference support; add auto-mask token support
* nits
* nits
* progress
* simplify logging
* lint
* prefix args with diffusion_
* address CodeRabbit feedback
* tests fix
* nit
* nits
* cleanup + nits
* nits
* fix SFT sample gen
* fixes
* fix
* comments
* comments
* lint
* reward model lora fix
* cleanup; fix pretraining_dataset case
* gradio inference
* update cfgs
* update cfgs
* train, generation parity, cleanup
* fix
* simplify
* test
* test fix
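The masking fixes above sit at the core of text-diffusion training: replace a sampled fraction of token ids with a mask id and keep labels only at masked positions. A minimal sketch, assuming this general scheme; `mask_tokens`, the mask id, and the `-100` ignore index are illustrative choices, not the plugin's actual code.

```python
import random

def mask_tokens(ids, mask_id, ratio, rng):
    """Mask ~ratio of tokens; labels supervise only the masked positions."""
    masked, labels = [], []
    for tok in ids:
        if rng.random() < ratio:
            masked.append(mask_id)
            labels.append(tok)       # predict the original token here
        else:
            masked.append(tok)
            labels.append(-100)      # ignored position (no loss)
    return masked, labels

rng = random.Random(0)               # seeded for reproducibility
masked, labels = mask_tokens([5, 6, 7, 8], mask_id=0, ratio=0.5, rng=rng)
print(masked, labels)
```

The "auto-mask token" bullet suggests the plugin can pick a mask id automatically when the tokenizer does not define one.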
2025-09-10 20:27:00 -04:00
Dan Saunders
79ddaebe9a
Add ruff, remove black, isort, flake8, pylint (#3092)
* black, isort, flake8 -> ruff
* remove unused
* add back needed import
* fix
2025-08-23 23:37:33 -04:00
Dan Saunders
1d91d905c9
remove deprecated wandb env var (#2751)
* remove deprecated wandb env var
* remove os.environ wandb setting; unused loggers
* remove os.environ wandb setting; unused loggers
2025-06-03 14:04:15 -07:00
salman
65c5481120
Rank 0-only logging (#2608)
Co-authored-by: Wing Lian <wing@axolotl.ai>
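Rank 0-only logging is typically done with a `logging.Filter` that drops records on non-zero ranks, so each message appears once per distributed job rather than once per process. A minimal sketch; `get_rank` is a stand-in for the real distributed rank lookup (e.g. `torch.distributed.get_rank`):

```python
import logging

def get_rank():
    """Stand-in for the distributed rank lookup; returns 0 when not distributed."""
    return 0

class Rank0Filter(logging.Filter):
    """Allow records through only on the main (rank 0) process."""
    def filter(self, record):
        return get_rank() == 0

log = logging.getLogger("rank0-demo")
log.addFilter(Rank0Filter())

record = logging.LogRecord("rank0-demo", logging.INFO, "demo.py", 1,
                           "hello", None, None)
allowed = Rank0Filter().filter(record)
print(allowed)  # True
```

A filter is preferable to wrapping every call site in an `if rank == 0:` check, since it applies uniformly to all handlers attached to the logger.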
2025-05-28 14:57:30 +01:00
Dan Saunders
c907ac173e
adding pre-commit auto-update GH action and bumping plugin versions (#2428)
* adding pre-commit auto-update GH action and bumping plugin versions
* running updated pre-commit plugins
* sorry to revert, but pylint complained
* Update .pre-commit-config.yaml
Co-authored-by: Wing Lian <wing.lian@gmail.com>
---------
Co-authored-by: Dan Saunders <dan@axolotl.ai>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2025-03-21 11:02:43 -04:00
Wing Lian
fb3352e21c
rename liger test so it properly runs in ci (#2246)
2025-01-09 17:31:43 -05:00
Wing Lian
02ce520b7e
upgrade liger to 0.4.0 (#1973)
* upgrade liger to 0.3.1
* update docs and example
* skip duplicate code check
* Update src/axolotl/integrations/liger/args.py
Co-authored-by: NanoCode012 <nano@axolotl.ai>
* Update README.md
Co-authored-by: NanoCode012 <nano@axolotl.ai>
* add logging
* chore: lint
* add test case
* upgrade liger and transformers
* also upgrade accelerate
* use kwargs to support patch release
* make sure prepared path is empty for test
* use transformers 4.46.1 since 4.46.2 breaks fsdp
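The "use kwargs to support patch release" bullet describes a forward-compatibility pattern: pass options through `**kwargs` and accept unknown keywords, so a call site built against one patch release doesn't break when the library adds a new flag. A minimal sketch; `apply_liger_kernel` and its parameters here are stand-ins, not liger's actual signature.

```python
def apply_liger_kernel(rope=True, cross_entropy=True, **extra):
    """Accept and tolerate options this version doesn't know about."""
    # A real implementation might log `extra` and ignore it, rather than
    # raising TypeError the way an explicit signature would.
    return {"rope": rope, "cross_entropy": cross_entropy, **extra}

# Config built against a newer patch release with an extra flag:
opts = {"rope": True, "cross_entropy": False, "fused_linear_cross_entropy": True}
result = apply_liger_kernel(**opts)
print(result["fused_linear_cross_entropy"])  # True
```

The trade-off is that typos in option names are silently accepted, which is why logging unexpected keys (as the "add logging" bullet hints) is useful alongside this pattern.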
---------
Co-authored-by: NanoCode012 <nano@axolotl.ai>
2024-11-07 12:53:34 -05:00