Commit Graph

13 Commits

Author SHA1 Message Date
Wing Lian
cada93cee5 upgrade transformers==5.3.0 trl==0.29.0 kernels (#3459)
* upgrade transformers==5.3.0 trl==0.29.0 kernels

* use latest deepspeed fixes

* use corect image for cleanup

* fix test outputs for tokenizer fixes upstream

* fix import:

* keep trl at 0.28.0

* handle updated API

* use latest trl since 0.28.0 doesn't work with latest transformers

* use trl experimental for pad to length

* monkeypatch trl with ORPOTrainer so liger doesn't croak

* upgrade accelerate

* more fixes

* move patch for orpotrainer

* load the imports later

* remove use_logits_to_keep

* fix loss_type arg as a list

* fetch hf cache from s3

* just manually download the missing model for now

* lint for pre-commit update

* a few more missing models on disk

* fix: loss_type internally now list

* fix: remove deprecated code and raise deprecate

* fix: remove unneeded blocklist

* fix: remove reliance on transformers api to find package available

* chore: refactor shim for less sideeffect

* fix: silent trl experimental warning

---------

Co-authored-by: NanoCode012 <nano@axolotl.ai>
2026-03-06 09:11:20 -05:00
NanoCode012
6a8baf8fa7 feat: add sonicmoe (#3411)
* feat: add sonicmoe

* feat: add torch compile for routing

* feat: add routing smoke test

* feat: add qwen3_5_moe, qwen3_vl_moe, qwen3_omni_moe

* fix: disable mlp kernel for sonicmoe too

* feat: update to sonicmoe release

* chore: update import following new sonicmoe changes

* feat: update handling for blackwell

* feat: add sonicmoe e2e test

* fix: installation for updated sonicmoe

* fix: git commit

* fix: ignore py req and fix metadata

* fix: increase min hidden size to match sonicmoe kernel min

* fix: attempt properly interleave and handle unpatch mid-test

* chore: refactor teardown better

* chore: refactor to re-use rearrange

* fix: add idempotency guard

* fix: address comments on CI memory and interleave

* fix: tests grad, param doublewrapped
2026-03-05 13:43:31 -05:00
Wing Lian
68f1b7004c ScatterMoE LoRA support (#3410)
* scattermoe lora support

* fsdp, bf16, dim fixes

* expert weights aren't needed in save for bwd since they are frozen

* use sonicmoe optim options

* update save model from upstream

* fixes per code review feedback and add tests

* revert removal of CP fix

* misc fixes
2026-02-24 14:59:55 -05:00
PraMamba
8aab807e67 feat: Add SwanLab integration for experiment tracking (#3334)
* feat(swanlab): add SwanLab integration for experiment tracking

SwanLab integration provides comprehensive experiment tracking and monitoring for Axolotl training.

Features:
- Hyperparameter logging
- Training metrics tracking
- RLHF completion logging
- Performance profiling
- Configuration validation and conflict detection

Includes:
- Plugin in src/axolotl/integrations/swanlab/
- Callback in src/axolotl/utils/callbacks/swanlab.py
- Tests in tests/integrations/test_swanlab.py
- Examples in examples/swanlab/

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix(swanlab): address PR #3334 review feedback from winglian and CodeRabbit

- Change use_swanlab default to True (winglian)
- Clear buffer after periodic logging to prevent duplicates (CodeRabbit Major)
- Add safe exception handling in config fallback (CodeRabbit)
- Use context managers for file operations (CodeRabbit)
- Replace LOG.error with LOG.exception for better debugging (CodeRabbit)
- Sort __all__ alphabetically (CodeRabbit)
- Add language specifiers to README code blocks (CodeRabbit)
- Fix end-of-file newline in README (pre-commit)

Resolves actionable comments and nitpicks from CodeRabbit review.
Addresses reviewer feedback from @winglian.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* only run swanlab integration tests if package is available

---------

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-authored-by: Wing Lian <wing@axolotl.ai>
2026-01-06 09:19:18 -05:00
Seung Hyun Cho
3e51a680c2 fix: Fix evaluation loss in KD trainer (#3271)
* fix: Fix evaluation loss in KD trainer

* Fix v2 strategy super() call

* fix: Add safety check for total_tokens in log method

* fix: simplified num items and outputs return handling

* fix: add missing model forward pass in compute_loss

* refactor: Use Template Method pattern for chat template strategies

* refactor: use pop(None) and remove v2 override

* chore: lint

---------

Co-authored-by: NanoCode012 <nano@axolotl.ai>
Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-12-17 13:40:36 -05:00
NanoCode012
11eb36585a feat: add arg to enable dft in liger (#3125)
* feat: add arg to enable dft in liger

* feat: add tests use_token_scaling

* fix: test

* fix: move check to args
2025-11-10 21:37:47 +07:00
Dan Saunders
1b53c49e1a text diffusion training plugin (#3067)
* diffusion training plugin

* cleanup

* nits

* fixes + improvements

* add back in reinit_weights (clobbered?); masking / pretrain fixes

* nits

* cleanup; tests draft

* sample generation, tests fixes

* fixes

* nits

* add inference support; add auto-mask token support

* nits

* nits

* progress

* simplify logging

* lint

* prefix args with diffusion_

* coderabbito

* tests fix

* nit

* nits

* cleanup + nits

* nits

* fix SFT sample gen

* fixes

* fix

* comments

* comments

* lint

* reward model lora fix

* cleanup; fix pretraining_dataset case

* gradio inference

* update cfgs

* update cfgs

* train, generation parity, cleanup

* fix

* simplify

* test

* test fix
2025-09-10 20:27:00 -04:00
Dan Saunders
79ddaebe9a Add ruff, remove black, isort, flake8, pylint (#3092)
* black, isort, flake8 -> ruff

* remove unused

* add back needed import

* fix
2025-08-23 23:37:33 -04:00
Dan Saunders
1d91d905c9 remove deprecated wandb env var (#2751)
* remove deprecated wandb env var

* remove os.environ wandb setting; unused loggers

* remove os.environ wandb setting; unused loggers
2025-06-03 14:04:15 -07:00
salman
65c5481120 Rank 0-only logging (#2608)
Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-05-28 14:57:30 +01:00
Dan Saunders
c907ac173e adding pre-commit auto-update GH action and bumping plugin versions (#2428)
* adding pre-commit auto-update GH action and bumping plugin versions

* running updated pre-commit plugins

* sorry to revert, but pylint complained

* Update .pre-commit-config.yaml

Co-authored-by: Wing Lian <wing.lian@gmail.com>

---------

Co-authored-by: Dan Saunders <dan@axolotl.ai>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2025-03-21 11:02:43 -04:00
Wing Lian
fb3352e21c rename liger test so it properly runs in ci (#2246) 2025-01-09 17:31:43 -05:00
Wing Lian
02ce520b7e upgrade liger to 0.4.0 (#1973)
* upgrade liger to 0.3.1

* update docs and example

* skip duplicate code check

* Update src/axolotl/integrations/liger/args.py

Co-authored-by: NanoCode012 <nano@axolotl.ai>

* Update README.md

Co-authored-by: NanoCode012 <nano@axolotl.ai>

* add logging

* chore: lint

* add test case

* upgrade liger and transformers

* also upgrade accelerate

* use kwargs to support patch release

* make sure prepared path is empty for test

* use transfromers 4.46.1 since 4.46.2 breaks fsdp

---------

Co-authored-by: NanoCode012 <nano@axolotl.ai>
2024-11-07 12:53:34 -05:00