Commit Graph

3 Commits

Author SHA1 Message Date
Wing Lian
cada93cee5 upgrade transformers==5.3.0 trl==0.29.0 kernels (#3459)
* upgrade transformers==5.3.0 trl==0.29.0 kernels

* use latest deepspeed fixes

* use corect image for cleanup

* fix test outputs for tokenizer fixes upstream

* fix import:

* keep trl at 0.28.0

* handle updated API

* use latest trl since 0.28.0 doesn't work with latest transformers

* use trl experimental for pad to length

* monkeypatch trl with ORPOTrainer so liger doesn't croak

* upgrade accelerate

* more fixes

* move patch for orpotrainer

* load the imports later

* remove use_logits_to_keep

* fix loss_type arg as a list

* fetch hf cache from s3

* just manually download the missing model for now

* lint for pre-commit update

* a few more missing models on disk

* fix: loss_type internally now list

* fix: remove deprecated code and raise deprecate

* fix: remove unneeded blocklist

* fix: remove reliance on transformers api to find package available

* chore: refactor shim for less sideeffect

* fix: silent trl experimental warning

---------

Co-authored-by: NanoCode012 <nano@axolotl.ai>
2026-03-06 09:11:20 -05:00
NanoCode012
11eb36585a feat: add arg to enable dft in liger (#3125)
* feat: add arg to enable dft in liger

* feat: add tests use_token_scaling

* fix: test

* fix: move check to args
2025-11-10 21:37:47 +07:00
Wing Lian
2c408b5c5e Apply generic fused liger ce, cce, and tiledmlp for arbitrary models (#2908)
* Apply generic fused liger ce for unknown models

* fix deepseek liger modeling

* generic cce and config tiled mlp to use original mlp and auto detect compute params

* fix weight and lint

* update warnings

* address PR feedback

* use lookup for model class prefixes

* revert inadvertent change to flash attn verison

* remove un-needed pylint annotations

* fix import
2025-07-15 22:40:41 -04:00