upgrade transformers==5.3.0 trl==0.29.0 kernels (#3459)

* upgrade transformers==5.3.0 trl==0.29.0 kernels

* use latest deepspeed fixes

* use corect image for cleanup

* fix test outputs for tokenizer fixes upstream

* fix import:

* keep trl at 0.28.0

* handle updated API

* use latest trl since 0.28.0 doesn't work with latest transformers

* use trl experimental for pad to length

* monkeypatch trl with ORPOTrainer so liger doesn't croak

* upgrade accelerate

* more fixes

* move patch for orpotrainer

* load the imports later

* remove use_logits_to_keep

* fix loss_type arg as a list

* fetch hf cache from s3

* just manually download the missing model for now

* lint for pre-commit update

* a few more missing models on disk

* fix: loss_type internally now list

* fix: remove deprecated code and raise deprecate

* fix: remove unneeded blocklist

* fix: remove reliance on transformers api to find package available

* chore: refactor shim for less sideeffect

* fix: silent trl experimental warning

---------

Co-authored-by: NanoCode012 <nano@axolotl.ai>
This commit is contained in:
Wing Lian
2026-03-06 09:11:20 -05:00
committed by GitHub
parent 56162f71db
commit cada93cee5
19 changed files with 81 additions and 49 deletions

View File

@@ -52,8 +52,8 @@ def mock_torch():
mock_torch.cuda.device_count.return_value = 2
# Mock memory allocated per device (1GB for device 0, 2GB for device 1)
mock_torch.cuda.memory_allocated.side_effect = (
lambda device: (device + 1) * 1024 * 1024 * 1024
mock_torch.cuda.memory_allocated.side_effect = lambda device: (
(device + 1) * 1024 * 1024 * 1024
)
yield mock_torch
@@ -292,8 +292,8 @@ class TestRuntimeMetricsTracker:
mock_memory_info = mock_process.memory_info.return_value
mock_memory_info.rss = 0.5 * 1024 * 1024 * 1024 # 0.5GB
mock_torch.cuda.memory_allocated.side_effect = (
lambda device: (device + 0.5) * 1024 * 1024 * 1024
mock_torch.cuda.memory_allocated.side_effect = lambda device: (
(device + 0.5) * 1024 * 1024 * 1024
)
# Update memory metrics again
@@ -307,8 +307,8 @@ class TestRuntimeMetricsTracker:
# Change mocked memory values to be higher
mock_memory_info.rss = 2 * 1024 * 1024 * 1024 # 2GB
mock_torch.cuda.memory_allocated.side_effect = (
lambda device: (device + 2) * 1024 * 1024 * 1024
mock_torch.cuda.memory_allocated.side_effect = lambda device: (
(device + 2) * 1024 * 1024 * 1024
)
# Update memory metrics again