bump transformers to 5.5.4 and trl to latest 1.1.0 (#3603)

* bump transformers to 5.5.4 and trl to latest 1.1.0

* more upgrades

* update peft too

* adapt lora_merge to peft 0.19 layer config API

PEFT 0.19 requires a LoraConfig object on Linear/ParamWrapper/Conv layer constructors and moved use_rslora, use_dora, fan_in_fan_out, lora_dropout, and lora_bias into that config. Build the config per branch in _build_peft_layer_and_get_delta so the merge utility works with the upgraded peft.

* allow lora_dropout on mixed attention+MoE configs under peft 0.19

PEFT 0.19's convert_peft_config_for_transformers auto-remaps old MoE target_modules (w1/w2/w3 on Mixtral, etc.) into target_parameters for transformers v5's fused 3D expert Parameters. Those targets get wrapped with ParamWrapper, which rejects lora_dropout != 0 because the 3D einsum can't factor dropout out of lora_B(lora_A(dropout(x))).

Monkeypatch ParamWrapper.__init__ to internally use a copy of the LoraConfig with lora_dropout=0, so its dropout slot becomes nn.Identity while the shared config still delivers real dropout to sibling Linear LoRA layers (attention q/k/v/o). A probe runs the same conversion on a deep copy to detect the situation and emit a warning before patching.
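
A minimal sketch of the monkeypatch pattern described above, not the code from this commit. It deliberately avoids importing peft: StandInLoraConfig, StandInParamWrapper, StandInLinear, patch_param_wrapper_dropout, and the layer name "experts.gate_up_proj" are all hypothetical stand-ins, and the real ParamWrapper constructor signature may differ. It only illustrates the idea of handing one layer class a zero-dropout copy of a shared config while sibling layers keep the original.

```python
import copy
from dataclasses import dataclass


@dataclass
class StandInLoraConfig:
    """Hypothetical stand-in for a LoRA config (only the fields used here)."""
    r: int = 8
    lora_dropout: float = 0.0


class StandInParamWrapper:
    """Hypothetical stand-in for a layer that rejects non-zero dropout."""

    def __init__(self, name, config):
        if config.lora_dropout != 0:
            raise ValueError("lora_dropout must be 0 for parameter targets")
        self.name = name
        self.config = config


class StandInLinear:
    """Hypothetical stand-in for a sibling layer that accepts dropout."""

    def __init__(self, name, config):
        self.name = name
        self.config = config


def patch_param_wrapper_dropout(cls):
    """Wrap cls.__init__ so it receives a copy of the config with lora_dropout=0."""
    if getattr(cls, "_dropout_patch_applied", False):
        return  # already patched; avoid wrapping the wrapper again
    original_init = cls.__init__

    def patched_init(self, name, config, *args, **kwargs):
        local_config = copy.copy(config)  # leave the shared config untouched
        local_config.lora_dropout = 0.0
        original_init(self, name, local_config, *args, **kwargs)

    cls.__init__ = patched_init
    cls._dropout_patch_applied = True


# One shared config serves both the expert-parameter layer and attention layers.
shared = StandInLoraConfig(r=16, lora_dropout=0.05)
patch_param_wrapper_dropout(StandInParamWrapper)
experts = StandInParamWrapper("experts.gate_up_proj", shared)
q_proj = StandInLinear("q_proj", shared)
```

Copying the config inside the patched constructor, rather than mutating the shared object, is what lets the attention layers in this sketch keep their configured dropout.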
2026-04-15 09:27:03 -04:00
parent 6990478163
commit 323da791eb
3 changed files with 144 additions and 13 deletions
--- a/requirements.txt
+++ b/requirements.txt
@@ -10,15 +10,15 @@ liger-kernel==0.7.0

 packaging==26.0
 huggingface_hub>=1.1.7
-peft>=0.18.1
+peft>=0.19.0,<0.20.0
 tokenizers>=0.22.1
-transformers==5.5.3
+transformers==5.5.4
 accelerate==1.13.0
-datasets==4.5.0
+datasets>=4.8.4,<4.9.0
 deepspeed>=0.18.6,<0.19.0
-trl==0.29.0
-hf_xet==1.3.2
-kernels==0.12.2
+trl==1.1.0
+hf_xet==1.4.3
+kernels==0.13.0

 fla-core==0.4.1
 flash-linear-attention==0.4.1