update trl to 0.17.0 (#2560)

* update trl to 0.17.0

* grpo + vllm no longer supported with 2.5.1 due to vllm constraints

* disable VLLM_USE_V1 for ci

* imporve handle killing off of multiprocessing vllm service

* debug why this doesn't run in CI

* increase vllm wait time

* increase timeout to 5min

* upgrade to vllm 0.8.4

* dump out the vllm log for debugging

* use debug logging

* increase vllm start timeout

* use NVL instead

* disable torch compile cache

* revert some commented checks now that grpo tests are fixed

* increase vllm timeoout back to 5min
This commit is contained in:
Wing Lian
2025-04-27 19:19:53 -04:00
committed by GitHub
parent f9c7c3bb72
commit dc4da4a7e2
7 changed files with 93 additions and 30 deletions

View File

@@ -43,7 +43,7 @@ jobs:
cuda_version: 12.4.1
python_version: "3.11"
pytorch: 2.5.1
axolotl_extras: vllm
axolotl_extras:
num_gpus: 2
nightly_build: "true"
- cuda: 126