Updates for trl 0.16.0 - mostly for GRPO (#2437) [skip ci] · b6fc46ada8 - axolotl

* add grpo scale_rewards config for trl#3135

* options to connect to vllm server directly w grpo trl#3094

* temperature support trl#3029

* sampling/generation kwargs for grpo trl#2989

* make vllm_enable_prefix_caching a config param trl#2900

* grpo multi-step optimizations trl#2899

* remove overrides for grpo trainer

* bump trl to 0.16.0

* add cli to start vllm-serve via trl

* call the python module directly

* update to use vllm with 2.6.0 too now and call trl vllm serve from module

* vllm 0.8.1

* use python3

* use sys.executable

* remove context and wait for start

* fixes to make it actually work

* fixes so the grpo tests pass with new vllm paradigm

* explicit host/port and check in start vllm

* make sure that vllm doesn't hang by setting quiet so outputs go to dev null

* also bump bnb to latest release

* add option for wait from cli and nccl debugging for ci

* grpo + vllm test on separate devices for now

* make sure grpo + vllm tests run with a single worker since pynccl comms would conflict

* fix cli

* remove wait and add caching for argilla dataset

* refactoring configs

* chore: lint

* add vllm config

* fixup vllm grpo args

* fix one more incorrect schema/config path

* fix another vllm reference and increase timeout

* make the tests run a bit faster

* change mbsz back so it is correct for grpo

* another change mbsz back so it is correct for grpo

* fixing cli args

* nits

* adding docs

* docs

* include tensor parallel size for vllm in pydantic schema

* moving start_vllm, more docs

* limit output len for grpo vllm

* vllm enable_prefix_caching isn't a bool cli arg

* fix env ordering in tests and also use pid check when looking for vllm
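Several of the items above (launching through `sys.executable`, sending output to dev null so the server does not hang, an explicit host/port, and a pid check while waiting for startup) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this commit: the helper name `start_quiet_module` and its parameters are hypothetical, and it works with any module that opens a TCP port.

```python
import socket
import subprocess
import sys
import time


def start_quiet_module(module, args, host, port, timeout=30.0):
    """Run `python -m <module> <args>` quietly and wait until host:port accepts connections.

    Hypothetical helper for illustration only; not part of axolotl or trl.
    """
    # Use the current interpreter so the child runs in the same environment.
    cmd = [sys.executable, "-m", module, *args]
    # Discard output: an unread pipe can fill up and block the child.
    proc = subprocess.Popen(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # Process check: stop waiting if the child has already exited.
        if proc.poll() is not None:
            raise RuntimeError(f"server exited early with code {proc.returncode}")
        try:
            # The server counts as started once the port accepts a connection.
            with socket.create_connection((host, port), timeout=1.0):
                return proc
        except OSError:
            time.sleep(0.2)
    # Timed out: clean up the child before reporting the failure.
    proc.terminate()
    proc.wait()
    raise TimeoutError(f"nothing listening on {host}:{port} after {timeout}s")
```

For a quick local check, the standard-library server works as a stand-in: `start_quiet_module("http.server", ["8000", "--bind", "127.0.0.1"], "127.0.0.1", 8000)`. Pointing it at the trl vLLM server module would follow the same pattern, but the module path and its flags are not shown on this page and would need to be taken from the trl documentation.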

---------

Co-authored-by: Salman Mohammadi <salman.mohammadi@outlook.com>

Wing Lian, 2025-03-31 15:47:11 -04:00, committed by GitHub

parent b35992262e

commit b6fc46ada8

24 changed files with 703 additions and 349 deletions

									
_quarto.yml
												
@@ -40,6 +40,7 @@ quartodoc:
         - cli.preprocess
         - cli.sweeps
         - cli.utils
+        - cli.vllm_serve
         - cli.cloud.base
         - cli.cloud.modal_
     - title: Trainers
