* update trl to 0.17.0
* grpo + vllm no longer supported with 2.5.1 due to vllm constraints
* disable VLLM_USE_V1 for ci
* imporve handle killing off of multiprocessing vllm service
* debug why this doesn't run in CI
* increase vllm wait time
* increase timeout to 5min
* upgrade to vllm 0.8.4
* dump out the vllm log for debugging
* use debug logging
* increase vllm start timeout
* use NVL instead
* disable torch compile cache
* revert some commented checks now that grpo tests are fixed
* increase vllm timeoout back to 5min
* fixes for delinearization, and make qlora work with fsdp2
* Add back mistakenly removed lm_eval
* typo [skip ci]
* patch evals for torch.compile + fsdp2
* also check torch_compile w fsdp2
* lots of fixes for flex attn with llama4
* fix patch check and patch llama4 too
* attempt to make the patches stick
* use transformers 4.51.2
* update configs and README for llama4
* remove torch.compile for CI test
* cleanup any existing singletons
* set singleton cache to None instead of deleting
* use importlib reload with monkeypatch
* don't worry about transformers version, mark inputs with grads, fix regex
* make sure embeds aren't on cpu
* logging and mem improvements
* vllm version and add to docker, make sure to save processor on conversion
* fix ambiguous tensor bool check
* fix vllm to not use v1, upgrade hf transformers
* fix tests
* make flex_attn_compile_kwargs configurable, since this depends on model params
---------
Co-authored-by: Wing Lian <wing@axolotl.ai>
Co-authored-by: Salman Mohammadi <salman.mohammadi@outlook.com>
* [ci] make e2e tests a bit faster by reducing test split size
* use 10% split of alpaca dataset to speed up dataset loading/tokenization
* reduce gas 4->2 for most e2e tests
* increase val set size for packing
* add grpo scale_rewards config for trl#3135
* options to connect to vllm server directly w grpo trl#3094
* temperature support trl#3029
* sampling/generation kwargs for grpo trl#2989
* make vllm_enable_prefix_caching a config param trl#2900
* grpo multi-step optimizeations trl#2899
* remove overrides for grpo trainer
* bump trl to 0.16.0
* add cli to start vllm-serve via trl
* call the python module directly
* update to use vllm with 2.6.0 too now and call trl vllm serve from module
* vllm 0.8.1
* use python3
* use sys.executable
* remove context and wait for start
* fixes to make it actually work
* fixes so the grpo tests pass with new vllm paradigm
* explicit host/port and check in start vllm
* make sure that vllm doesn't hang by setting quiet so outouts go to dev null
* also bump bnb to latest release
* add option for wait from cli and nccl debugging for ci
* grpo + vllm test on separate devices for now
* make sure grpo + vllm tests runs single worker since pynccl comms would conflict
* fix cli
* remove wait and add caching for argilla dataset
* refactoring configs
* chore: lint
* add vllm config
* fixup vllm grpo args
* fix one more incorrect schema/config path
* fix another vlllm reference and increase timeout
* make the tests run a bit faster
* change mbsz back so it is correct for grpo
* another change mbsz back so it is correct for grpo
* fixing cli args
* nits
* adding docs
* docs
* include tensor parallel size for vllm in pydantic schema
* moving start_vllm, more docs
* limit output len for grpo vllm
* vllm enable_prefix_caching isn't a bool cli arg
* fix env ordering in tests and also use pid check when looking for vllm
---------
Co-authored-by: Salman Mohammadi <salman.mohammadi@outlook.com>