* update trl to 0.17.0
* grpo + vllm no longer supported with torch 2.5.1 due to vllm constraints
* disable VLLM_USE_V1 for ci
* improve handling of killing off the multiprocessing vllm service
* debug why this doesn't run in CI
* increase vllm wait time
* increase timeout to 5min
* upgrade to vllm 0.8.4
* dump out the vllm log for debugging
* use debug logging
* increase vllm start timeout
* use NVL instead
* disable torch compile cache
* revert some commented checks now that grpo tests are fixed
* increase vllm timeout back to 5min
* add e2e smoke test for using activation/gradient checkpointing with offload
* disable duplicate code check for the test
* fix relative import
* seq len too small to test this dataset with packing
* Fix checkpoint patching for tests
* make sure to validate the config before normalizing so defaults get set
* validation not needed for particular test
* remove duplicate validations
* set qlora correctly
* fix: mention to install pytorch before axolotl
* feat(doc): include instruction to delinearize
* fix: update instruction for delinearize with adapter
* builds for torch==2.7.0
* use xformers==0.0.29.post3
* no vllm support with torch 2.7
* update default, fix conditional
* no xformers for 270
* no vllm on 2.7.0 for multigpu test too
* remove deprecated verbose arg from scheduler
* 2.7.0 tests on cpu
* batch api HF adapter for ring-flash-attn; cleanup and improvements
* update
* adding all batch ring-flash-attn methods via single adapter
* removing pad_to_sequence_len=False for now
* fix
* updating docs to include batch SP
* review comments
* fixes for batch API funcs, simplify
* fixes
* fix
* updates
* add batch_zigzag smoke test
* fixes for delinearization, and make qlora work with fsdp2
* Add back mistakenly removed lm_eval
* typo [skip ci]
* patch evals for torch.compile + fsdp2
* also check torch_compile w fsdp2
* lots of fixes for flex attn with llama4
* fix patch check and patch llama4 too
* attempt to make the patches stick
* use transformers 4.51.2
* update configs and README for llama4
* remove torch.compile for CI test
* cleanup any existing singletons
* set singleton cache to None instead of deleting
* use importlib reload with monkeypatch
* don't worry about transformers version, mark inputs with grads, fix regex
* make sure embeds aren't on cpu
* logging and mem improvements
* vllm version and add to docker, make sure to save processor on conversion
* fix ambiguous tensor bool check
* fix vllm to not use v1, upgrade hf transformers
* fix tests
* make flex_attn_compile_kwargs configurable, since this depends on model params
---------
Co-authored-by: Wing Lian <wing@axolotl.ai>
Co-authored-by: Salman Mohammadi <salman.mohammadi@outlook.com>
* [ci] make e2e tests a bit faster by reducing test split size
* use 10% split of alpaca dataset to speed up dataset loading/tokenization
* reduce gas 4->2 for most e2e tests
* increase val set size for packing