* add dataset download fixture for mhenrichsen/alpaca_2k_test (pinned revision) to stabilize flaky tests
* log slowest tests
* pin pynvml==11.5.3
* fix load local hub path
* optimize tests for speed with smaller models and val_set_size
* replace pynvml
* make the resume-from-checkpoint e2e test faster
* make tests smaller
* wip add new proposed message structure
* tokenization
* wip
* wip transform builder
* wip make the chat dataset loadable
* wip chatml + llama 3 new chat objects
* chore: lint
* fix tokenization
* remove dacite dependency since we're using pydantic now
* fix handling when already correctly split in messages
* make sure to remove chat features from tokenized ds
* move chat to be an input transform for messages
* make sure llama3 has the bos token
* remove non-working special token code
* fix messages strategy loader
* wip use trl ORPOTrainer
* fixes to make orpo work with trl
* fix the chat template loading
* make sure to handle the special tokens and add_generation_prompt for the assistant turn too
* ipo-dpo trainer
* fix missing abstract method
* chatml template, grad checkpointing kwargs support
* fix steps calculation for RL and add dataloader kwargs
* wip to fix dpo and start ppo
* more fixes
* refactor to generalize map fn
* fix dataset loop and handle argilla preference dataset
* set training args
* load reference model on separate gpu if more than one device
* no auto upload to hub for dpo, don't add lora adapters to ref model for dpo
* fixes for rl training
* support for ipo from yaml
* set dpo training args from the config, add tests
* chore: lint
* set sequence_len for model in test
* add RLHF docs
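
The RL-related commits above (DPO/IPO/ORPO trainers, "support for ipo from yaml", "set dpo training args from the config") imply a YAML-driven RL setup. A minimal sketch of what such a config might look like follows; the key names (`rl`, `datasets`, etc.) and the example model/dataset paths are illustrative assumptions based on these commit messages, not a verified schema:

```yaml
# Hypothetical config sketch for YAML-configured preference training.
# Key names and values below are assumptions for illustration only.
base_model: meta-llama/Meta-Llama-3-8B
rl: dpo                      # or ipo / orpo, per the trainer commits above
datasets:
  - path: argilla/dpo-mix-7k # placeholder for an argilla preference dataset
    split: train
sequence_len: 2048
gradient_checkpointing: true # grad checkpointing kwargs support (see above)
```

Per the commits, hub auto-upload would be disabled for DPO runs, and LoRA adapters would not be attached to the reference model.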