* checkpoint model on first step callback
* remove debug
* add test cases; update existing tests not to save on first step
* move test out of solo
* delete
* default to False
* typo
* add example for mistral orpo
* sample_packing: false for orpo
* go to load_dataset (since load_rl_datasets require a transfom_fn, which only dpo uses currently)