* re-enable DPO for tests in modal ci
* workaround for training args
* don't mixin AxolotlTrainingArguments
* fix mixin order so MRO doesn't result in
TypeError: non-default argument follows default argument error
* use smaller datasets for dpo tests