* remove the bos token from dpo outputs
* don't forget to fix prompt_input_ids too
* use processing_class instead of tokenizer
* fix for processing class
* ensure that the bias is also in the correct dtype
* add nightly for dpo-qlora-fsdp