* add callback to checkpoint the model on the first step
* remove debug
* add test cases; update existing tests not to save on first step
* move test out of solo
* delete
* default to False
* fix typo
* add Phi-3 support and perplexity metric
* add Phi-3 chat template
* metrics updates
* chore: lint
* fix assertion on Tensor
* fix tests since tokenization happens in the metric
* fix perplexity value of shorter passage
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>