* checkpoint model on first step callback
* remove debug
* add test cases; update existing tests not to save on first step
* move test out of solo
* delete
* default to False
* typo
* feat: add fsdp config for magistral
* fix: add mllama self attention handling for lora kernels
* fix: no eval if val_set_size 0 despite having test_datasets
* fix: add note for cce for vlm in newer model