* feat: add fsdp config for magistral * fix: add mllama self attention handling for lora kernels * fix: no eval if val_set_size 0 despite having test_datasets * fix: add note for cce for vlm in newer model