* [llama4] fix the mm yaml, add scout single gpu yaml
* add README for llama4
* rename to specify fsdp
* llama4 support for linearized experts
* clean up fsdp2 sharding to prevent hang
* add yaml config
* cleanup example [skip ci]