Wing Lian
0dac2ddeac
Llama4 linearized (#2502)
* llama4 support for linearized experts
* clean up fsdp2 sharding to prevent hang
* add yaml config
* cleanup example [skip ci]
2025-04-07 20:47:00 -04:00
2025-04-04 13:47:26 -04:00
2025-01-30 11:34:02 -05:00
2025-04-04 13:47:26 -04:00
2025-04-04 13:47:26 -04:00
2024-12-10 16:25:25 -05:00
2025-04-04 13:47:26 -04:00
2025-04-04 13:47:26 -04:00
2025-04-04 13:47:26 -04:00
2025-04-04 13:47:26 -04:00
2025-04-04 13:47:26 -04:00
2025-04-04 13:47:26 -04:00
2025-04-04 13:47:26 -04:00
2025-04-04 13:47:26 -04:00
2025-04-04 13:47:26 -04:00
2025-04-04 13:47:26 -04:00
2025-04-04 13:47:26 -04:00
2025-04-04 13:47:26 -04:00
2025-04-07 20:47:00 -04:00
2025-04-04 13:47:26 -04:00
2025-04-04 13:47:26 -04:00
2025-04-04 13:47:26 -04:00
2025-04-04 13:47:26 -04:00
2025-04-04 13:47:26 -04:00
2025-04-04 13:47:26 -04:00
2025-04-04 13:47:26 -04:00
2025-04-04 13:47:26 -04:00
2025-04-04 13:47:26 -04:00
2025-04-04 13:47:26 -04:00
2025-04-04 13:47:26 -04:00
2025-04-04 13:47:26 -04:00
2025-04-04 13:47:26 -04:00
2025-04-04 13:47:26 -04:00
2025-04-04 13:47:26 -04:00
2025-04-04 13:47:26 -04:00
2025-04-04 13:47:26 -04:00
2025-04-04 13:47:26 -04:00
2025-04-04 13:47:26 -04:00