Llama4 linearized (#2502)

* llama4 support for linearized experts

* clean up fsdp2 sharding to prevent hang

* add yaml config

* clean up example [skip ci]
Wing Lian
2025-04-07 20:47:00 -04:00
committed by GitHub
parent a6c03217f5
commit 0dac2ddeac
10 changed files with 384 additions and 63 deletions

@@ -4,3 +4,5 @@ mypy
 types-requests
 quartodoc
 jupyter
+blobfile
+tiktoken