* fsdp2 support (see the minimal sharding sketch after this list)
* use accelerate release 1.6.0
* allow 8-bit optimizers with fsdp2 (see the optimizer sketch after this list)
* fix liger kernels when combined with torch.compile
* add fsdp2 e2e tests
* use a transformers commit with fsdp2 support
* skip deepspeed zero3 tests in this PR for now
* fix fsdp2 config for ci
* make sure both flex attention and flash attention work with fsdp2; skip the fix-untrained-tokens path under fsdp2 for now (see the flex attention sketch after this list)
* okay, actually use fsdp2...
* more flex attention fixes for fsdp2
* make sure to patch all the loaded models
* additional config validation for fsdp2, bump dependency versions (see the validation sketch after this list)
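
For reference, a minimal sketch of the FSDP2 (per-parameter sharding) API this PR builds on, assuming torch >= 2.6, where `fully_shard` is exported from `torch.distributed.fsdp` (earlier releases expose it under `torch.distributed._composable.fsdp`); the layer class used for the wrap policy is illustrative, not the project's actual policy:

```python
import torch
from torch.distributed.fsdp import fully_shard

def shard_model(model: torch.nn.Module) -> torch.nn.Module:
    # FSDP2 shards parameters in place (as DTensors) rather than wrapping
    # modules in an FSDP container, so the module tree and state_dict keys
    # are left unchanged. Requires torch.distributed to be initialized.
    for module in model.modules():
        # Illustrative wrap policy: shard each transformer block separately.
        if isinstance(module, torch.nn.TransformerEncoderLayer):
            fully_shard(module)
    fully_shard(model)  # shard the remaining root-level parameters
    return model
```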
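
The 8-bit optimizer item works because FSDP2 keeps each parameter as its own sharded tensor instead of flattening everything into one `FlatParameter`, so per-parameter optimizer state lines up with the shards. A minimal sketch, assuming bitsandbytes is installed (model and learning rate are illustrative):

```python
import torch
import bitsandbytes as bnb

model = torch.nn.Linear(1024, 1024)  # stand-in for the sharded model
# AdamW8bit keeps optimizer state in 8-bit, roughly quartering its memory.
optimizer = bnb.optim.AdamW8bit(model.parameters(), lr=2e-5)
```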
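
For the flex attention item, a self-contained sketch of the `torch.nn.attention.flex_attention` API (available in torch >= 2.5); the shapes and causal mask are illustrative, and the PR's actual integration happens in the trainer's attention patching, not shown here:

```python
import torch
from torch.nn.attention.flex_attention import flex_attention, create_block_mask

def causal(b, h, q_idx, kv_idx):
    # Mask modifier: a query position may only attend to earlier kv positions.
    return q_idx >= kv_idx

device = "cuda" if torch.cuda.is_available() else "cpu"
B, H, S, D = 1, 8, 128, 64
q, k, v = (torch.randn(B, H, S, D, device=device) for _ in range(3))
block_mask = create_block_mask(causal, B, H, S, S, device=device)
out = flex_attention(q, k, v, block_mask=block_mask)
```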
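
Finally, a hypothetical sketch of the kind of extra fsdp2 validation the last item refers to; the field names (`fsdp_version`, `fsdp_use_orig_params`) are assumptions for illustration, not necessarily the project's actual config keys:

```python
def validate_fsdp2_config(cfg: dict) -> None:
    # Hypothetical check: use_orig_params is an FSDP1-only knob with no
    # meaning under FSDP2's per-parameter sharding, so reject the combo.
    if cfg.get("fsdp_version") == 2 and cfg.get("fsdp_use_orig_params"):
        raise ValueError("fsdp_use_orig_params is not supported with fsdp2")
```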