* add mhenrichsen/alpaca_2k_test dataset download fixture (pinned to a revision) for flaky tests
* log slowest tests
* pin pynvml==11.5.3
* fix load local hub path
* optimize for speed with smaller models and a smaller val_set_size
* replace pynvml
* make the resume from checkpoint e2e faster
* make tests smaller
* Attempt to run multigpu in PR CI for now to ensure it works
* fix yaml file
* forgot to include multigpu tests
* fix call to cicd.multigpu
* dump dictdefault to dict for yaml conversion
* use to_dict instead of casting
* use 16-bit LoRA with flash attention, since 8-bit LoRA seems problematic
* add llama fsdp test
* more tests
* Add test for qlora + fsdp with prequant
* limit accelerate to 2 processes and disable broken qlora+fsdp+bnb test
* move multigpu tests to biweekly
* run tests again on Modal
* make sure to run the full suite of tests on modal
* run cicd steps via shell script
* split tests across separate runs
* increase timeout
* split tests into steps on modal
* increase workflow timeout
* retry doing this with only a single script
* fix YAML launch for Modal CI
* reorder tests to run on modal
* skip dpo tests on modal
* run on L4s, A10G takes too long
* increase CPU and RAM for modal test
* run modal tests on A100s
* skip phi test on modal
* use ENV instead of ARG in Modal Dockerfile
* upgrade pydantic and fastapi for modal tests
* clean up stray character
* use A10s instead of A100 for modal