* add callback to checkpoint model on first step
* remove debug
* add test cases; update existing tests not to save on first step
* move test out of solo
* delete
* default to False
* typo
* bump transformers and set roundup_power2_divisions for more VRAM improvements
* add support for low-bit optimizers from torchao
* fix check for alternate optimizers; use Nous models on HF for llama3
* add missing check for ao_adamw_fp8
* fix check when using custom optimizers with adamw
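
For reviewers, the first-step checkpoint behavior named in the first item might look roughly like the sketch below. This is an illustration only, not the code from this change: `SaveOnFirstStepCallback`, `State`, `Control`, and the `enabled` flag are hypothetical stand-ins, loosely modeled on the `state.global_step` / `control.should_save` pattern used by trainer callbacks.

```python
from dataclasses import dataclass


@dataclass
class State:
    """Hypothetical stand-in for a trainer state object."""
    global_step: int = 0


@dataclass
class Control:
    """Hypothetical stand-in for a trainer control object."""
    should_save: bool = False


class SaveOnFirstStepCallback:
    """Hypothetical callback that requests a checkpoint after step 1."""

    def __init__(self, enabled: bool = False):
        # Assumption: the feature is opt-in, mirroring "default to False".
        self.enabled = enabled

    def on_step_end(self, state: State, control: Control) -> Control:
        # Only the very first optimizer step triggers a save request;
        # later steps leave the control object untouched.
        if self.enabled and state.global_step == 1:
            control.should_save = True
        return control
```

Saving once at step 1 surfaces checkpoint-path or serialization failures early instead of at the first scheduled save; keeping it off by default leaves existing runs and tests unchanged.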