* bump transformers and set roundup_power2_divisions for more VRAM improvements
* support for low-bit optimizers from torch ao (see the sketch below)
* fix check for alternate optimizers and use Nous models on HF for llama3
* add missing check for ao_adamw_fp8
* fix check when using custom optimizers w/ adamw
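The two VRAM-related items above map to a PyTorch allocator setting and the torchao prototype optimizers. A minimal sketch, assuming a CUDA GPU and a torchao build that still exposes the prototype `low_bit_optim` module (the exact module path and class names vary by torchao version):

```python
import os

# The CUDA caching allocator rounds allocation sizes up to reduce fragmentation;
# roundup_power2_divisions controls that rounding granularity and can cut wasted
# VRAM. It must be set before the first CUDA allocation, so set it before torch
# initializes CUDA. The value 16 here is illustrative, not the project's default.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "roundup_power2_divisions:16"

import torch
from torchao.prototype.low_bit_optim import AdamW8bit  # AdamW4bit / AdamWFp8 also exist

# Toy module standing in for a real model; optimizer state is kept in 8-bit,
# shrinking the two Adam moment buffers that normally dominate optimizer VRAM.
model = torch.nn.Linear(4096, 4096, device="cuda", dtype=torch.bfloat16)
optim = AdamW8bit(model.parameters(), lr=2e-5)

out = model(torch.randn(8, 4096, device="cuda", dtype=torch.bfloat16))
out.sum().backward()
optim.step()
optim.zero_grad()
```

Note that fp8 optimizer state (the `ao_adamw_fp8` option mentioned above) additionally requires hardware with native fp8 support.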
Llama-3
https://llama.meta.com/llama3/
- Full Fine Tune
  - Single GPU @ 48GB VRAM
- LoRA
  - Single GPU @ 11GB VRAM
- QLoRA + FSDP (see the sketch below)
  - Dual GPU @ 21GB VRAM
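A minimal sketch of the QLoRA model setup behind the last entry, assuming transformers, peft, and bitsandbytes are installed. The NousResearch model id follows the changelog's "Nous models on HF" note, and the LoRA hyperparameters are illustrative, not the exact settings behind the numbers above:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    # Uniform storage dtype for the quantized weights so FSDP can shard them.
    bnb_4bit_quant_storage=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "NousResearch/Meta-Llama-3-8B",  # ungated HF mirror of Llama-3 8B
    quantization_config=bnb,
    torch_dtype=torch.bfloat16,
)
# Attach LoRA adapters; only these small matrices are trained.
lora = LoraConfig(r=16, lora_alpha=32, target_modules="all-linear",
                  task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()
# Sharding across the two GPUs would be handled by an FSDP-aware launcher
# (e.g. accelerate); this sketch only covers the model/adapter setup.
```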