* add support for optimi_adamw optimizer w kahan summation
* pydantic validator for optimi_adamw
* workaround for setting optimizer for fsdp
* make sure to install optimizer packages
* make sure to have parity for model parameters passed to optimizer
* add smoke test for optimi_adamw optimizer
* don't use foreach optimi by default