* add support for optimi_adamw optimizer w kahan summation * pydantic validator for optimi_adamw * workaround for setting optimizer for fsdp * make sure to install optimizer packages * make sure to have parity for model parameters passed to optimizer * add smoke test for optimi_adamw optimizer * don't use foreach optimi by default