* support separate lr for embeddings, similar to loraplus
* add test case for training with lr embedding scale
* use kwarg for optimizer
* make sure to handle the optimizer creation
* make sure to handle embedding_lr too
* use smollm for e2e, check for embeddings lr first before weight decay
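
A rough sketch of the grouping order described above: embeddings are checked first, then the weight decay split. This is not the actual implementation. `build_param_groups`, `embedding_lr_scale`, the `"embed"` name match, and the no-decay keywords are all hypothetical, and applying weight decay to the embedding group is an assumption. Parameters are plain placeholders so the snippet runs without any framework.

```python
def build_param_groups(named_params, lr, weight_decay,
                       embedding_lr=None, embedding_lr_scale=None,
                       no_decay_keywords=("bias", "norm")):
    """Split (name, param) pairs into optimizer-style parameter groups.

    Hypothetical helper: names and matching rules are illustrative only.
    """
    # An explicit embedding_lr wins; otherwise derive it from the scale.
    if embedding_lr is None and embedding_lr_scale is not None:
        embedding_lr = lr * embedding_lr_scale

    embeddings, decay, no_decay = [], [], []
    for name, param in named_params:
        # Check for the embedding lr first, before the weight decay split,
        # so embedding weights never fall into the decay/no-decay buckets.
        if embedding_lr is not None and "embed" in name:
            embeddings.append(param)
        elif any(key in name for key in no_decay_keywords):
            no_decay.append(param)
        else:
            decay.append(param)

    groups = []
    if decay:
        groups.append({"params": decay, "lr": lr,
                       "weight_decay": weight_decay})
    if embeddings:
        # Assumption: embeddings keep weight decay, only the lr differs.
        groups.append({"params": embeddings, "lr": embedding_lr,
                       "weight_decay": weight_decay})
    if no_decay:
        groups.append({"params": no_decay, "lr": lr, "weight_decay": 0.0})
    return groups


if __name__ == "__main__":
    # Strings stand in for real tensors in this sketch.
    named = [
        ("model.embed_tokens.weight", "E"),
        ("model.layers.0.mlp.weight", "W"),
        ("model.layers.0.mlp.bias", "b"),
        ("model.norm.weight", "n"),
    ]
    for group in build_param_groups(named, lr=2e-5, weight_decay=0.01,
                                    embedding_lr_scale=0.5):
        print(group)
```

Each returned dict has the shape that per-parameter-group optimizers such as `torch.optim.AdamW` accept, so the list could be passed to the optimizer as a kwarg-built argument. When neither embedding option is set, embeddings fall through to the usual weight decay split.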