* use warmup_ratio as a better default than warmup_steps, since the right number of warmup steps depends on the size of the dataset
* replace remainder of warmup_steps
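A minimal sketch of why a ratio travels better across datasets than a fixed step count. It assumes warmup length is derived as `ceil(total_steps * warmup_ratio)` unless an explicit step count is given; `resolve_warmup_steps` is a hypothetical helper for illustration, not an existing API.

```python
import math


def resolve_warmup_steps(total_steps: int, warmup_ratio: float = 0.0, warmup_steps: int = 0) -> int:
    """Hypothetical helper: an explicit warmup_steps wins, otherwise derive it from the ratio."""
    if warmup_steps > 0:
        return warmup_steps
    return math.ceil(total_steps * warmup_ratio)


# The same ratio adapts to runs of very different length.
small_run = resolve_warmup_steps(total_steps=200, warmup_ratio=0.1)
large_run = resolve_warmup_steps(total_steps=20_000, warmup_ratio=0.1)

# A fixed count does not adapt: 100 warmup steps is half of a 200-step run.
fixed = resolve_warmup_steps(total_steps=200, warmup_steps=100)

print(small_run, large_run, fixed)
```

With a fixed value, a small dataset can spend most of its training in warmup, which is the case the ratio default avoids.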
Qwen
TODO
Qwen2 MoE
✅ multipack
✅ qwen2_moe 4-bit QLoRA
✅ qwen2_moe 16-bit LoRA
❓ qwen2_moe 8-bit LoRA