Merge branch 'main' into grpo_liger

Salman Mohammadi
2025-02-19 16:17:42 +00:00

@@ -12,6 +12,7 @@ to leverage operator fusion and tensor re-use in order to improve speed and redu
memory usage during the forward and backward passes of these calculations.
We currently support several common model architectures, including (but not limited to):
- `llama`
- `mistral`
- `qwen2`