feat: add sageattention (#2823) [skip ci]
* feat: add sageattention
* feat: call path on pre model load
* fix: patch to use register to correct var
* fix: add strict check import at start
* chore: fix comments
* chore: refactor
* feat: add capability check
* fix: missed underscore
* fix: let sageattention use FA backend in transformers
* feat: update sage attention for attention mask and position ids
* feat: allow sample packing but add warning without packing
* fix: loss hitting 0 with packing and attention mask note
* feat: downcast embeds if sage attention too
* feat: add config validation
* feat: add attention docs
* chore: docs
@@ -320,6 +320,7 @@ website:
     - docs/multipack.qmd
     - docs/mixed_precision.qmd
     - docs/optimizers.qmd
+    - docs/attention.qmd
 
 - section: "Advanced Features"
   contents: