* Implement Mistral FA + SWA + Sample Packing
* Handle unbroadcastable tensor
* chore: lint
* Simplify _prepare_decoder_attention_mask
* Uncomment window size
* Upgrade flash-attn to minimum of 2.3.0 to support SWA
* Add original condition to avoid error during inference
* chore: lint
* Use TorchScript to prevent OOM
* chore: pylint

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>
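The flash-attn >= 2.3.0 requirement exists because that release added the `window_size` argument, which is what lets sliding-window attention (SWA) compose with the varlen kernels used for sample packing. Below is a minimal sketch of that combination, not the PR's actual patch code: `flash_attn_varlen_qkvpacked_func` and its `window_size` parameter are the real flash-attn 2.3.0 API, while the packed lengths, head shapes, and the 4096-token window (Mistral's default) are illustrative assumptions.

```python
# Sketch: sliding-window attention over a sample-packed batch with
# flash-attn >= 2.3.0. Illustrative shapes; not the PR's actual code.
import torch
from flash_attn import flash_attn_varlen_qkvpacked_func

# Three packed samples of lengths 5, 3, and 4 flattened into one
# 12-token sequence. cu_seqlens marks the sample boundaries, so
# attention never crosses from one packed sample into another.
cu_seqlens = torch.tensor([0, 5, 8, 12], dtype=torch.int32, device="cuda")
max_seqlen = 5  # longest individual sample in the pack
num_heads, head_dim = 8, 64

# QKV packed as (total_tokens, 3, num_heads, head_dim); flash-attn
# requires fp16 or bf16 inputs on CUDA.
qkv = torch.randn(
    12, 3, num_heads, head_dim, dtype=torch.float16, device="cuda"
)

out = flash_attn_varlen_qkvpacked_func(
    qkv,
    cu_seqlens,
    max_seqlen,
    dropout_p=0.0,
    causal=True,
    # (left, right) lookback window, new in flash-attn 2.3.0: here each
    # token attends to at most the previous 4096 tokens (assumed window;
    # (-1, -1) would disable windowing entirely).
    window_size=(4095, 0),
)
print(out.shape)  # (12, num_heads, head_dim)
```

Because the windowing and the causal mask are handled inside the fused kernel, `_prepare_decoder_attention_mask` no longer needs to build an explicit sliding-window mask tensor for this path, which is consistent with the simplification listed above.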