Commit Graph

19 Commits

Author SHA1 Message Date
Casper
a045db0214 Mistral: Sliding Window Attention with Flash Attention and Sample Packing (#732)
* Implement Mistral FA + SWA + Sample Packing

* Handle unbroadcastable tensor

* chore: lint

* Simplify _prepare_decoder_attention_mask

* Uncomment window size

* Upgrade flash-attn to minimum of 2.3.0 to support SWA

* Add original condition to avoid error during inference

* chore: lint

* use torchscript to prevent oom

* chore: pylint

---------

Co-authored-by: Wing Lian <wing.lian@gmail.com>
2023-10-16 15:13:46 -04:00
Wing Lian
7f2027d93f tweak for xformers install w pytorch 2.1.0 (#727) 2023-10-13 15:21:17 -04:00
Wing Lian
8d288a2ad4 workaround for installing xformers w torch 2.1.0 (#725) 2023-10-13 11:19:30 -04:00
Wing Lian
c25ba7939b update README w deepspeed info (#605) 2023-09-22 00:15:52 -04:00
Wing Lian
772cd870d4 fix the sed command to replace the version w the tag
Some checks failed
pre-commit / pre-commit (push) Has been cancelled
publish pypi / Upload release to PyPI (push) Has been cancelled
PyTest / test (3.10) (push) Has been cancelled
PyTest / test (3.9) (push) Has been cancelled
2023-09-11 13:44:19 -04:00
Wing Lian
6c5fbe6223 add long_description for pypi push (#555) 2023-09-11 13:34:29 -04:00
Wing Lian
34c0a86a11 update readme to point to direct link to runpod template, cleanup install instrucitons (#532)
* update readme to point to direct link to runpod template, cleanup install instrucitons

* default install flash-attn and auto-gptq now too

* update readme w flash-attn extra

* fix version in setup
2023-09-08 11:58:54 -04:00
Wing Lian
3355706e22 Add support for GPTQ using native transformers/peft (#468)
* auto gptq support

* more tweaks and add yml

* remove old gptq docker

* don't need explicit peft install for tests

* fix setup.py to use extra index url

install torch for tests
fix cuda version for autogptq index
set torch in requirements so that it installs properly
move gptq install around to work with github cicd

* gptq doesn't play well with sample packing

* address pr feedback

* remove torch install for now

* set quantization_config from model config

* Fix the implementation for getting quant config from model config
2023-09-05 12:43:22 -04:00
Wing Lian
96deb6bd67 recast loralayer, norm, lmhead + embed token weights per original qlora (#393)
* recast loralayer, norm, lmhead + embed token weights per original qlora

* try again for the fix

* refactor torch dtype picking

* linter fixes

* missing import for LoraLayer

* fix install for tests now that peft is involved
2023-08-21 18:41:12 -04:00
mhenrichsen
cf6654769a flash attn pip install (#426)
* flash attn pip

* add packaging

* add packaging to apt get

* install flash attn in dockerfile

* remove unused whls

* add wheel

* clean up pr

fix packaging requirement for ci
upgrade pip for ci
skip build isolation for requiremnents to get flash-attn working
install flash-attn seperately

* install wheel for ci

* no flash-attn for basic cicd

* install flash-attn as pip extras

---------

Co-authored-by: Ubuntu <mgh@mgh-vm.wsyvwcia0jxedeyrchqg425tpb.ax.internal.cloudapp.net>
Co-authored-by: mhenrichsen <some_email@hey.com>
Co-authored-by: Mads Henrichsen <mads@BrbartiendeMads.lan>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2023-08-18 19:00:27 -04:00
Wing Lian
bbc5bc5791 Merge pull request #108 from OpenAccess-AI-Collective/docker-gptq
default to qlora support, make gptq specific image
2023-05-30 15:07:04 -04:00
NanoCode012
37293dce07 Apply isort then black 2023-05-31 02:53:53 +09:00
NanoCode012
8b617cc7f6 Lint setup.py 2023-05-31 02:53:53 +09:00
Wing Lian
6ef96f569b default to qlora support, make gptq specific image 2023-05-29 20:34:41 -04:00
Wing Lian
2bc1a5bde1 black formatting 2023-05-10 16:01:08 -04:00
Wing Lian
990bec63e6 docker layer caching, build w axolotl from base build 2023-05-07 17:16:05 -04:00
Wing Lian
f50de1b1cb handle empty lines 2023-04-19 08:03:34 -04:00
Wing Lian
4131183115 fix install to work with latest alpaca lora 4bit 2023-04-17 12:45:12 -04:00
Wing Lian
77fca25f1b 4bit quantized support (wip) 2023-04-17 11:37:39 -04:00