axolotl

tocmo0nlord/axolotl

Commit Graph

Author	SHA1	Message	Date
Wing Lian	9871fa060b	optim e2e tests to run a bit faster (#2069) [skip ci] * optim e2e tests to run a bit faster * run prequant w/o lora_modules_to_save * use smollm2	2024-11-18 12:35:31 -05:00
Wing Lian	98af5388ba	bump flash attention 2.5.8 -> 2.6.1 (#1738) * bump flash attention 2.5.8 -> 2.6.1 * use triton implementation of cross entropy from flash attn * add smoke test for flash attn cross entropy patch * fix args to xentropy.apply * handle tuple from triton loss fn * ensure the patch tests run independently * use the wrapper already built into flash attn for cross entropy * mark pytest as forked for patches * use pytest xdist instead of forked, since cuda doesn't like forking * limit to 1 process and use dist loadfile for pytest * change up pytest for fixture to reload transformers w monkeypathc	2024-07-14 19:11:31 -04:00

2 Commits