axolotl

Author	SHA1	Message	Date
Wing Lian	ab4b32187d	need to update deepspeed version in extras too (#2161) [skip ci] * need to update deepspeed version in extras too * fix patch import * fix monkeypatch reloading in tests and deepspeed patch * remove duplicated functionality fixture * reset LlamaForCausalLM too in fixtures for cce patch * reset llama attn too * disable xformers patch for cce * skip problematic test on low usage functionality	2024-12-09 14:01:44 -05:00
Wing Lian	a1790f2652	replace tensorboard checks with helper function (#2120) [skip ci] * replace tensorboard checks with helper function * move helper function * use relative	2024-12-03 21:06:20 -05:00
Wing Lian	ce5bcff750	various tests fixes for flakey tests (#2110) * add mhenrichsen/alpaca_2k_test with revision dataset download fixture for flaky tests * log slowest tests * pin pynvml==11.5.3 * fix load local hub path * optimize for speed w smaller models and val_set_size * replace pynvml * make the resume from checkpoint e2e faster * make tests smaller	2024-12-02 17:28:58 -05:00
Wing Lian	5f1d98e8fc	add e2e tests for Unsloth qlora and test the builds (#2093) * see if unsloth installs cleanly in ci * check unsloth install on regular tests, not sdist * fix ampere check exception for ci * use cached_property instead * add an e2e test for unsloth qlora * reduce seq len and mbsz to prevent oom in ci * add checks for fp16 and sdp_attention * pin unsloth to a specific release * add unsloth to docker image too * fix flash attn xentropy patch * fix loss, add check for loss when using fa_xentropy * fix special tokens for test * typo * test fa xentropy with and without gradient accum * pr feedback changes	2024-11-29 20:38:49 -05:00
Wing Lian	9871fa060b	optim e2e tests to run a bit faster (#2069) [skip ci] * optim e2e tests to run a bit faster * run prequant w/o lora_modules_to_save * use smollm2	2024-11-18 12:35:31 -05:00
Wing Lian	98af5388ba	bump flash attention 2.5.8 -> 2.6.1 (#1738) * bump flash attention 2.5.8 -> 2.6.1 * use triton implementation of cross entropy from flash attn * add smoke test for flash attn cross entropy patch * fix args to xentropy.apply * handle tuple from triton loss fn * ensure the patch tests run independently * use the wrapper already built into flash attn for cross entropy * mark pytest as forked for patches * use pytest xdist instead of forked, since cuda doesn't like forking * limit to 1 process and use dist loadfile for pytest * change up pytest for fixture to reload transformers w monkeypatch	2024-07-14 19:11:31 -04:00

6 Commits