Commit Graph

69 Commits

SHA1  Message  (Author, Date)
87e073d0de  fix lora target module, require explicit flash attention, fix min logging steps, don't use adam8bit for int4, hash prepared datasets, support hf hub datasets  (Wing Lian, 2023-04-17 18:01:12 -04:00)
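The "hash prepared datasets" change above points at caching tokenized data keyed by its configuration, so a prepared dataset is rebuilt only when its config changes. A minimal sketch of that idea, assuming a hypothetical prepare_fn callback and cache directory (these names are illustrative, not the repo's actual API):

```python
import hashlib
import json
from pathlib import Path

from datasets import Dataset, load_from_disk


def prepared_path(cache_dir: str, dataset_cfg: dict) -> Path:
    # Stable hash of the dataset config: any config change invalidates the cache.
    digest = hashlib.sha256(
        json.dumps(dataset_cfg, sort_keys=True).encode("utf-8")
    ).hexdigest()[:16]
    return Path(cache_dir) / digest


def load_or_prepare(cache_dir: str, dataset_cfg: dict, prepare_fn) -> Dataset:
    path = prepared_path(cache_dir, dataset_cfg)
    if path.exists():
        return load_from_disk(str(path))  # reuse the cached, pre-tokenized copy
    ds = prepare_fn(dataset_cfg)          # tokenize/pack from scratch
    ds.save_to_disk(str(path))
    return ds
```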
77fca25f1b  4bit quantized support (wip)  (Wing Lian, 2023-04-17 11:37:39 -04:00)
12de7b7cf7  cleanup, prep for 4bit quant support  (Wing Lian, 2023-04-16 11:06:41 -04:00)
d1aed4c8e5  deepspeed doesn't work with flash-attn, and the GPU savings with flash-attn outweigh the deepspeed headaches  (Wing Lian, 2023-04-16 06:59:47 -04:00)
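The d1aed4c8e5 message records a tradeoff: deepspeed and flash attention could not be combined, and flash attention's GPU memory savings won out. A config guard enforcing that exclusivity could look like this sketch (the cfg fields are assumptions, not the repo's actual schema):

```python
def validate_config(cfg):
    # deepspeed and flash attention were found to be incompatible here;
    # prefer flash attention, whose GPU savings outweighed deepspeed's.
    if getattr(cfg, "flash_attention", False) and getattr(cfg, "deepspeed", None):
        raise ValueError(
            "flash_attention and deepspeed are mutually exclusive; "
            "remove the deepspeed config to train with flash attention"
        )
```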
a4593832a9  fix logging  (Wing Lian, 2023-04-15 23:12:48 -04:00)
23938015c8  prepare datasets only flag  (Wing Lian, 2023-04-15 16:30:55 -04:00)
d33a975747  configure log level, add llama 7b config  (Wing Lian, 2023-04-15 14:24:37 -04:00)
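"configure log level" (d33a975747) suggests making logger verbosity configurable at runtime; a common pattern for that, with an illustrative environment variable name:

```python
import logging
import os

# Let the operator choose verbosity, e.g. LOG_LEVEL=DEBUG python train.py
logging.basicConfig(level=os.getenv("LOG_LEVEL", "INFO").upper())
log = logging.getLogger(__name__)
```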
05fffb53b4  more logging, wandb fixes  (Wing Lian, 2023-04-15 13:37:17 -04:00)
2df63ef815  refactor trainer setup to account for deepspeed integration  (Wing Lian, 2023-04-15 12:16:42 -04:00)
b164725417  improve prepared dataset loading, fix inference  (Wing Lian, 2023-04-15 12:14:52 -04:00)
937f44f021  helpful info output  (Wing Lian, 2023-04-15 00:03:43 -04:00)
902dd0ab47  fix issue with completed model being empty (see https://github.com/huggingface/peft/issues/286#issuecomment-1501617281)  (Wing Lian, 2023-04-14 23:57:55 -04:00)
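The linked PEFT issue (#286) concerns LoRA adapter checkpoints that save as nearly empty files. The workaround circulating in that thread swaps the model's state_dict for a PEFT-aware one before saving, roughly as in this sketch:

```python
from peft import PeftModel, get_peft_model_state_dict


def patch_state_dict_for_saving(model: PeftModel) -> PeftModel:
    # Make model.state_dict() return only the adapter weights, so that
    # checkpointing writes a non-empty adapter_model.bin.
    old_state_dict = model.state_dict
    model.state_dict = (
        lambda self, *_, **__: get_peft_model_state_dict(self, old_state_dict())
    ).__get__(model, type(model))
    return model
```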
80b2ed29d8  various bugfixes  (Wing Lian, 2023-04-14 21:37:07 -04:00)
45f77dd51e  better handling of llama model import  (Wing Lian, 2023-04-14 19:30:41 -04:00)
949a27be21  more fixes and prep for llama training  (Wing Lian, 2023-04-14 18:30:09 -04:00)
f2a2029d0d  config chooser, update readme instructions, device config, llama flash attention, debug out the labels, fix config key checks, other bugfixes  (Wing Lian, 2023-04-14 12:18:56 -04:00)
a6028d302e  black formatting  (Wing Lian, 2023-04-14 07:25:52 -04:00)
8d959a7e26  make it work with pythia in the cloud  (Wing Lian, 2023-04-14 07:24:55 -04:00)
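"make it work with pythia in the cloud" refers to EleutherAI's Pythia models (GPT-NeoX architecture), which load through the standard transformers auto classes; for example, with an illustrative checkpoint:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/pythia-1.4b-deduped"  # illustrative Pythia checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
```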
ce24f5e246  WIP for axolotl trainer  (Wing Lian, 2023-04-14 00:20:05 -04:00)