axolotl

Author	SHA1	Message	Date
Wing Lian	c2a0792680	swap batch size for gradient accumulation steps to decouple from num gpu	2023-05-31 09:38:12 -04:00
Viktorius Suwandi	b6a539b53c	Update wandb_log_model on cerebras_1_3B_alpaca.yml	2023-05-29 16:32:20 +07:00
Wing Lian	77fca25f1b	4bit quantized support (wip)	2023-04-17 11:37:39 -04:00
Wing Lian	d1aed4c8e5	deepspeed doesn't work with flash-attn, and the gpu savings w flash attn are better than the deepspeed headaches	2023-04-16 06:59:47 -04:00
Wing Lian	05fffb53b4	more logging, wandb fixes	2023-04-15 13:37:17 -04:00
Wing Lian	b164725417	improve prepared dataset loading, fix inference	2023-04-15 12:14:52 -04:00
Wing Lian	f2a2029d0d	config chooser, update readme instructions, device config, llama flash attention, debug out the labels, fix config key checks, other bugfixes	2023-04-14 12:18:56 -04:00