axolotl

Author	SHA1	Message	Date
Wing Lian	c2a0792680	swap batch size for gradient accumulation steps to decouple from num gpu	2023-05-31 09:38:12 -04:00
Viktorius Suwandi	abddcf4dfe	Update wandb_log_model on pythia_1_2B_alpaca.yml	2023-05-29 16:31:53 +07:00
Wing Lian	77fca25f1b	4bit quantized support (wip)	2023-04-17 11:37:39 -04:00
Wing Lian	d1aed4c8e5	deepspeed doesn't work with flash-attn, and the gpu savings w flash attn are better than the deepspeed headaches	2023-04-16 06:59:47 -04:00
Wing Lian	05fffb53b4	more logging, wandb fixes	2023-04-15 13:37:17 -04:00
Wing Lian	b164725417	improve prepared dataset loading, fix inference	2023-04-15 12:14:52 -04:00
Wing Lian	949a27be21	more fixes and prep for llama training	2023-04-14 18:30:09 -04:00
Wing Lian	8d959a7e26	make it work with pythia in the cloud	2023-04-14 07:24:55 -04:00
Wing Lian	ce24f5e246	WIP for axolotl trainer	2023-04-14 00:20:05 -04:00