Wing Lian
588cd65a64
fix setup.py to use extra index url
...
install torch for tests
fix cuda version for autogptq index
set torch in requirements so that it installs properly
move gptq install around to work with github cicd
2023-08-29 12:02:51 -07:00
Wing Lian
caa80e891d
don't need explicit peft install for tests
2023-08-29 12:02:51 -07:00
Wing Lian
ac37753aa2
remove old gptq docker
2023-08-29 12:02:50 -07:00
Wing Lian
a29560004b
more tweaks and add yml
2023-08-29 12:02:00 -07:00
Wing Lian
1deb767fe8
auto gptq support
2023-08-29 11:31:14 -07:00
Wing Lian
548787daae
customizable ascii art ( #506 )
2023-08-29 10:13:42 -07:00
Wing Lian
5ac3392075
support for datasets with multiple names ( #480 )
...
* support for datasets with multiple names
* update docs
2023-08-29 06:18:17 -07:00
Aman Gupta Karmani
e356b297cb
remove --force-reinstall from Dockerfile to ensure correct pytorch version ( #492 )
2023-08-29 06:17:51 -07:00
NanoCode012
48c56470d0
Fix(doc): Clarify no amp to full yaml docs ( #496 )
2023-08-29 06:17:37 -07:00
Maxime
36b2e1cfee
tweak: use default config file when only one file is present ( #501 )
2023-08-29 06:17:10 -07:00
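An illustrative sketch of the behavior this tweak describes (function name and error handling are hypothetical, not the axolotl code): when pointed at a directory containing exactly one YAML file, use it without requiring an explicit filename.

```python
from pathlib import Path

def resolve_config(path: str) -> Path:
    # Illustrative only: if the path is a directory holding exactly one YAML
    # config, use it by default; otherwise the caller must name one explicitly.
    p = Path(path)
    if p.is_file():
        return p
    candidates = sorted(p.glob("*.y*ml"))
    if len(candidates) == 1:
        return candidates[0]
    raise ValueError(f"expected one config file in {p}, found {len(candidates)}")
```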
Wing Lian
125cccb786
Refactor train cfg cli ( #499 )
...
* wip to cleanup cfg cli options
* fix launcher
* fix cli args
2023-08-29 05:37:53 -07:00
Aman Karmani
fd55bc87e2
use math.ceil instead of round /cc #498
2023-08-29 01:03:41 +00:00
Birch-san
8e197f6fb4
pad_to_worst_case_seq_len boolean, for testing memory limits ( #498 )
...
* pad_to_worst_case_seq_len boolean, for testing memory limits
* remove collator_pad_to_longest option since it does nothing
see docs: https://huggingface.co/docs/transformers/main_classes/data_collator#transformers.DataCollatorWithPadding.padding
True and "longest" mean the same thing
* rename to `pad_to_sequence_len`, and ensure 64 alignment
---------
Co-authored-by: Aman Karmani <aman@tmm1.net>
2023-08-28 18:47:16 -04:00
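A minimal sketch of the point made in the linked docs (tokenizer checkpoint and sample texts are placeholders): for `DataCollatorWithPadding`, `padding=True` and `padding="longest"` are equivalent, which is why `collator_pad_to_longest` did nothing, while `pad_to_multiple_of` gives the kind of alignment mentioned above.

```python
from transformers import AutoTokenizer, DataCollatorWithPadding

# Illustrative sketch only: padding=True and padding="longest" mean the same
# thing, per the DataCollatorWithPadding documentation.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # any tokenizer with a pad token
features = [tokenizer("a short example"), tokenizer("a somewhat longer example sentence")]

pad_true = DataCollatorWithPadding(tokenizer, padding=True)(features)
pad_longest = DataCollatorWithPadding(tokenizer, padding="longest")(features)
assert pad_true["input_ids"].shape == pad_longest["input_ids"].shape

# pad_to_multiple_of rounds the padded length up, similar in spirit to the
# 64-token alignment mentioned for pad_to_sequence_len above.
pad_64 = DataCollatorWithPadding(tokenizer, padding=True, pad_to_multiple_of=64)(features)
print(pad_64["input_ids"].shape)  # sequence dimension is a multiple of 64
```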
Aman Karmani
267b7b24e5
simplify linear layer locator
2023-08-28 09:45:16 -04:00
Wing Lian
98bf76e236
fsdp requires params be the same type too ( #493 )
2023-08-28 04:33:50 -04:00
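A hedged sketch of what this constraint amounts to (not the axolotl fix itself): FSDP flattens parameters into shared buffers, so every parameter in a wrapped unit must share one dtype, and any stragglers have to be cast before wrapping.

```python
import torch

def unify_param_dtypes(model: torch.nn.Module, dtype: torch.dtype = torch.bfloat16):
    # Hedged sketch: cast any parameters whose dtype differs so FSDP's
    # flat-parameter sharding sees a single, consistent dtype.
    for param in model.parameters():
        if param.dtype != dtype:
            param.data = param.data.to(dtype)
    return model
```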
NanoCode012
4c37bd0b54
Fix(tokenizer): Make sure to add pad for CodeLlamaTokenizer ( #489 )
2023-08-28 09:39:10 +09:00
Aman Gupta Karmani
f144e98a32
Merge pull request #485 from maximegmd/patch-4
...
fix: finetune model inference needs the dtype fix to work with flash-attn
2023-08-27 16:27:47 -04:00
Aman Karmani
3a011ea1ef
fix condition and add logging
2023-08-27 20:09:26 +00:00
Aman Karmani
1f613e5aa7
Merge branch 'main' into patch-4
2023-08-27 19:57:34 +00:00
Aman Karmani
f319b0bc67
rename var and reformat
2023-08-27 19:55:11 +00:00
Maxime
7fd662dd89
Update src/axolotl/utils/models.py
...
Co-authored-by: Aman Gupta Karmani <aman@tmm1.net>
2023-08-27 21:01:43 +02:00
Maxime
9e699683d7
Update src/axolotl/utils/models.py
...
Co-authored-by: Aman Gupta Karmani <aman@tmm1.net>
2023-08-27 21:01:37 +02:00
mhenrichsen
35130711d6
Feat(cfg): Add code-llama configs for all sizes ( #479 )
...
* configs for all sizes
* update tokenizer type
---------
Co-authored-by: mhenrichsen <some_email@hey.com>
2023-08-27 10:20:17 +09:00
mhenrichsen
3fc9006298
Feat(deepspeed): Add zero2 config ( #476 )
...
* zero2 config
* config added
* linting
---------
Co-authored-by: mhenrichsen <some_email@hey.com>
2023-08-27 10:10:33 +09:00
NanoCode012
ad8be435ad
Feat(doc): Update eval_steps doc ( #487 )
2023-08-27 10:09:09 +09:00
Charles O. Goddard
fe4d6baf92
Add example Llama 2 ReLoRA config ( #471 )
...
* Add example Llama 2 ReLoRA config
* Use adamw_bnb_8bit in example relora config
2023-08-27 10:08:34 +09:00
Aman Gupta Karmani
f31301063d
Merge pull request #486 from OpenAccess-AI-Collective/adam-bnb-simpler
...
let transformers handle adamw_bnb_8bit
2023-08-26 20:44:19 -04:00
Aman Karmani
868530c39c
let transformers handle adamw_bnb_8bit
2023-08-26 21:40:12 +00:00
Maxime
d03887fad5
ignore: address pr review
2023-08-26 22:45:45 +02:00
Maxime
17605b85d8
fix: inference did not move the model to the correct device ( #483 )
2023-08-26 16:40:56 -04:00
Maxime
a184549e4c
ignore: linter
2023-08-26 22:36:14 +02:00
Maxime
f311df9462
fix: finetune model inference needs the dtype fix to work with flash-attn
2023-08-26 22:34:11 +02:00
Maxime
c500d02517
Fix missing 'packaging' wheel ( #482 )
2023-08-26 12:02:15 -04:00
Wing Lian
31f3e71764
fix checkpoints on multigpu ( #481 )
2023-08-26 12:00:03 -04:00
Aman Gupta Karmani
56c4a94caf
Merge pull request #484 from OpenAccess-AI-Collective/reqs
...
allow newer deps in requirements.txt
2023-08-26 11:13:41 -04:00
Aman Karmani
c29117a0d7
allow newer deps
2023-08-26 15:06:05 +00:00
Wing Lian
0b7ba57ec4
fix types w lora ( #478 )
2023-08-25 02:03:24 -04:00
NanoCode012
71bd06243c
Fix(tokenizer): Fix condition to add pad token ( #477 )
...
* Fix(tokenizer): Fix condition to add pad token
* chore: fix lint
2023-08-25 14:30:50 +09:00
Wing Lian
cb9797ef5a
improve llama pad token handling ( #475 )
...
* improve llama pad token handling
* tweak logic to not clobber
2023-08-24 13:20:35 -04:00
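A sketch of the "don't clobber" idea described above (assumed behavior, not the exact axolotl code; the model id is only an example): set a pad token only when the tokenizer lacks one, so an existing pad token is never overwritten.

```python
from transformers import AutoTokenizer

def ensure_pad_token(tokenizer):
    # Non-clobbering sketch: add a pad token only if none exists, preferring
    # the unk token when available.
    if tokenizer.pad_token is None:
        if tokenizer.unk_token is not None:
            tokenizer.pad_token = tokenizer.unk_token
        else:
            tokenizer.add_special_tokens({"pad_token": "<pad>"})
    return tokenizer

tokenizer = ensure_pad_token(AutoTokenizer.from_pretrained("huggyllama/llama-7b"))
```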
Charles O. Goddard
bde3c5a478
ReLoRA implementation (with quantization) ( #322 )
...
* Experimental ReLoRA (+qlora) implementation
* Add CPU offload
* Remove local config
* Fix saving logic
* Remove redundant assert
* Fix logic errors
* Move ReLoRA into its own trainer class with a method override to create the proper scheduler
* Formatting & typing fixes
* Use safe_serialization
* Don't allow fsdp/deepspeed with ReLoRA
* Fix cpu-offload logic, enable multi gpu
* Document parameters and add comment
* Fix merge issue
* Smooth over some sharp edges
* Implement resume from checkpoint for relora
* Address review comments
* Fix saving logic
* Add necessary metadata to safetensors
---------
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2023-08-23 23:07:18 -04:00
NanoCode012
55c23c7bcb
Fix(doc): Clarify config ( #466 )
2023-08-23 11:56:01 -04:00
Wing Lian
c69faee7a7
workaround so training doesn't hang when packed dataloader batches aren't even ( #461 )
...
* workaround so training doesn't hang when packed dataloader batches aren't even
* don't bother labeling anything in the no-op data
2023-08-23 10:39:11 -04:00
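A hedged sketch of the kind of workaround described above (names and shapes are illustrative, not the actual axolotl implementation): when ranks end up with uneven batch counts, feed a dummy batch whose labels are all -100 so it contributes nothing to the loss while keeping every rank stepping in sync.

```python
import torch

def make_noop_batch(seq_len: int, pad_token_id: int) -> dict:
    # Filler batch for the rank that would otherwise run out of data; labels
    # of -100 are ignored by the cross-entropy loss, so it trains nothing.
    return {
        "input_ids": torch.full((1, seq_len), pad_token_id, dtype=torch.long),
        "attention_mask": torch.zeros((1, seq_len), dtype=torch.long),
        "labels": torch.full((1, seq_len), -100, dtype=torch.long),
    }
```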
Wing Lian
d5dcf9c350
fix test fixture b/c hf trainer tokenization changed ( #464 )
2023-08-23 04:04:49 -04:00
TearGosling
f4746507f6
feat: add Metharme prompt strategy ( #446 )
...
* Add Metharme tokenizing strategy
This strategy accounts for how the Metharme JSONLs are formatted, and adds duplicated EOS tokens, which can help trim model output length.
I haven't gotten the chance to test this yet, and probably won't have the chance for quite a bit, so I'm committing this now.
* Redo Metharme tokenizing strategy
lol
* fix: oops
* Rearrange a conditional
* chore: reformat code in accordance with linter
* chore: Make lint not freak out
* chore: fix lint
---------
Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>
2023-08-22 11:21:45 +09:00
Wing Lian
96deb6bd67
recast loralayer, norm, lmhead + embed token weights per original qlora ( #393 )
...
* recast loralayer, norm, lmhead + embed token weights per original qlora
* try again for the fix
* refactor torch dtype picking
* linter fixes
* missing import for LoraLayer
* fix install for tests now that peft is involved
2023-08-21 18:41:12 -04:00
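A simplified sketch of the recasting pattern this commit references, following the original qlora recipe (module-name matching here is approximate, not the exact axolotl code):

```python
import torch
from peft.tuners.lora import LoraLayer

def recast_modules(model, compute_dtype=torch.bfloat16):
    # Per the original qlora recipe (approximated): keep LoRA layers in the
    # compute dtype, keep norms in float32 for stability, and cast
    # lm_head / embed_tokens weights to the compute dtype.
    for name, module in model.named_modules():
        if isinstance(module, LoraLayer):
            module.to(compute_dtype)
        elif "norm" in name:
            module.to(torch.float32)
        elif ("lm_head" in name or "embed_tokens" in name) and hasattr(module, "weight"):
            module.to(compute_dtype)
    return model
```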
Wing Lian
50682a3c06
always drop samples that are too long ( #452 )
2023-08-21 16:43:33 -04:00
Wing Lian
5a1985ba24
set env var for FSDP layer to wrap ( #453 )
2023-08-21 16:43:22 -04:00
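A hedged sketch of what "set env var for FSDP layer to wrap" likely amounts to: Accelerate's FSDP plugin reads the transformer block class to auto-wrap from an environment variable; the class name below is only an example for Llama-style models.

```python
import os

# Tell Accelerate/FSDP which transformer block class to auto-wrap.
# "LlamaDecoderLayer" is an example value, not the only valid one.
os.environ["FSDP_TRANSFORMER_CLS_TO_WRAP"] = "LlamaDecoderLayer"
```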
Aman Gupta Karmani
5e9c6afa10
Merge pull request #451 from OpenAccess-AI-Collective/eval-is-causal
...
is_causal fix for evals?
2023-08-21 10:43:46 -07:00
Aman Karmani
a213d9972a
fix eval regression caused in 13f7efaf74
2023-08-21 10:40:06 -07:00
Wing Lian
fbf49a4770
is_causal fix for evals?
2023-08-21 10:36:26 -04:00