axolotl

Author	SHA1	Message	Date
Wing Lian	98bf76e236	fsdp requires params be the same type too (#493 )	2023-08-28 04:33:50 -04:00
NanoCode012	4c37bd0b54	Fix(tokenizer): Make sure to add pad for CodeLlamaTokenizer (#489 )	2023-08-28 09:39:10 +09:00
Aman Gupta Karmani	f144e98a32	Merge pull request #485 from maximegmd/patch-4 fix: finetune model inference needs the dtype fix to work with flash-attn	2023-08-27 16:27:47 -04:00
Aman Karmani	3a011ea1ef	fix condition and add logging	2023-08-27 20:09:26 +00:00
Aman Karmani	1f613e5aa7	Merge branch 'main' into patch-4	2023-08-27 19:57:34 +00:00
Aman Karmani	f319b0bc67	rename var and reformat	2023-08-27 19:55:11 +00:00
Maxime	7fd662dd89	Update src/axolotl/utils/models.py Co-authored-by: Aman Gupta Karmani <aman@tmm1.net>	2023-08-27 21:01:43 +02:00
Maxime	9e699683d7	Update src/axolotl/utils/models.py Co-authored-by: Aman Gupta Karmani <aman@tmm1.net>	2023-08-27 21:01:37 +02:00
mhenrichsen	35130711d6	Feat(cfg): Add code-llama configs for all sizes (#479 ) * configs for all sizes * update tokenizer type --------- Co-authored-by: mhenrichsen <some_email@hey.com>	2023-08-27 10:20:17 +09:00
mhenrichsen	3fc9006298	Feat(deepspeed): Add zero2 config (#476 ) * zero2 config * config added * linting --------- Co-authored-by: mhenrichsen <some_email@hey.com>	2023-08-27 10:10:33 +09:00
NanoCode012	ad8be435ad	Feat(doc): Update eval_steps doc (#487 )	2023-08-27 10:09:09 +09:00
Charles O. Goddard	fe4d6baf92	Add example Llama 2 ReLoRA config (#471 ) * Add example Llama 2 ReLoRA config * Use adamw_bnb_8bit in example relora config	2023-08-27 10:08:34 +09:00
Aman Gupta Karmani	f31301063d	Merge pull request #486 from OpenAccess-AI-Collective/adam-bnb-simpler let transformers handle adamw_bnb_8bit	2023-08-26 20:44:19 -04:00
Aman Karmani	868530c39c	let transformers handle adamw_bnb_8bit	2023-08-26 21:40:12 +00:00
Maxime	d03887fad5	ignore: address pr review	2023-08-26 22:45:45 +02:00
Maxime	17605b85d8	fix: inference did not move the model to the correct device (#483 )	2023-08-26 16:40:56 -04:00
Maxime	a184549e4c	ignore: linter	2023-08-26 22:36:14 +02:00
Maxime	f311df9462	fix: finetune model inference needs the dtype fix to work with flash-attn	2023-08-26 22:34:11 +02:00
Maxime	c500d02517	Fix missing 'packaging' wheel (#482 )	2023-08-26 12:02:15 -04:00
Wing Lian	31f3e71764	fix checkpints on multigpu (#481 )	2023-08-26 12:00:03 -04:00
Aman Gupta Karmani	56c4a94caf	Merge pull request #484 from OpenAccess-AI-Collective/reqs allow newer deps in requirements.txt	2023-08-26 11:13:41 -04:00
Aman Karmani	c29117a0d7	allow newer deps	2023-08-26 15:06:05 +00:00
Wing Lian	0b7ba57ec4	fix types w lora (#478 )	2023-08-25 02:03:24 -04:00
NanoCode012	71bd06243c	Fix(tokenizer): Fix condition to add pad token (#477 ) * Fix(tokenizer): Fix condition to add pad token * chore: fix lint	2023-08-25 14:30:50 +09:00
Wing Lian	cb9797ef5a	improve llama pad token handling (#475 ) * improve llama pad token handling * tweak logic to not clobber	2023-08-24 13:20:35 -04:00
Charles O. Goddard	bde3c5a478	ReLoRA implementation (with quantization) (#322 ) * Experimental ReLoRA (+qlora) implementation * Add CPU offload * Remove local config * Fix saving logic * Remove redundant assert * Fix logic errors * Move ReLoRA into its own trainer class with a method override to create the proper scheduler * Formatting & typing fixes * Use safe_serialization * Don't allow fsdp/deepspeed with ReLoRA * Fix cpu-offload logic, enable multi gpu * Document parameters and add comment * Fix merge issue * Smooth over some sharp edges * Implement resume from checkpoint for relora * Address review comments * Fix saving logic * Add necessary metadata to safetensors --------- Co-authored-by: Wing Lian <wing.lian@gmail.com>	2023-08-23 23:07:18 -04:00
NanoCode012	55c23c7bcb	Fix(doc): Clarify config (#466 )	2023-08-23 11:56:01 -04:00
Wing Lian	c69faee7a7	workaround so training doesn't hang when packed dataloader batches aren't even (#461 ) * workaround so training doesn't hang when packed dataloader batches aren't even * don't bother labeling anything in the no-op data	2023-08-23 10:39:11 -04:00
Wing Lian	d5dcf9c350	fix test fixture b/c hf trainer tokenization changed (#464 )	2023-08-23 04:04:49 -04:00
TearGosling	f4746507f6	feat: add Metharme prompt strategy (#446 ) * Add Metharme tokenizing strategy This strategy accounts for how the Metharme JSONLs are formatted as well as adds duplicated EOS tokens which can help trim model output length. I haven't gotten the chance to test this yet, and probably won't have the chance for quite a bit, so I'm committing this now. * Redo Metharme tokenizing strategy lol * fix: oops * Rearrange a conditional * chore: reformat code in accordance with linter * chore: Make lint not freak out * chore: fix lint --------- Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>	2023-08-22 11:21:45 +09:00
Wing Lian	96deb6bd67	recast loralayer, norm, lmhead + embed token weights per original qlora (#393 ) * recast loralayer, norm, lmhead + embed token weights per original qlora * try again for the fix * refactor torch dtype picking * linter fixes * missing import for LoraLayer * fix install for tests now that peft is involved	2023-08-21 18:41:12 -04:00
Wing Lian	50682a3c06	always drop samples that are too long (#452 )	2023-08-21 16:43:33 -04:00
Wing Lian	5a1985ba24	set env var for FSDP layer to wrap (#453 )	2023-08-21 16:43:22 -04:00
Aman Gupta Karmani	5e9c6afa10	Merge pull request #451 from OpenAccess-AI-Collective/eval-is-causal is_causal fix for evals?	2023-08-21 10:43:46 -07:00
Aman Karmani	a213d9972a	fix eval regression caused in `13f7efaf74`	2023-08-21 10:40:06 -07:00
Wing Lian	fbf49a4770	is_causal fix for evals?	2023-08-21 10:36:26 -04:00
Wing Lian	58cf7e7fed	add missing positional arg (#450 )	2023-08-21 04:10:19 -04:00
NanoCode012	04a42b6db1	feat(docs): improve user customized prompts (#443 ) * feat(docs): improve user customized prompts * feat(doc): add custom pretokenized instructions * chore: clean old data folder * chore: add new line	2023-08-20 23:59:43 -04:00
NanoCode012	919f4cac90	feat(doc): add pillow to lambda instructions (#445 )	2023-08-20 23:59:23 -04:00
Wing Lian	ee262818ef	fix evals (#447 )	2023-08-20 23:39:42 -04:00
Wing Lian	9d629d8bff	gracefully handle empty input (#442 )	2023-08-20 09:18:18 -04:00
Wing Lian	d2e7f27240	support user defined prompters, pretokenized datasets in config, local parquet, local arrow files (#348 ) * support user defined prompters, pretokenized datasets in config, local parquet, local arrow files * fix user defined dataset types * fix for system prompts * fix tests * fix checks for parquet and arrow * aha moment that d.data_files isn't used * add documentation for ds_type to add support for parquet and arrow	2023-08-20 09:17:49 -04:00
Philpax	d21318dfb9	docs(readme): add `cd axolotl` (#440 )	2023-08-19 19:14:05 -04:00
Wing Lian	f733d0f31e	disable eval using multipack for now (#437 )	2023-08-19 10:35:04 -04:00
Wing Lian	008505c8ae	fix comma, not a tuple (#436 )	2023-08-19 00:57:40 -04:00
Wing Lian	b3f5e00ff5	use save_strategy from config if available (#434 ) * use save_strategy from config if available * update docs for save_strategy	2023-08-18 20:28:23 -04:00
Wing Lian	5247c5004e	set env for FSDP offload params (#433 )	2023-08-18 20:28:09 -04:00
mhenrichsen	cf6654769a	flash attn pip install (#426 ) * flash attn pip * add packaging * add packaging to apt get * install flash attn in dockerfile * remove unused whls * add wheel * clean up pr fix packaging requirement for ci upgrade pip for ci skip build isolation for requiremnents to get flash-attn working install flash-attn seperately * install wheel for ci * no flash-attn for basic cicd * install flash-attn as pip extras --------- Co-authored-by: Ubuntu <mgh@mgh-vm.wsyvwcia0jxedeyrchqg425tpb.ax.internal.cloudapp.net> Co-authored-by: mhenrichsen <some_email@hey.com> Co-authored-by: Mads Henrichsen <mads@BrbartiendeMads.lan> Co-authored-by: Wing Lian <wing.lian@gmail.com>	2023-08-18 19:00:27 -04:00
Aman Gupta Karmani	06edf175ac	standardize attn hijack patches (#381 ) * split sdp attn into its own patch * sync xformers patch to follow shared format and be diffable * update flash-attn patch for 70B/GQA and inference using helper from flash-attn tests * speed up flash-attn inference * fix patch to check position ids and don't use multipack for evals * copy LlamaModel.forward and LlamaDecoderLayer.forward into monkeypatch * update forwards so we only calculate cu_seqlens once * enable eval dataloader using multipack again * fix the patch to work properly and work with FSDP --------- Co-authored-by: Wing Lian <wing.lian@gmail.com>	2023-08-18 12:54:16 -04:00
mhenrichsen	0a228479b3	adds color (#425 ) * adds color * chore: lint * fix for colorama --------- Co-authored-by: Wing Lian <wing.lian@gmail.com>	2023-08-18 10:59:43 -04:00

846 Commits