axolotl

Author	SHA1	Message	Date
xzuyn	8487b97cf3	Add `layers_to_transform` for `lora_config` (#1118 )	2024-01-15 21:29:55 -05:00
NanoCode012	9cd27b2f91	fix(readme): clarify custom user prompt [no-ci] (#1124 ) * fix(readme): clarify custom user prompt * chore: update example to show use case of setting field	2024-01-16 09:47:33 +09:00
Wing Lian	c1b741d9fb	pin model_revision for phi2 (#1123 )	2024-01-14 17:31:51 -05:00
Wing Lian	0abf4d6504	update PR template so we can capture twitter or discord handles (#1121 ) [skip ci] * update PR template so we can capture twitter or discord handles [skip ci] * ensure that the PR template is in the correct place	2024-01-14 16:19:01 -05:00
Simon Hällqvist	086561326f	Enable or disable bf16 support based on availability (#1116 )	2024-01-14 12:06:56 -05:00
Casper	2202a20f60	Reverse caching PR (#1115 )	2024-01-13 10:17:40 -05:00
Casper	d66b10141e	Disable caching on `--disable_caching` in CLI (#1110 ) * Disable caching on `--disable_caching` in CLI * chore: lint --------- Co-authored-by: Wing Lian <wing.lian@gmail.com>	2024-01-13 10:13:35 +01:00
Hamel Husain	304ea1b814	Update debugging.md (#1111 )	2024-01-12 21:07:31 -08:00
Wing Lian	da97285e63	keep gate in fp32 for 16 bit loras (#1105 ) * keep gate in fp32 for loras * add e2e check for lora w/o flash attention for mixtral to check gate * add checks for gate in fp32 for mixtral, add typehints to train outputs * mixtral doesn't support basic lora 🤦 add lora tests @ 16bit and fix gate layer check fix the parameter name, was using the old disco name don't lora over the gate so we can check that is in fp32 fix dtype check * ensure we're using fp16/bf16 for 16bit and qlora is always going to be in uint8	2024-01-12 14:58:21 -05:00
Hamel Husain	2dc431078c	Add link on README to Docker Debugging (#1107 ) * add docker debug * Update docs/debugging.md Co-authored-by: Wing Lian <wing.lian@gmail.com> * explain editable install * explain editable install * upload new video * add link to README * Update README.md * Update README.md * chore: lint * make sure to lint markdown too --------- Co-authored-by: Wing Lian <wing.lian@gmail.com>	2024-01-12 08:51:35 -05:00
Hamel Husain	6d342b52a4	Add section for debugging with Docker (#1104 ) * add docker debug * Update docs/debugging.md Co-authored-by: Wing Lian <wing.lian@gmail.com> * explain editable install * explain editable install * upload new video --------- Co-authored-by: Wing Lian <wing.lian@gmail.com>	2024-01-11 18:43:33 -08:00
Hamel Husain	b502392e82	Update README.md (#1103 ) * Update README.md * Update README.md	2024-01-11 16:41:58 -08:00
Mark Saroufim	44ba616da2	Fix broken pypi.yml (#1099 ) [skip ci]	2024-01-11 12:35:31 -05:00
NanoCode012	b432889256	feat: enable trl's autounwrap (#1060 ) * feat: test trl's autounwrap * fix: add check for adapter * feat: add config to disable autounwrap * chore: fix lint	2024-01-11 08:43:41 -05:00
Hamel Husain	54fe07a905	Fix debugging.md (#1091 )	2024-01-10 21:44:40 -08:00
Hamel Husain	7512c3ad20	Add Debugging Guide (#1089 ) * add debug guide * add background * add .gitignore * Update devtools/dev_sharegpt.yml Co-authored-by: Wing Lian <wing.lian@gmail.com> * Update docs/debugging.md Co-authored-by: Wing Lian <wing.lian@gmail.com> * simplify example axolotl config * add additional comments * add video and TOC * try jsonc for better md rendering * style video thumbnail better * fix footnote --------- Co-authored-by: Wing Lian <wing.lian@gmail.com>	2024-01-10 20:49:24 -08:00
Wing Lian	78c5b1979e	add gptneox embeddings, fix phi2 inputs, also fix the casting (#1083 )	2024-01-10 22:32:43 -05:00
Wing Lian	23495a80af	misc fixes from #943 (#1086 ) [skip ci]	2024-01-10 22:31:36 -05:00
Casper	91502b98d4	Remove fused-dense-lib from requirements.txt (#1087 )	2024-01-10 21:26:41 +01:00
Wing Lian	6c19e9302a	add python 3.11 to the matrix for unit tests (#1085 ) [skip ci]	2024-01-10 13:02:01 -05:00
Wing Lian	90036ebbc6	optimize calculation of cu_seqlens from position_ids (#1084 ) [skip ci]	2024-01-10 11:54:50 -05:00
Wing Lian	9032e610b1	use tags again for test image, only run docker e2e after pre-commit checks (#1081 )	2024-01-10 09:04:56 -05:00
NanoCode012	d69ba2b0b7	fix: warn user to install mamba_ssm package (#1019 )	2024-01-10 02:50:56 -05:00
Wing Lian	9e3f0cb5a7	pin accelerate for deepspeed fix (#1080 )	2024-01-10 00:50:04 -05:00
Wing Lian	2f2582e6ed	additional logging to get maximum token length of a sequence in the dataset (#1066 ) [skip ci] * additional logging to get maximum token length of a sequence in the dataset * fix ordering to properly determine the max_len of tokens before dropping anything longer	2024-01-10 00:49:31 -05:00
Wing Lian	0ce1a6594e	update sharegpt conversations when chatml chat template is set (#1075 ) [skip ci] * update sharegpt conversations when chatml chat template is set * add info log when updating sharegpt/chatml conversation	2024-01-10 00:49:07 -05:00
NanoCode012	043c3860cd	fix: `train_on_inputs: true` ignored for sharegpt (#1045 ) [skip ci] * fix: `train_on_inputs: true` ignored for sharegpt * enable unit test for train_on_inputs for sharegpt --------- Co-authored-by: Wing Lian <wing.lian@gmail.com>	2024-01-09 23:00:09 -05:00
Wing Lian	0f100800e3	be more robust about checking embedding modules for lora finetunes (#1074 ) [skip ci] * be more robust about checking embedding modules for lora finetunes * update dynamic error message	2024-01-09 22:58:54 -05:00
Wing Lian	ead34c516a	swap the data collator for evals if not using sample packing (#1076 ) * swap the data collator for evals if not using sample packing * drop last from dataloader to help with issues with evals	2024-01-09 22:16:24 -05:00
Wing Lian	ec02b7cc4e	Update FUNDING.yml [skip ci]	2024-01-09 22:15:27 -05:00
Wing Lian	3b4c646f87	Update FUNDING.yml with bitcoin (#1079 ) [skip ci]	2024-01-09 21:56:52 -05:00
Wing Lian	788649fe95	attempt to also run e2e tests that needs gpus (#1070 ) * attempt to also run e2e tests that needs gpus * fix stray quote * checkout specific github ref * dockerfile for tests with proper checkout ensure wandb is disabled for docker pytests clear wandb env after testing clear wandb env after testing make sure to provide a default val for pop try skipping wandb validation tests explicitly disable wandb in the e2e tests explicitly report_to None to see if that fixes the docker e2e tests split gpu from non-gpu unit tests skip bf16 check in test for now build docker w/o cache since it uses branch name ref revert some changes now that caching is fixed skip bf16 check if on gpu w support * pytest skip for auto-gptq requirements * skip mamba tests for now, split multipack and non packed lora llama tests * split tests that use monkeypatches * fix relative import for prev commit * move other tests using monkeypatches to the correct run	2024-01-09 21:23:23 -05:00
Casper	9be92d1448	Separate AutoGPTQ dep to `pip install -e .[auto-gptq]` (#1077 ) * Separate AutoGPTQ dep to `pip install -e .[auto-gptq]` * Fix code review	2024-01-09 23:39:25 +01:00
Wing Lian	d7057ccd36	paired kto support (#1069 )	2024-01-09 13:30:45 -05:00
mtenenholtz	768d348f42	update peft to 0.7.0 (#1073 )	2024-01-09 12:22:14 -05:00
Johan Hansson	090c24dcb0	Add: mlflow for experiment tracking (#1059 ) [skip ci] * Update requirements.txt adding mlflow * Update __init__.py Imports for mlflow * Update README.md * Create mlflow_.py (#1) * Update README.md * fix precommits * Update README.md Update mlflow_tracking_uri * Update trainer_builder.py update trainer building * chore: lint * make ternary a bit more readable --------- Co-authored-by: Wing Lian <wing.lian@gmail.com>	2024-01-09 09:34:09 -05:00
Wing Lian	651b7a31fc	fix double eos token for chatml (#1054 ) [skip ci] * fix double eos token for chatml * isolate fix to chatml conversation * fix add special tokens to include rstrip * add test for train_on_inputs for sharegpt * don't use rstrip for chatml	2024-01-09 09:33:38 -05:00
Ricardo Dominguez-Olmedo	04b978b428	Cosine learning rate schedule - minimum learning rate (#1062 ) * Cosine min lr * Cosine min lr - warn if using deepspeed * cosine_min_lr_ratio readme * chore: lint --------- Co-authored-by: Wing Lian <wing.lian@gmail.com>	2024-01-09 09:29:56 -05:00
NanoCode012	c3e8165f26	fix: torch_dtype mistral default to fp32 (#1050 )	2024-01-09 07:48:15 -05:00
Wing Lian	7f381750d9	Update FUNDING.yml for Kofi link (#1067 )	2024-01-08 19:26:51 -05:00
Wing Lian	14964417ee	Sponsors (#1065 ) * wip sponsors section in readme * add ko-fi and contributors list	2024-01-08 18:52:00 -05:00
Ricardo Dominguez-Olmedo	81d384598e	Efficiently get the length of the tokenized docs (#1063 ) * Efficiently get the length of the tokenized docs * chore: lint --------- Co-authored-by: Wing Lian <wing.lian@gmail.com>	2024-01-08 15:48:30 -05:00
Wing Lian	732851f105	Phi2 rewrite (#1058 ) * restore to current phi modeling code from phi-2 * enable gradient checkpointing * don't cast everything to float32 all the time * gradient checkpointing for phi2 ParallelBlock module too * fix enabling flash attn for phi2 * add comment about import * fix phi2 example * fix model type check for tokenizer * revert float32 -> bf16 casting changes * support fused dense flash attn * fix the repo for flash-attn * add package name for subdir pkg * fix the data collator when not using sample packing * install packaging for pytests in ci * also fix setup to not install flash attn fused dense subdir if not extras * split out the fused-dense-lib in extra requires * don't train w group_by_length for phi * update integration test to use phi2 * set max steps and save steps for phi e2e tests * try to workaround save issue in ci * skip phi2 e2e test for now	2024-01-08 14:04:22 -05:00
Hamel Husain	9ca358b671	Simplify Docker Unit Test CI (#1055 ) [skip ci] * Update tests-docker.yml * Update tests-docker.yml * run ci tests on ci yaml updates --------- Co-authored-by: Wing Lian <wing.lian@gmail.com>	2024-01-06 08:20:33 -05:00
JinK	553c80f79a	streaming multipack for pretraining dataset (#959 ) * [Feat] streaming multipack * WIP make continued pretraining work w multipack * fix up hardcoding, lint * fix dict check * update test for updated pretraining multipack code * fix hardcoded data collator fix for multipack pretraining * fix the collator to be the max length for multipack pretraining * don't bother with latest tag for test * cleanup docker build/test --------- Co-authored-by: jinwonkim93@github.com <jinwonkim> Co-authored-by: Wing Lian <wing.lian@gmail.com>	2024-01-05 22:13:21 -05:00
Hamel Husain	eb4c99431b	Update tests-docker.yml (#1052 ) [skip ci]	2024-01-05 14:26:18 -05:00
NanoCode012	cbdbf9e6e5	feat: always push checkpoint to hub if set (#1049 ) [skip ci]	2024-01-05 13:09:42 -05:00
kallewoof	bdfefaf054	feature: better device mapping for large models (#918 ) * fix: improved memory handling when model is bigger than existing VRAM * feature: add lora_on_cpu flag to do LoRA loading on CPU (RAM) For big models where the models are taking up the entire GPU VRAM, the LoRA part will fail unless it is loaded on CPU only. * doc: add README * fix: enable progress bars in do_merge_lora() * doc: mention gpu_memory_limit and lora_on_cpu in merge part of README * Update src/axolotl/utils/models.py Co-authored-by: Wing Lian <wing.lian@gmail.com> * fix: remove deletion of removed model_kwargs key * fix: validate that gpu_memory_limit and max_memory are not both set --------- Co-authored-by: Karl-Johan Alm <kalle@gmail.com> Co-authored-by: Wing Lian <wing.lian@gmail.com>	2024-01-05 22:22:21 +09:00
Hamel Husain	63fb3eb426	set default for merge (#1044 )	2024-01-04 18:14:20 -08:00
Hamel Husain	31d23504a5	fix model card upload for PEFT models (#1043 )	2024-01-04 18:13:54 -08:00


1193 Commits