axolotl

Author	SHA1	Message	Date
Wing Lian	cda52dc32b	support for explicit test_dataset definition for evals (#786 )	2024-01-22 21:29:56 -05:00
Wing Lian	e799e08d3c	Falcon embeddings (#1149 ) [skip docker] * also fix multipack for falcon and add smoke tests * make sure to handle special tokens and added tokens for lora * fix reference to model_type * fix tests for falcon * fix stray typo * fixes for smoke tests	2024-01-22 21:01:42 -05:00
Wing Lian	0f77b8d798	add commit message option to skip docker image builds in ci (#1168 ) [skip ci]	2024-01-22 19:55:36 -05:00
Wing Lian	32580c1ca7	Vram fix attempt (#1164 ) [skip ci] * revert order of filter/drop_long step and handle calc for max_input_len only during preprocessing * revert some changes to preparing for packing to allow more flexibility * prepare dataset for packing during pre-processing step * prepare dataset hash based on sample packing too * enclose none check * just cast straight to string for ds hash	2024-01-22 19:54:54 -05:00
Wing Lian	802f9667a2	improve vram use w gradient checkpointing (#1167 ) [skip ci]	2024-01-22 19:48:22 -05:00
JohanWork	b8e5603467	Add mlflow callback for pushing config to mlflow artifacts (#1125 ) * Update callbacks.py adding callback for mlflow * Update trainer_builder.py * clean up	2024-01-22 18:44:39 -05:00
Wing Lian	782b6a4216	set fp16 to false if bf16, update bf16: auto in example YAMLs (#1122 ) [skip ci] * set fp16 to false if bf16, update bf16: auto in example YAMLs * unset fp16 so that it fallsback properly if bf16 isn't available * Update README.md [skip-ci] Co-authored-by: NanoCode012 <kevinvong@rocketmail.com> * test that bf16 disables fp16 --------- Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>	2024-01-22 18:44:01 -05:00
Wing Lian	eaaeefce55	jupyter lab fixes (#1139 ) [skip ci] * add a basic notebook for lab users in the root * update notebook and fix cors for jupyter * cell is code * fix eval batch size check * remove intro notebook	2024-01-22 18:42:40 -05:00
Wing Lian	f5a828aa20	Qwen2 (#1166 ) * qwen2 multipack support * fix qwen derived model check so it doesn't break qwen2 * fixes to ensure qwen2 packing works * bump requirements for qwen2 * requirements typo	2024-01-22 18:24:15 -05:00
Wing Lian	fccb542b47	make sure the model config loader respects the model_revision too (#1160 ) [skip-ci]	2024-01-22 13:23:14 -05:00
Wing Lian	2ce5c0d68a	Deprecate max packed sequence len (#1141 )	2024-01-20 05:11:50 -05:00
NanoCode012	3db5f2fd17	feat(dataset): add config to keep processed dataset in memory (#1152 )	2024-01-20 13:19:28 +09:00
Wing Lian	cbecf3e62a	fix check for env var (#1151 )	2024-01-18 23:58:11 -05:00
Wing Lian	729740df81	Dockerfile cloud ports (#1148 ) * explicitly expose ports 8888 and 22 * support for SSH_KEY from latitude	2024-01-18 22:04:25 -05:00
Joe Cummings	08b8ba09a5	Fix link for Minotaur model (#1146 ) [skip-ci]	2024-01-18 17:22:04 -05:00
Wing Lian	6910e6a8ca	Multipack simplify for Mixtral (#1142 )	2024-01-18 16:23:49 -05:00
Joe Cummings	1d70f24b50	Add shifted sparse attention (#973 ) [skip-ci] * Add s2_attn to hijack flash code * Refactor code to account for s2_attn * Add test for models utils * Add ``s2_attention`` option to llama configs * Add ``s2_attention`` option to README config * Format code to appease linter * chore: lint * Remove xpos and llama-landmark [bad merge] * add e2e smoke tests for shifted sparse attention * remove stray patch from merge * update yml with link to paper for s2_attention/longlora * fix assertion check for full fine tune * increase sequence len for tests and PR feedback updates * reduce context len to 16k for tests * reduce context len to 16k for tests * reduce batch size for larger context len and udpate test to check message * fix test for message --------- Co-authored-by: joecummings <jrcummings@devvm050.nha0.facebook.com> Co-authored-by: Wing Lian <wing.lian@gmail.com>	2024-01-18 10:16:07 -05:00
Wing Lian	317fa2555a	fix bf16 check when preprocessing data (#1140 )	2024-01-17 22:41:23 -05:00
NanoCode012	1e56b88cde	fix(preprocess): Make sure dataset not loaded from cache when using preprocess cli (#1136 )	2024-01-18 03:03:52 +09:00
Wing Lian	7570446596	Preprocess dataset size fix (#1131 ) * overwrite cache on preprocess step * don't cache the TokenizedPromptDataset at all * load_from_cache_file no longer needed	2024-01-17 11:02:41 -05:00
Wing Lian	ece0211996	Agnostic cloud gpu docker image and Jupyter lab (#1097 )	2024-01-15 22:37:54 -05:00
xzuyn	8487b97cf3	Add `layers_to_transform` for `lora_config` (#1118 )	2024-01-15 21:29:55 -05:00
NanoCode012	9cd27b2f91	fix(readme): clarify custom user prompt [no-ci] (#1124 ) * fix(readme): clarify custom user prompt * chore: update example to show use case of setting field	2024-01-16 09:47:33 +09:00
Wing Lian	c1b741d9fb	pin model_revision for phi2 (#1123 )	2024-01-14 17:31:51 -05:00
Wing Lian	0abf4d6504	update PR template so we can capture twitter or discord handles (#1121 ) [skip ci] * update PR template so we can capture twitter or discord handles [skip ci] * ensure that the PR template is in the correct place	2024-01-14 16:19:01 -05:00
Simon Hällqvist	086561326f	Enable or disable bf16 support based on availability (#1116 )	2024-01-14 12:06:56 -05:00
Casper	2202a20f60	Reverse caching PR (#1115 )	2024-01-13 10:17:40 -05:00
Casper	d66b10141e	Disable caching on `--disable_caching` in CLI (#1110 ) * Disable caching on `--disable_caching` in CLI * chore: lint --------- Co-authored-by: Wing Lian <wing.lian@gmail.com>	2024-01-13 10:13:35 +01:00
Hamel Husain	304ea1b814	Update debugging.md (#1111 )	2024-01-12 21:07:31 -08:00
Wing Lian	da97285e63	keep gate in fp32 for 16 bit loras (#1105 ) * keep gate in fp32 for loras * add e2e check for lora w/o flash attention for mixtral to check gate * add checks for gate in fp32 for mixtral, add typehints to train outputs * mixtral doesn't support basic lora 🤦 add lora tests @ 16bit and fix gate layer check fix the parameter name, was using the old disco name don't lora over the gate so we can check that is in fp32 fix dtype check * ensure we're using fp16/bf16 for 16bit and qlora is always going to be in uint8	2024-01-12 14:58:21 -05:00
Hamel Husain	2dc431078c	Add link on README to Docker Debugging (#1107 ) * add docker debug * Update docs/debugging.md Co-authored-by: Wing Lian <wing.lian@gmail.com> * explain editable install * explain editable install * upload new video * add link to README * Update README.md * Update README.md * chore: lint * make sure to lint markdown too --------- Co-authored-by: Wing Lian <wing.lian@gmail.com>	2024-01-12 08:51:35 -05:00
Hamel Husain	6d342b52a4	Add section for debugging with Docker (#1104 ) * add docker debug * Update docs/debugging.md Co-authored-by: Wing Lian <wing.lian@gmail.com> * explain editable install * explain editable install * upload new video --------- Co-authored-by: Wing Lian <wing.lian@gmail.com>	2024-01-11 18:43:33 -08:00
Hamel Husain	b502392e82	Update README.md (#1103 ) * Update README.md * Update README.md	2024-01-11 16:41:58 -08:00
Mark Saroufim	44ba616da2	Fix broken pypi.yml (#1099 ) [skip ci]	2024-01-11 12:35:31 -05:00
NanoCode012	b432889256	feat: enable trl's autounwrap (#1060 ) * feat: test trl's autounwrap * fix: add check for adapter * feat: add config to disable autounwrap * chore: fix lint	2024-01-11 08:43:41 -05:00
Hamel Husain	54fe07a905	Fix debugging.md (#1091 )	2024-01-10 21:44:40 -08:00
Hamel Husain	7512c3ad20	Add Debugging Guide (#1089 ) * add debug guide * add background * add .gitignore * Update devtools/dev_sharegpt.yml Co-authored-by: Wing Lian <wing.lian@gmail.com> * Update docs/debugging.md Co-authored-by: Wing Lian <wing.lian@gmail.com> * simplify example axolotl config * add additional comments * add video and TOC * try jsonc for better md rendering * style video thumbnail better * fix footnote --------- Co-authored-by: Wing Lian <wing.lian@gmail.com>	2024-01-10 20:49:24 -08:00
Wing Lian	78c5b1979e	add gptneox embeddings, fix phi2 inputs, also fix the casting (#1083 )	2024-01-10 22:32:43 -05:00
Wing Lian	23495a80af	misc fixes from #943 (#1086 ) [skip ci]	2024-01-10 22:31:36 -05:00
Casper	91502b98d4	Remove fused-dense-lib from requirements.txt (#1087 )	2024-01-10 21:26:41 +01:00
Wing Lian	6c19e9302a	add python 3.11 to the matrix for unit tests (#1085 ) [skip ci]	2024-01-10 13:02:01 -05:00
Wing Lian	90036ebbc6	optimize calculation of cu_seqlens from position_ids (#1084 ) [skip ci]	2024-01-10 11:54:50 -05:00
Wing Lian	9032e610b1	use tags again for test image, only run docker e2e after pre-commit checks (#1081 )	2024-01-10 09:04:56 -05:00
NanoCode012	d69ba2b0b7	fix: warn user to install mamba_ssm package (#1019 )	2024-01-10 02:50:56 -05:00
Wing Lian	9e3f0cb5a7	pin accelerate for deepspeed fix (#1080 )	2024-01-10 00:50:04 -05:00
Wing Lian	2f2582e6ed	additional logging to get maximum token length of a sequence in the dataset (#1066 ) [skip ci] * additional logging to get maximum token length of a sequence in the dataset * fix ordering to properly determine the max_len of tokens before dropping anything longer	2024-01-10 00:49:31 -05:00
Wing Lian	0ce1a6594e	update sharegpt conversations when chatml chat template is set (#1075 ) [skip ci] * update sharegpt conversations when chatml chat template is set * add info log when updating sharegpt/chatml conversation	2024-01-10 00:49:07 -05:00
NanoCode012	043c3860cd	fix: `train_on_inputs: true` ignored for sharegpt (#1045 ) [skip ci] * fix: `train_on_inputs: true` ignored for sharegpt * enable unit test for train_on_inputs for sharegpt --------- Co-authored-by: Wing Lian <wing.lian@gmail.com>	2024-01-09 23:00:09 -05:00
Wing Lian	0f100800e3	be more robust about checking embedding modules for lora finetunes (#1074 ) [skip ci] * be more robust about checking embedding modules for lora finetunes * update dynamic error message	2024-01-09 22:58:54 -05:00
Wing Lian	ead34c516a	swap the data collator for evals if not using sample packing (#1076 ) * swap the data collator for evals if not using sample packing * drop last from dataloader to help with issues with evals	2024-01-09 22:16:24 -05:00

1 2 3 4 5 ...

1214 Commits