axolotl

Author	SHA1	Message	Date
Wing Lian	1a538be9c2	add a prelim test for expanding the 4d mask	2024-01-26 00:41:24 -05:00
Wing Lian	74c72ca5eb	drop py39 docker images, add py311, upgrade pytorch to 2.1.2 (#1205 ) * drop py39 docker images, add py311, upgrade pytorch to 2.1.2 * also allow the main build to be manually triggered * fix workflow_dispatch in yaml	2024-01-26 00:38:49 -05:00
Wing Lian	e923e62d24	more checks and fixes for deepspeed and fsdp (#1208 ) [skip ci]	2024-01-25 20:01:45 -05:00
Wing Lian	ba944e6554	workaround for transformers bug requiring do_sample for saving pretrained (#1206 )	2024-01-25 11:34:41 -05:00
Wing Lian	badda3783b	make sure to register the base chatml template even if no system message is provided (#1207 )	2024-01-25 10:38:08 -05:00
Wing Lian	a01b998c0f	Update deps 202401 (#1204 ) [skip ci] * update deps * xformers fix too	2024-01-25 10:11:49 -05:00
Wing Lian	33e117088f	precompute dpo logprobs setting and fixes (#1199 ) [skip ci] * add support for precompute_ref_log_probs for dpo * add chatml.icr type for argilla orca dpo * update inline doc * also set use_reentrant to false for dpo when not set * don't set use_reentrant to true for rl * make sure to set gradient checkpointing too	2024-01-25 09:31:55 -05:00
Ricardo Dominguez-Olmedo	b4ac96adef	fix learning rate scheduler's warnings (#1135 ) [skip ci] * fix schedulers warnings * chore: lint --------- Co-authored-by: Wing Lian <wing.lian@gmail.com>	2024-01-25 07:09:34 -05:00
mhenrichsen	98b4762077	Feat/chatml add system message (#1117 ) * add system message to template * readme update * added code to register new system message * register chatml template for test --------- Co-authored-by: Mads Henrichsen <mads@BrbartiendeMads.lan> Co-authored-by: Wing Lian <wing.lian@gmail.com>	2024-01-25 08:24:27 +01:00
JohanWork	ee0b5f60e5	add colab example (#1196 ) [skip ci]	2024-01-24 20:09:09 -05:00
NanoCode012	08719b9609	fix(log): improve warning to clarify that lora_modules_to_save expect a list (#1197 )	2024-01-24 20:08:34 -05:00
Wing Lian	1427d5b502	prepare for release 0.4.0 (#1175 ) v0.4.0	2024-01-24 15:00:28 -05:00
Wing Lian	54d2ac155b	Mixtral fixes 20240124 (#1192 ) [skip ci] * mixtral nccl fixes * make sure to patch for z3	2024-01-24 14:59:57 -05:00
Oleh Kuznetsov	af0243021c	Standardize system prompt format for AlpacaPrompter (#1190 ) [skip ci]	2024-01-24 14:27:01 -05:00
Wing Lian	8a49309489	upgrade deepspeed to 0.13.1 for mixtral fixes (#1189 ) [skip ci] * upgrade deepspeed to 0.13.1 for mixtral fixes * move deepspeed-kernels install to setup.py	2024-01-24 14:26:40 -05:00
Wing Lian	5bce45f800	more dpo fixes for dataset loading and docs (#1185 ) [skip ci] * more dpo fixes for dataset loading and docs * preprocess dpo datasets	2024-01-24 14:23:55 -05:00
Wing Lian	d85d4942cf	report min length of tokenized data (#1186 ) [skip ci]	2024-01-24 09:17:50 -05:00
Agung Baptiso Sorlawan	02f2c720fc	Fix generation_config validation raises Exception for do_merge_lora (#1184 )	2024-01-24 00:42:15 -05:00
James Wade	71141deb18	Add support for offline mode with HF_HUB_OFFLINE envvar (#1182 ) * Add support for offline mode with HF_HUB_OFFLINE envvar * Apply styling * chore: lint --------- Co-authored-by: Wing Lian <wing.lian@gmail.com>	2024-01-24 00:41:47 -05:00
Aleksey Korshuk	dc051b861d	Update rlhf.md (#1178 ) [skip ci]	2024-01-23 15:54:51 -05:00
Wing Lian	59a31fe613	DPO fixes v2 (#1174 ) * check for length before trying to remove it * add validation for sample packing with RLHF	2024-01-23 12:56:24 -05:00
Wing Lian	814aee6603	Phi2 multipack (#1173 ) * phi2 multipack * update validation and examples for phi * more updates to phi examples * make sure to use the correct collator for phi multipack * phi needs attention mask now for multipack * if the special token already exists in the tokenizer, don't require in lora modules to save * fix qlora yml for phi, fix phi test validation * test qlora too * make sure flash attention is enabled for the test * don't use remote code for phi anymore * reduce sequence len for sample packing phi	2024-01-23 12:54:36 -05:00
Wing Lian	b715cd549a	update docs [skip ci] (#1176 )	2024-01-23 11:14:52 -05:00
Wing Lian	fb7f9b9516	don't fail if can't cast weights due to offload when merging (#1172 ) [skip ci]	2024-01-23 09:17:08 -05:00
Tilemachos Chatzipapas	cc250391a0	Fine-Tuning Mistral-7b for Real-World Chatbot Applications Using Axolotl (Lora used) (#1155 ) * Mistral-7b finetune example using axolotl with code,config,data * Corrected the path for huggingface dataset * Update data.jsonl * chore: lint --------- Co-authored-by: twenty8th <twenty8th@users.noreply.github.com> Co-authored-by: Wing Lian <wing.lian@gmail.com>	2024-01-23 07:32:21 -05:00
Ayush Singh	9135b9e2aa	Update README.md (#1169 ) [skip ci] Fix typo	2024-01-23 07:25:44 -05:00
Wing Lian	7523d1f557	DPO cleanup (#1126 ) * cleanup dpo to be a little more extensible, add zephyr/nectar strategy * fix eos slash * support for eval split * fix kwargs * handle empty evals * don't load peft model for dpo * ensure dpo training args get bf16 for peft if applicable * fix duplicate kwargs for bf16 * make sure to respect the configured lr scheduler * support trainer callback to push config to wandb * set dataloader preload args * ensure that we are loading the lora when merging * Update src/axolotl/utils/data.py Co-authored-by: Agus <agustin.piqueres@gmail.com> * support local datasets for dpo Co-authored-by: Agus <agustin.piqueres@gmail.com> * chore: lint * dpo/kto/ipo smoke tests w lora, simplify dpo dataset type names * add split to dpo tests * fix rebase/merging error * handle edge case w logging * use accelerator for dpo datasets so it doesn't break the logger * missing args * validate checkpoint is an adapter for now * log warning when dataset strategy is not loadable --------- Co-authored-by: Agus <agustin.piqueres@gmail.com>	2024-01-23 00:40:37 -05:00
JohanWork	5439707489	Feat(test): Add tests for alpaca chatml prompt tokenizer (#1088 ) * draft for adding test for tokenizer * clean up * clean up * fix pre commit * fix pylint * Revert "fix pylint" This reverts commit `cd2cda3cda`. * add pylint exception for pytest fixture * update comments * Apply suggestions from code review Co-authored-by: NanoCode012 <kevinvong@rocketmail.com> * update spelling and import promptstyle * rename, restructure * clean up * add fmt:on --------- Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>	2024-01-23 13:30:26 +09:00
Casper	684038111e	Add desc to map/filter (#1162 ) * Add desc to map/filter * update descriptions --------- Co-authored-by: Wing Lian <wing.lian@gmail.com>	2024-01-22 21:30:53 -05:00
Wing Lian	cda52dc32b	support for explicit test_dataset definition for evals (#786 )	2024-01-22 21:29:56 -05:00
Wing Lian	e799e08d3c	Falcon embeddings (#1149 ) [skip docker] * also fix multipack for falcon and add smoke tests * make sure to handle special tokens and added tokens for lora * fix reference to model_type * fix tests for falcon * fix stray typo * fixes for smoke tests	2024-01-22 21:01:42 -05:00
Wing Lian	0f77b8d798	add commit message option to skip docker image builds in ci (#1168 ) [skip ci]	2024-01-22 19:55:36 -05:00
Wing Lian	32580c1ca7	Vram fix attempt (#1164 ) [skip ci] * revert order of filter/drop_long step and handle calc for max_input_len only during preprocessing * revert some changes to preparing for packing to allow more flexibility * prepare dataset for packing during pre-processing step * prepare dataset hash based on sample packing too * enclose none check * just cast straight to string for ds hash	2024-01-22 19:54:54 -05:00
Wing Lian	802f9667a2	improve vram use w gradient checkpointing (#1167 ) [skip ci]	2024-01-22 19:48:22 -05:00
JohanWork	b8e5603467	Add mlflow callback for pushing config to mlflow artifacts (#1125 ) * Update callbacks.py adding callback for mlflow * Update trainer_builder.py * clean up	2024-01-22 18:44:39 -05:00
Wing Lian	782b6a4216	set fp16 to false if bf16, update bf16: auto in example YAMLs (#1122 ) [skip ci] * set fp16 to false if bf16, update bf16: auto in example YAMLs * unset fp16 so that it falls back properly if bf16 isn't available * Update README.md [skip-ci] Co-authored-by: NanoCode012 <kevinvong@rocketmail.com> * test that bf16 disables fp16 --------- Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>	2024-01-22 18:44:01 -05:00
Wing Lian	eaaeefce55	jupyter lab fixes (#1139 ) [skip ci] * add a basic notebook for lab users in the root * update notebook and fix cors for jupyter * cell is code * fix eval batch size check * remove intro notebook	2024-01-22 18:42:40 -05:00
Wing Lian	f5a828aa20	Qwen2 (#1166 ) * qwen2 multipack support * fix qwen derived model check so it doesn't break qwen2 * fixes to ensure qwen2 packing works * bump requirements for qwen2 * requirements typo	2024-01-22 18:24:15 -05:00
Wing Lian	fccb542b47	make sure the model config loader respects the model_revision too (#1160 ) [skip-ci]	2024-01-22 13:23:14 -05:00
Wing Lian	2ce5c0d68a	Deprecate max packed sequence len (#1141 )	2024-01-20 05:11:50 -05:00
NanoCode012	3db5f2fd17	feat(dataset): add config to keep processed dataset in memory (#1152 )	2024-01-20 13:19:28 +09:00
Wing Lian	cbecf3e62a	fix check for env var (#1151 )	2024-01-18 23:58:11 -05:00
Wing Lian	729740df81	Dockerfile cloud ports (#1148 ) * explicitly expose ports 8888 and 22 * support for SSH_KEY from latitude	2024-01-18 22:04:25 -05:00
Joe Cummings	08b8ba09a5	Fix link for Minotaur model (#1146 ) [skip-ci]	2024-01-18 17:22:04 -05:00
Wing Lian	6910e6a8ca	Multipack simplify for Mixtral (#1142 )	2024-01-18 16:23:49 -05:00
Joe Cummings	1d70f24b50	Add shifted sparse attention (#973 ) [skip-ci] * Add s2_attn to hijack flash code * Refactor code to account for s2_attn * Add test for models utils * Add ``s2_attention`` option to llama configs * Add ``s2_attention`` option to README config * Format code to appease linter * chore: lint * Remove xpos and llama-landmark [bad merge] * add e2e smoke tests for shifted sparse attention * remove stray patch from merge * update yml with link to paper for s2_attention/longlora * fix assertion check for full fine tune * increase sequence len for tests and PR feedback updates * reduce context len to 16k for tests * reduce context len to 16k for tests * reduce batch size for larger context len and update test to check message * fix test for message --------- Co-authored-by: joecummings <jrcummings@devvm050.nha0.facebook.com> Co-authored-by: Wing Lian <wing.lian@gmail.com>	2024-01-18 10:16:07 -05:00
Wing Lian	317fa2555a	fix bf16 check when preprocessing data (#1140 )	2024-01-17 22:41:23 -05:00
NanoCode012	1e56b88cde	fix(preprocess): Make sure dataset not loaded from cache when using preprocess cli (#1136 )	2024-01-18 03:03:52 +09:00
Wing Lian	7570446596	Preprocess dataset size fix (#1131 ) * overwrite cache on preprocess step * don't cache the TokenizedPromptDataset at all * load_from_cache_file no longer needed	2024-01-17 11:02:41 -05:00
Wing Lian	ece0211996	Agnostic cloud gpu docker image and Jupyter lab (#1097 )	2024-01-15 22:37:54 -05:00

1243 Commits