axolotl

Author	SHA1	Message	Date
NanoCode012	c2b64e4dcf	Feat: update doc (#1475 ) [skip ci] * feat: update doc contents * chore: move batch vs ga docs * feat: update lambdalabs instructions * fix: refactor dev instructions	2024-04-04 13:43:40 +09:00
James Melvin Ebenezer	cae608f587	Added pip install ninja to accelerate installation of flash-attn (#1461 ) * Added pip install ninja to accelerate installation of flash-attn * doc: cleanup	2024-04-02 17:36:41 +09:00
Hamel Husain	86b7d22f35	Reorganize Docs (#1468 )	2024-04-01 08:00:52 -07:00
Phuc Van Phan	324d59ea0d	docs: update link to docs of advance topic in README.md (#1437 )	2024-03-24 21:49:27 -07:00
NanoCode012	f1ebaa07c6	chore(config): refactor old mistral config (#1435 ) * chore(config): refactor old mistral config * chore: add link to colab on readme	2024-03-25 12:00:44 +09:00
Hamel Husain	629450cecd	Bootstrap Hosted Axolotl Docs w/Quarto (#1429 ) * precommit * mv styes.css * fix links	2024-03-21 22:28:36 -07:00
Wing Lian	dd449c5cd8	support galore once upstreamed into transformers (#1409 ) * support galore once upstreamed into transformers * update module name for llama in readme and fix typing for all linear * bump trl for deprecation fixes from newer transformers * include galore as an extra and install in docker image * fix optim_args type * fix optim_args * update dependencies for galore * add galore to cicd dockerfile	2024-03-19 09:26:35 -04:00
NanoCode012	40a88e8c4a	Feat: Add sharegpt multirole (#1137 ) * feat(prompt): support multiple roles for sharegpt * fix: add handling of empty role back * feat: rebased and allowed more dynamic roles via config * fix: variable * chore: update message * feat: add vicuna format * fix: JSON serializable error * fix: typing * fix: don't remap for unknown keys * fix: add roles to pydantic * feat: add test * chore: remove leftover print * chore: remove leftover comment * chore: remove print * fix: update test to use chatml	2024-03-19 20:51:49 +09:00
Seungduk Kim	43bdc5d3de	Add a config not to shuffle merged dataset (#1394 ) [skip ci] * Add a config not to shuffle merged dataset * Update README.md * Update src/axolotl/utils/config/models/input/v0_4_1/__init__.py Co-authored-by: Wing Lian <wing.lian@gmail.com> * invert the condition name * update README * info -> debug --------- Co-authored-by: Wing Lian <wing.lian@gmail.com>	2024-03-19 20:51:00 +09:00
NanoCode012	b1e3e1b25f	fix(config): passing gradient_checkpoint_kwargs (#1412 ) * fix(config): change default use_reentrant to true * Update trainer_builder.py * fix: make sure to pass kwargs to enable checkpoint * chore: lint	2024-03-19 12:57:43 +09:00
jbl	e8c8ea64b3	Update README.md (#1418 ) Add Phorm AI Badge	2024-03-17 23:47:46 -04:00
NanoCode012	f083aed2c7	Fix(readme): Improve README QuickStart info (#1408 ) * Fix(readme): Improve README QuickStart info * chore: add to toc	2024-03-16 21:10:22 +09:00
NanoCode012	868c33954d	Feat(readme): Add instructions for Google GPU VM instances (#1410 )	2024-03-16 21:10:05 +09:00
Hamel Husain	8b12468230	Add QLoRA + FSDP Docs (#1403 ) * pre commit * Update fsdp_qlora.md	2024-03-14 11:04:51 -04:00
Wing Lian	638c2dafb5	JarvisLabs (#1372 ) * add Jarvis cloud gpu and sponsorship * whitespace	2024-03-07 10:47:32 -05:00
Hamel Husain	ed70a08348	add docs for `input_output` format (#1367 ) [skip ci] * add docs * add docs * run linter	2024-03-06 09:09:49 -05:00
Nicolas Rojas	37657473c8	Remove unsupported python version 3.9 from README (#1364 ) [skip ci]	2024-03-05 21:19:36 -05:00
Wing Lian	6b3b271925	fix for protected model_ namespace w pydantic (#1345 )	2024-02-28 15:07:49 -05:00
Maxime	0f6af36d50	Mps mistral lora (#1292 ) [skip ci] * Lora example for Mistral on MPS backend * Add some MPS documentation * Update examples/mistral/lora-mps.yml Co-authored-by: NanoCode012 <kevinvong@rocketmail.com> * Update examples/mistral/lora-mps.yml Co-authored-by: NanoCode012 <kevinvong@rocketmail.com> * Update README.md --------- Co-authored-by: NanoCode012 <kevinvong@rocketmail.com> Co-authored-by: Wing Lian <wing.lian@gmail.com>	2024-02-26 22:39:57 -05:00
JohanWork	d75653407c	ADD: push checkpoints to mlflow artifact registry (#1295 ) [skip ci] * Add checkpoint logging to mlflow artifact registry * clean up * Update README.md Co-authored-by: NanoCode012 <kevinvong@rocketmail.com> * update pydantic config from rebase --------- Co-authored-by: NanoCode012 <kevinvong@rocketmail.com> Co-authored-by: Wing Lian <wing.lian@gmail.com>	2024-02-26 13:32:39 -05:00
NanoCode012	c6b01e0f4a	chore: update readme to be more clear (#1326 ) [skip ci]	2024-02-26 13:32:13 -05:00
Wing Lian	cc3cebfa70	Pydantic 2.x cfg (#1239 ) * WIP conversion to use pydantic for config validation * wip, more fields, add capabilities * wip * update pydantic validation to match existing tests * tweak requirements * setup deprecated paams pydantic model * more validations * wrap up rest of the validations * flesh out the rest of the options from the readme into pydantic * fix model validators as class methods remember to return in validator missing return add missing relora attributes fix test for DictDefault change fix sys template for mistral from fastchat change in PR 2872 fix test for batch size warning * more missing attributes for cfg * updates from PR feedback * fix validation for datasets and pretrain datasets * fix test for lora check	2024-02-26 12:24:14 -05:00
NanoCode012	2ed52bd568	fix(readme): Clarify doc for tokenizer_config (#1323 ) [skip ci]	2024-02-24 21:55:04 +09:00
NanoCode012	3d2cd804ae	fix(readme): update inference md link (#1311 ) [skip ci]	2024-02-22 02:48:06 +09:00
Leonardo Emili	5a5d47458d	Add seq2seq eval benchmark callback (#1274 ) * Add CausalLMBenchEvalCallback for measuring seq2seq performance * Fix code for pre-commit * Fix typing and improve logging * eval_sample_packing must be false with CausalLMBenchEvalCallback	2024-02-13 08:24:30 -08:00
김진원	8430db22e2	Scheduler implementation of Continual Pre-Training of Large Language Models: How to (re)warm your model? (#1273 )	2024-02-12 21:23:28 -08:00
Wing Lian	4b997c3e1a	allow the optimizer prune ratio for ReLoRA to be configurable (#1287 ) * allow the optimizer prune ration for relora to be configurable * update docs for relora * prevent circular imports	2024-02-12 11:39:51 -08:00
Hamel Husain	b2a4cb4396	Update README.md (#1281 )	2024-02-09 07:38:08 -08:00
Hamel Husain	9bca7db133	add support for https remote yamls (#1277 )	2024-02-08 20:02:17 -08:00
Hamel Husain	91cf4ee72c	allow remote data paths (#1278 ) * allow remote data paths * add docs about public url * only allow https * better docs * better docs	2024-02-08 15:02:35 -08:00
Wing Lian	1daecd161e	copy edits (#1276 )	2024-02-08 09:00:04 -05:00
Wing Lian	4a654b331e	Add link to axolotl cloud image on latitude (#1275 )	2024-02-08 08:50:11 -05:00
Wing Lian	411293bdca	contributor avatars (#1269 )	2024-02-07 07:09:01 -08:00
Wing Lian	dfd188502a	add contact info for dedicated support for axolotl [skip ci] (#1243 )	2024-02-01 12:59:07 -05:00
Wing Lian	00568c1539	support for true batches with multipack (#1230 ) * support for true batches with multipack * patch the map dataset fetcher to handle batches with packed indexes * patch 4d mask creation for sdp attention * better handling for BetterTransformer * patch general case for 4d mask * setup forward patch. WIP * fix patch file * support for multipack w/o flash attention for llama * cleanup * add warning about bf16 vs fp16 for multipack with sdpa * bugfixes * add 4d multipack tests, refactor patches * update tests and add warnings * fix e2e file check * skip sdpa test if not at least torch 2.1.1, update docs	2024-02-01 10:18:42 -05:00
DreamGenX	5787e1a23f	Fix and document test_datasets (#1228 ) * Make sure test_dataset are used and treat val_set_size. * Add test_datasets docs. * Apply suggestions from code review --------- Co-authored-by: Wing Lian <wing.lian@gmail.com>	2024-01-31 06:48:57 -05:00
Wing Lian	4cb7900a56	Peft lotfq (#1222 ) * loftq support for lora * fix loftq check * update readme for loftq * readability cleanup * use peft main for loftq fixes, remove unnecessary special tokens * remove unused test from older deprecation	2024-01-28 18:50:08 -05:00
mhenrichsen	98b4762077	Feat/chatml add system message (#1117 ) * add system message to template * readme update * added code to register new system message * register chatml template for test --------- Co-authored-by: Mads Henrichsen <mads@BrbartiendeMads.lan> Co-authored-by: Wing Lian <wing.lian@gmail.com>	2024-01-25 08:24:27 +01:00
Wing Lian	54d2ac155b	Mixtral fixes 20240124 (#1192 ) [skip ci] * mixtral nccl fixes * make sure to patch for z3	2024-01-24 14:59:57 -05:00
Wing Lian	b715cd549a	update docs [skip ci] (#1176 )	2024-01-23 11:14:52 -05:00
Tilemachos Chatzipapas	cc250391a0	Fine-Tuning Mistral-7b for Real-World Chatbot Applications Using Axolotl (Lora used) (#1155 ) * Mistral-7b finetune example using axolotl with code,config,data * Corrected the path for huggingface dataset * Update data.jsonl * chore: lint --------- Co-authored-by: twenty8th <twenty8th@users.noreply.github.com> Co-authored-by: Wing Lian <wing.lian@gmail.com>	2024-01-23 07:32:21 -05:00
Ayush Singh	9135b9e2aa	Update README.md (#1169 ) [skip ci] Fix typo	2024-01-23 07:25:44 -05:00
Wing Lian	782b6a4216	set fp16 to false if bf16, update bf16: auto in example YAMLs (#1122 ) [skip ci] * set fp16 to false if bf16, update bf16: auto in example YAMLs * unset fp16 so that it fallsback properly if bf16 isn't available * Update README.md [skip-ci] Co-authored-by: NanoCode012 <kevinvong@rocketmail.com> * test that bf16 disables fp16 --------- Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>	2024-01-22 18:44:01 -05:00
Wing Lian	2ce5c0d68a	Deprecate max packed sequence len (#1141 )	2024-01-20 05:11:50 -05:00
NanoCode012	3db5f2fd17	feat(dataset): add config to keep processed dataset in memory (#1152 )	2024-01-20 13:19:28 +09:00
Joe Cummings	08b8ba09a5	Fix link for Minotaur model (#1146 ) [skip-ci]	2024-01-18 17:22:04 -05:00
Joe Cummings	1d70f24b50	Add shifted sparse attention (#973 ) [skip-ci] * Add s2_attn to hijack flash code * Refactor code to account for s2_attn * Add test for models utils * Add ``s2_attention`` option to llama configs * Add ``s2_attention`` option to README config * Format code to appease linter * chore: lint * Remove xpos and llama-landmark [bad merge] * add e2e smoke tests for shifted sparse attention * remove stray patch from merge * update yml with link to paper for s2_attention/longlora * fix assertion check for full fine tune * increase sequence len for tests and PR feedback updates * reduce context len to 16k for tests * reduce context len to 16k for tests * reduce batch size for larger context len and udpate test to check message * fix test for message --------- Co-authored-by: joecummings <jrcummings@devvm050.nha0.facebook.com> Co-authored-by: Wing Lian <wing.lian@gmail.com>	2024-01-18 10:16:07 -05:00
Wing Lian	ece0211996	Agnostic cloud gpu docker image and Jupyter lab (#1097 )	2024-01-15 22:37:54 -05:00
xzuyn	8487b97cf3	Add `layers_to_transform` for `lora_config` (#1118 )	2024-01-15 21:29:55 -05:00
NanoCode012	9cd27b2f91	fix(readme): clarify custom user prompt [no-ci] (#1124 ) * fix(readme): clarify custom user prompt * chore: update example to show use case of setting field	2024-01-16 09:47:33 +09:00

1 2 3 4 5 ...

298 Commits