axolotl

Author	SHA1	Message	Date
Jason Stillerman	738a057674	Feat: Added Gradio support (#812 ) * Added gradio support * queuing and title * pre-commit run	2023-11-04 23:59:22 -04:00
Wing Lian	cdc71f73c8	update table for rwkv4 support, fix process count for dataset (#822 )	2023-11-04 23:45:44 -04:00
Wing Lian	8b79ff0e94	fix eval_steps to be a sane default (#797 ) * fix eval_steps to be a sane default * update docs for fractional eval_steps	2023-10-27 22:36:30 -04:00
Aleksa Gordić	2e71ff03a6	Add docker advanced instruction to README (#792 )	2023-10-27 09:24:04 -04:00
Casper	e50ab072e2	Create preprocess CLI (#785 ) * Create preprocess CLI * Print prompt template if debugging * Add print for unsupported prompters * Formatting * Formatting * Refactor variables * Formatting * Formatting * Formatting * Formatting	2023-10-26 09:35:42 -04:00
NanoCode012	20aa4b57d2	chore(readme): Improve documentation on conversation field (#782 ) * chore(readme): Improve documentation on conversation field * fix: clarify where the option is	2023-10-24 12:52:32 +09:00
NanoCode012	afedc470bd	Fix: Cannot tokenize with bf16 and on cpu (#766 )	2023-10-23 01:32:26 +09:00
Casper	15d3a654bf	Implement fused modules (#747 ) * MLP: Memory saving * Remove RMSNorm restrictions * Map packed weights to original * FusedAttention module * Simplify code * Move fused modules * Fix critical typo * Split inplace * Add FFT config * Add validation of fused arguments * Add fused arguments to config * Update docs * Fix validation logic * Add fused modules to flash attn * Only fuse during training * Remove timing * Formatting * Formatting * Formatting * chore: lint * chore: lint * add e2e tests for fused llama * no lora for tests --------- Co-authored-by: Wing Lian <wing.lian@gmail.com>	2023-10-21 16:08:25 -04:00
Wing Lian	a21935f07a	add to docs (#703 )	2023-10-19 21:32:30 -04:00
Casper	e1b214c62b	Clarify custom format example (#729 ) * Clarify custom prompt format * Simplify format	2023-10-14 09:28:12 -04:00
Maxime	3bd9528390	add noisy embedding (#721 ) * add noisy embedding * fix format * Update README.md * Update README.md * linter issues * caseus fixes --------- Co-authored-by: Maxime <maxime@nope.no>	2023-10-13 10:00:42 -04:00
NanoCode012	5855dded3d	fix(doc): update default doc according to arg (#714 )	2023-10-10 21:51:56 +09:00
NanoCode012	11c48c5e03	fix(doc): Add note on inference w sample packing (#712 )	2023-10-10 21:08:17 +09:00
seungduk.kim.2304	77c84e02fd	Update README with some explanations (#700 ) * Update README with some explanations * revert commit-hook change * add more explanation about batch size and gradient accum * not use latex format * decorate * git hook again * Attach a link that explains about LoRA hyperparameters * update table of content * Explanation about lora_modules_to_save	2023-10-08 13:37:54 -04:00
Wing Lian	2642caedf2	refactor to set eval_batch_size earlier if unset, so we can warn if mismatched (#662 )	2023-10-02 21:08:07 -04:00
Kyle Corbitt	9ec20777ba	Make dataset_processes configurable (#651 ) I'm using the Axolotl script to train models on https://modal.com serverless GPUs. Unfortunately, their environment seems to have some kind of bug where if I try to run `datasets.filter` with too high a `num_proc`, it throws an error and dies. This PR adds a new configuration option `dataset_processes`, which lets you explicitly set the number of processes used to map/filter the dataset. If not included, this defaults to the current behavior of setting that to `os.cpu_count()`.	2023-09-29 00:22:22 -04:00
ich	590d6032fd	Fix bug when using pretokenized datasets (#652 ) * fix pretokenized datasets readme * check if dataset type is not set to handle pretokenized datasets	2023-09-28 22:54:10 -04:00
Wing Lian	409ca0f21c	add support for defined train split (#654 )	2023-09-28 20:14:14 -04:00
NanoCode012	eb41f76f92	Feat: Add example for Mistral (#644 ) * Feat: Add example for Mistral * chore: turn off flash * chore: add is_mistral_derived_model * chore: update following PR	2023-09-28 20:15:00 +09:00
Napuh	85b0be2ba7	Warn users to login to HuggingFace (#645 ) * added warning if user is not logged in HF * updated doc to suggest logging in to HF	2023-09-27 17:43:35 -04:00
Wing Lian	895f0a0723	skip some flash attn patches unless explicitly enabled (#643 ) * skip some flash attn patches if explicitly disabled * make the other patches optional	2023-09-27 12:11:07 -04:00
Wing Lian	e7d3e2dbb6	use fastchat conversations template (#578 ) * use fastchat conversations template * require fastchat (fschat) pip install * handle roles dynamically from conversation * tweak fastchat conversation with a monkeypatch to get individual turns * fix up so it works with multiple conversation styles, and don't strip the turns * fix sharegpt fixture now that we're using a more correct tokenization * use a new prompter and support fastchat conversation type * use sharegpt from prompt strategies now * update docs, add chatml template * add a newline after im_end token * ensure we correctly set system message * update per PR feedback to handle deprecated sharegpt types * don't add duplicate wandb req * make sharegpt fields configurable from yml * llama2 fixes * don't fail fatally when turns are improper	2023-09-27 12:10:45 -04:00
NanoCode012	19a600a8b8	Feat: Add support for upstream FA2 (#626 ) * Feat: Add support for upstream FA2 * chore: add is_falcon_derived_model: true to examples * chore: add config to readme for documentation * feat: add extra model types * fix: remove old falcon flash patch * chore: pin transformers and accelerate	2023-09-26 09:53:28 -04:00
Fernando Tarin Morales	5e5296a77c	Added quotes to the pip install -e command to fix an incompatibility with shells that do glob expansion like zsh (#632 )	2023-09-25 11:50:14 -04:00
NanoCode012	67b9888630	Feat(doc): Add eval_sample_packing to doc (#625 )	2023-09-23 13:11:27 +09:00
Wing Lian	c25ba7939b	update README w deepspeed info (#605 )	2023-09-22 00:15:52 -04:00
NanoCode012	00dce35fb2	Feat(data): Allow loading local csv and text (#594 ) * Feat(data): Allow loading local csv and text * chore: update readme for loading data	2023-09-17 11:32:27 -04:00
NanoCode012	3a2edc85c3	Feat(doc): Add features to doc (#583 )	2023-09-16 01:14:15 +09:00
Wing Lian	f7a22632d7	support custom field for completion from yml (#580 ) * support custom field for completion from yml * remove legacy completion check and add doc * update README docs	2023-09-15 07:48:21 -04:00
Wing Lian	a5a625f47e	update support matrix with btlm and phi (#579 )	2023-09-15 02:46:15 -04:00
Wing Lian	861cecac2a	refactor scripts/finetune.py into new cli modules (#550 ) * refactor scripts/finetune.py into new cli modules * continue to support scripts/finetune.py * update readme with updated cli commands * Update scripts/finetune.py Co-authored-by: NanoCode012 <kevinvong@rocketmail.com> --------- Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>	2023-09-15 01:43:52 -04:00
Wing Lian	a4e1bb6606	let hf trainer handle torch compile (#516 ) * let hf trainer handle torch compile * remove torch compile checks, include option for backend * suppress torch errors to get further * require min torch version of 2.1.0 for torch compile to work --------- Co-authored-by: Aman Karmani <aman@tmm1.net>	2023-09-13 11:42:12 -04:00
Glavin Wiechert	5b67ea98a6	Add training callback to send predictions to WandB table (#521 ) * WIP Add training callback to send predictions to WandB table * WIP improve wandb table reporting callback * WIP improve wandb table reporting callback (cont) * Add VSCode launching for debugging * Add tiny llama example * WIP attempt to improve post-eval prediction generation for table * WIP attempt to improve post-eval prediction generation for table - part 2 * WIP batch generation * WIP attempt to handle sample_packing using position_ids for wandb prediction table * WIP add code for debugging * Fix sample_packing support for wandb prediction table * Clean up code for PR review * Add eval_table_size, eval_table_max_new_tokens configs & clean up code * Clean up PR, delete VSCode config, add tiny-llama example * Add eval_table_size, eval_table_max_new_tokens documentation. Fix linting/formatting	2023-09-13 09:51:08 -04:00
Wing Lian	9845c5e12d	document that packaging needs to be installed before flash-attn (#559 )	2023-09-12 12:18:30 -04:00
The Objective Dad	6d57f2f0f0	ergonomic update to optimizer config doc (#548 )	2023-09-11 12:35:45 -04:00
Wing Lian	34c0a86a11	update readme to point to direct link to runpod template, cleanup install instructions (#532 ) * update readme to point to direct link to runpod template, cleanup install instructions * default install flash-attn and auto-gptq now too * update readme w flash-attn extra * fix version in setup	2023-09-08 11:58:54 -04:00
The Objective Dad	5e2d8a42d9	Adding NCCL Timeout Guide (#536 ) * fixes NCCL_P2P_LEVEL=NVL #429 * adding more insights into various values of NCCL_P2P_LEVEL	2023-09-08 11:57:47 -04:00
NanoCode012	f51c9c56c6	Fix(doc): Inform Windows users to use WSL/docker (#518 )	2023-09-01 00:08:21 -07:00
Jan Philipp Harries	396a7a74fc	Added advanced DDP args (#515 ) * add ddp_config * add advanced ddp config * add ddp_config * add advanced ddp config --------- Co-authored-by: Jan Philipp Harries <jphme@users.noreply.github.com>	2023-08-31 10:37:47 -07:00
Wing Lian	5ac3392075	support for datasets with multiple names (#480 ) * support for datasets with multiple names * update docs	2023-08-29 06:18:17 -07:00
NanoCode012	48c56470d0	Fix(doc): Clarify no amp to full yaml docs (#496 )	2023-08-29 06:17:37 -07:00
Birch-san	8e197f6fb4	pad_to_worst_case_seq_len boolean, for testing memory limits (#498 ) * pad_to_worst_case_seq_len boolean, for testing memory limits * remove collator_pad_to_longest option since it does nothing see docs: https://huggingface.co/docs/transformers/main_classes/data_collator#transformers.DataCollatorWithPadding.padding True and "longest" mean the same thing * rename to `pad_to_sequence_len, and ensure 64 alignment --------- Co-authored-by: Aman Karmani <aman@tmm1.net>	2023-08-28 18:47:16 -04:00
NanoCode012	ad8be435ad	Feat(doc): Update eval_steps doc (#487 )	2023-08-27 10:09:09 +09:00
Charles O. Goddard	bde3c5a478	ReLoRA implementation (with quantization) (#322 ) * Experimental ReLoRA (+qlora) implementation * Add CPU offload * Remove local config * Fix saving logic * Remove redundant assert * Fix logic errors * Move ReLoRA into its own trainer class with a method override to create the proper scheduler * Formatting & typing fixes * Use safe_serialization * Don't allow fsdp/deepspeed with ReLoRA * Fix cpu-offload logic, enable multi gpu * Document parameters and add comment * Fix merge issue * Smooth over some sharp edges * Implement resume from checkpoint for relora * Address review comments * Fix saving logic * Add necessary metadata to safetensors --------- Co-authored-by: Wing Lian <wing.lian@gmail.com>	2023-08-23 23:07:18 -04:00
NanoCode012	55c23c7bcb	Fix(doc): Clarify config (#466 )	2023-08-23 11:56:01 -04:00
TearGosling	f4746507f6	feat: add Metharme prompt strategy (#446 ) * Add Metharme tokenizing strategy This strategy accounts for how the Metharme JSONLs are formatted as well as adds duplicated EOS tokens which can help trim model output length. I haven't gotten the chance to test this yet, and probably won't have the chance for quite a bit, so I'm committing this now. * Redo Metharme tokenizing strategy lol * fix: oops * Rearrange a conditional * chore: reformat code in accordance with linter * chore: Make lint not freak out * chore: fix lint --------- Co-authored-by: NanoCode012 <kevinvong@rocketmail.com>	2023-08-22 11:21:45 +09:00
NanoCode012	04a42b6db1	feat(docs): improve user customized prompts (#443 ) * feat(docs): improve user customized prompts * feat(doc): add custom pretokenized instructions * chore: clean old data folder * chore: add new line	2023-08-20 23:59:43 -04:00
NanoCode012	919f4cac90	feat(doc): add pillow to lambda instructions (#445 )	2023-08-20 23:59:23 -04:00
Wing Lian	d2e7f27240	support user defined prompters, pretokenized datasets in config, local parquet, local arrow files (#348 ) * support user defined prompters, pretokenized datasets in config, local parquet, local arrow files * fix user defined dataset types * fix for system prompts * fix tests * fix checks for parquet and arrow * aha moment that d.data_files isn't used * add documentation for ds_type to add support for parquet and arrow	2023-08-20 09:17:49 -04:00
Philpax	d21318dfb9	docs(readme): add `cd axolotl` (#440 )	2023-08-19 19:14:05 -04:00

208 Commits