axolotl

Author	SHA1	Message	Date
Wing Lian	6c81c61bc4	refactor setup trainer so we can add more hooks (#773 ) * refactor setup trainer so we can add more hooks * Remove stray comma	2023-10-23 17:38:41 -04:00
Wing Lian	2d8def68dc	simplify by removing duplicate base_model_config (#772 )	2023-10-23 01:42:38 -04:00
NanoCode012	44c9d0151a	Fix: Warn when fullfinetune without adapter (#770 )	2023-10-22 15:41:43 -04:00
Wing Lian	ca84cca2c0	convert exponential notation lr to floats (#771 )	2023-10-22 15:37:03 -04:00
Casper	32eeeb5b64	Hotfix for not saving correctly (#762 )	2023-10-22 13:22:32 -04:00
NanoCode012	9923b72649	Fix: eval table conflict with eval_sample_packing (#769 )	2023-10-23 01:18:12 +09:00
Casper	15d3a654bf	Implement fused modules (#747 ) * MLP: Memory saving * Remove RMSNorm restrictions * Map packed weights to original * FusedAttention module * Simplify code * Move fused modules * Fix critical typo * Split inplace * Add FFT config * Add validation of fused arguments * Add fused arguments to config * Update docs * Fix validation logic * Add fused modules to flash attn * Only fuse during training * Remove timing * Formatting * Formatting * Formatting * chore: lint * chore: lint * add e2e tests for fused llama * no lora for tests --------- Co-authored-by: Wing Lian <wing.lian@gmail.com>	2023-10-21 16:08:25 -04:00
Motoki Wu	e4d1585c4e	Fix DeepSpeed Zero 3 Saving (#709 ) * Update train.py * add zero3 check * chore: lint --------- Co-authored-by: Wing Lian <wing.lian@gmail.com>	2023-10-19 19:18:24 -04:00
Wing Lian	70157ccb8f	add a latest tag for regular axolotl image, cleanup extraneous print statement (#746 )	2023-10-19 12:28:29 -04:00
seungduk.kim.2304	3a99495b05	improve: Enhance code readability of prompt_tokenizers.py (#707 )	2023-10-19 08:12:17 -04:00
NanoCode012	440c3ab527	Fix(model): Linear detected and added to target module with rope linear (#738 ) * Fix(model): Linear detected and added to target module with rope linear * fix: exclude layer instead	2023-10-18 22:13:20 -04:00
Napuh	992d57f20a	catch ConnectionError when checking dataset from HuggingFace (#743 )	2023-10-18 22:11:54 -04:00
Casper	a045db0214	Mistral: Sliding Window Attention with Flash Attention and Sample Packing (#732 ) * Implement Mistral FA + SWA + Sample Packing * Handle unbroadcastable tensor * chore: lint * Simplify _prepare_decoder_attention_mask * Uncomment window size * Upgrade flash-attn to minimum of 2.3.0 to support SWA * Add original condition to avoid error during inference * chore: lint * use torchscript to prevent oom * chore: pylint --------- Co-authored-by: Wing Lian <wing.lian@gmail.com>	2023-10-16 15:13:46 -04:00
Wing Lian	3553172e3c	fixes for alpaca w chatml, and don't include attention_mask w mistral for flash attention (#728 )	2023-10-14 09:27:07 -04:00
Wing Lian	f30afe4544	misc sharegpt fixes (#723 ) * support for sharegpt with assistant talking first, better masking of assistant token, allow remap of roles from dataset * invalid role is actually not possible * update tokenized fixture for corrected labels	2023-10-13 11:04:39 -04:00
Maxime	3bd9528390	add noisy embedding (#721 ) * add noisy embedding * fix format * Update README.md * Update README.md * linter issues * caseus fixes --------- Co-authored-by: Maxime <maxime@nope.no>	2023-10-13 10:00:42 -04:00
Wing Lian	1c412c7e9d	improve handling of the prepared ds path and other cfg defaults (#701 )	2023-10-13 07:46:07 -04:00
Jan Philipp Harries	490923fb78	Save Axolotl config as WandB artifact (#716 )	2023-10-11 07:28:12 -04:00
NanoCode012	669f1d052c	Fix: Higher vram usage for mistral and sample_packing (#691 ) * Fix: Higher vram usage for mistral and sample_packing * chore: update comment * chore: lint	2023-10-06 12:33:43 -04:00
Wing Lian	2d60ba3a6e	flash_attention + sample packing for stablelm 3b (#671 ) * stablelm epoch fa patch * is causal for fa * working stablelm fa w packing * chore: pre-commit linting	2023-10-05 16:03:43 -04:00
NanoCode012	eb480dfd68	Fix: ValueError when FA + Mistral when padding_side=right (#681 ) * Fix: ValueError when FA + Mistral when padding_side=right * fix: remove tokenizer class check	2023-10-06 04:12:54 +09:00
NanoCode012	69fac9a020	Fix: Future deprecation warning with use_auth_token (#680 )	2023-10-06 03:56:18 +09:00
NanoCode012	e0b7eeabfd	Fix(tokenizer): Set rstrip,lstrip,norm to False (#678 )	2023-10-06 03:50:49 +09:00
NanoCode012	e62d5901b5	chore: Clean up repetitive model kwargs (#670 )	2023-10-04 20:41:26 +09:00
NanoCode012	697c50d408	Feat: Allow usage of native Mistral FA when no sample_packing (#669 ) * Allow usage of native Mistral FA when no sample_packing * fix: do not apply custom patch when sample_pack off * chore: lint * chore: pin transformer to v4.35.0.dev0 * fix: split sample_packing to separate test	2023-10-04 20:40:47 +09:00
Wing Lian	2642caedf2	refactor to set eval_batch_size earlier if unset, so we can warn if mismatched (#662 )	2023-10-02 21:08:07 -04:00
Wing Lian	f34648c8b9	remove patch fix for phi (#664 )	2023-10-02 21:07:41 -04:00
Wing Lian	e50a64e85e	prepared dataset caching, other misc fixes (#665 ) * prepared dataset caching, other misc fixes * also don't load from disk cache unless explicit	2023-10-02 21:07:24 -04:00
Kyle Corbitt	9ec20777ba	Make dataset_processes configurable (#651 ) I'm using the Axolotl script to train models on https://modal.com serverless GPUs. Unfortunately, their environment seems to have some kind of bug where if I try to run `datasets.filter` with too high a `num_proc`, it throws an error and dies. This PR adds a new configuration option `dataset_processes`, which lets you explicitly set the number of processes used to map/filter the dataset. If not included, this defaults to the current behavior of setting that to `os.cpu_count()`.	2023-09-29 00:22:22 -04:00
ich	590d6032fd	Fix bug when using pretokenized datasets (#652 ) * fix pretokenized datasets readme * check if dataset type is not set to handle pretokenized datasets	2023-09-28 22:54:10 -04:00
Wing Lian	409ca0f21c	add support for defined train split (#654 )	2023-09-28 20:14:14 -04:00
Wing Lian	8662e8ffe8	don't strip the prompt for check since we don't strip to tokenize anymore (#650 )	2023-09-28 12:21:51 -04:00
Wing Lian	b2edaaeff6	fix for flash attn w mistral w/o sample packing (#648 )	2023-09-28 10:57:37 -04:00
NanoCode012	eb41f76f92	Feat: Add example for Mistral (#644 ) * Feat: Add example for Mistral * chore: turn off flash * chore: add is_mistral_derived_model * chore: update following PR	2023-09-28 20:15:00 +09:00
NanoCode012	383f88d7a7	Fix(cfg): Add validation for save_strategy and eval_strategy (#633 ) * Fix(cfg): Check save_strategy cfg conflict with save_steps * Fix(cfg): Check evaluation_strategy cfg conflict with eval_steps * chore: add extra check for steps only	2023-09-28 10:14:41 +09:00
Wing Lian	b6ab8aad62	Mistral flash attn packing (#646 ) * add mistral monkeypatch * add arg for decoder attention mask * fix lint for duplicate code * make sure to update transformers too * tweak install for e2e * move mistral patch to conditional	2023-09-27 18:41:00 -04:00
Napuh	85b0be2ba7	Warn users to login to HuggingFace (#645 ) * added warning if user is not logged in HF * updated doc to suggest logging in to HF	2023-09-27 17:43:35 -04:00
Ethan Smith	8fe0e633d2	Fix bug in dataset loading (#284 ) * Fix bug in dataset loading This fixes a bug when loading datasets. `d.data_files` is a list, so it cannot be directly passed to `hf_hub_download` * Check type of data_files, and load accordingly	2023-09-27 13:41:31 -04:00
Felix Yan	d1236f2c41	Correct typos in datasets.py (#639 )	2023-09-27 12:12:10 -04:00
Wing Lian	895f0a0723	skip some flash attn patches unless explicitly enabled (#643 ) * skip some flash attn patches if explicitly disabled * make the other patches optional	2023-09-27 12:11:07 -04:00
Wing Lian	e7d3e2dbb6	use fastchat conversations template (#578 ) * use fastchat conversations template * require fastchat (fschat) pip install * handle roles dynamically from conversation * tweak fastchat conversation with a monkeypatch to get individual turns * fix up so it works with multiple conversation styles, and don't strip the turns * fix sharegpt fixture now that we're using a more correct tokenization * use a new prompter and support fastchat conversation type * use sharegpt from prompt strategies now * update docs, add chatml template * add a newline after im_end token * ensure we correctly set system message * update per PR feedback to handle deprecated sharegpt types * don't add duplicate wandb req * make sharegpt fields configurable from yml * llama2 fixes * don't fail fatally when turns are improper	2023-09-27 12:10:45 -04:00
Wing Lian	60c7c48c97	update for recent transformers updates (#636 ) * update for recent transformers updates * fix checkpoint forward kwargs * just pass args into torch checkpoint	2023-09-27 12:10:32 -04:00
Wing Lian	e8cbf50be6	attention_mask not needed for training (#642 ) * attention_mask not needed for training * specifically don't use attention mask for phi * use a different check for phi * small fixes since phi removed some values from their config	2023-09-27 11:12:08 -04:00
NanoCode012	19a600a8b8	Feat: Add support for upstream FA2 (#626 ) * Feat: Add support for upstream FA2 * chore: add is_falcon_derived_model: true to examples * chore: add config to readme for documentation * feat: add extra model types * fix: remove old falcon flash patch * chore: pin transformers and accelerate	2023-09-26 09:53:28 -04:00
NanoCode012	cfbce020e9	Fix: Fail bf16 check when running on cpu during merge (#631 )	2023-09-25 13:48:18 +09:00
Wing Lian	a363604dcf	better handling and logging of empty sharegpt turns (#603 )	2023-09-22 16:13:42 -04:00
Wing Lian	501958bb6f	create a model card with axolotl badge (#624 )	2023-09-22 16:13:26 -04:00
NanoCode012	d5f8589021	chore(callback): Remove old peft saving code (#510 )	2023-09-22 12:31:33 +09:00
Wing Lian	03e59077a0	misc fixes to add gptq tests (#621 ) * misc fixes to add gptq tests * set bf16 needed for fa2	2023-09-21 21:52:12 -04:00
Wing Lian	97d3776ce6	split completion text to sequence_len (#616 )	2023-09-21 21:51:25 -04:00

807 Commits
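
Commit 9ec20777ba (#651) describes a `dataset_processes` option that sets the number of processes used to map/filter the dataset, falling back to `os.cpu_count()` when it is not set. A minimal sketch of that fallback follows. The helper name `resolve_dataset_processes` and the validation are assumptions made for illustration, not code from the repository:

```python
import os
from typing import Optional


def resolve_dataset_processes(configured: Optional[int] = None) -> int:
    """Hypothetical helper: choose the process count for dataset map/filter."""
    if configured is not None:
        # An explicit setting wins, e.g. a small value on constrained hosts.
        if configured < 1:
            raise ValueError("dataset_processes must be a positive integer")
        return configured
    # Default described in the commit message: use the CPU count.
    # os.cpu_count() can return None, so fall back to a single process.
    return os.cpu_count() or 1


if __name__ == "__main__":
    print(resolve_dataset_processes(4))  # explicit value from the config
    print(resolve_dataset_processes())   # machine-dependent default
```

The resolved value would then be passed as `num_proc` to the dataset's map/filter calls. A low explicit setting is one way to avoid the failure the commit message reports with a high `num_proc`.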