axolotl

Author	SHA1	Message	Date
Wing Lian	4c834bf25d	cleanup verbosity a bit	2023-11-06 18:32:26 -05:00
Fabian Preiß	8056ecd30e	add deepspeed-kernels dependency for deepspeed>=0.12.0 (#827 )	2023-11-05 07:52:56 -05:00
Jason Stillerman	738a057674	Feat: Added Gradio support (#812 ) * Added gradio support * queuing and title * pre-commit run	2023-11-04 23:59:22 -04:00
Wing Lian	cdc71f73c8	update table for rwkv4 support, fix process count for dataset (#822 )	2023-11-04 23:45:44 -04:00
NanoCode012	6459ac7357	fix: pin autogptq (#818 )	2023-11-03 10:14:55 -04:00
Wing Lian	964d858da0	fix model parallel (#816 )	2023-11-02 21:34:22 -04:00
NanoCode012	10388a8daf	fix(tokenizer): update log order after update (#806 )	2023-10-31 13:21:20 +09:00
NanoCode012	9f7e8a971d	feat(doc): add dummyoptim faq fix (#802 )	2023-10-29 23:06:06 +09:00
NanoCode012	637ed095a0	fix(config): Set eos/bos to tokenizer if different (#801 ) * fix(config): Set eos/bos to tokenizer if different * chore: fix lint	2023-10-29 21:32:37 +09:00
Wing Lian	827ec3d274	refactor neft patch to be more re-usable similar to trl's impl (#796 )	2023-10-29 04:33:13 -04:00
Wing Lian	8b79ff0e94	fix eval_steps to be a sane default (#797 ) * fix eval_steps to be a sane default * update docs for fractional eval_steps	2023-10-27 22:36:30 -04:00
MilesQLi	0800885e2f	Update to adapt to sharegpt datasets with "assistant" rather than "gp… (#774 ) * Update to adapt to sharegpt datasets with "assistant" rather than "gpt" as the machine answers. * use a strict option for handling incorrect turn data * chore: lint --------- Co-authored-by: Wing Lian <wing.lian@gmail.com>	2023-10-27 22:00:16 -04:00
Teknium	d3193beac3	Fix Deepspeed Zero3 Config (#791 ) * Update zero3.json Take away CPU Offload by default (Slows things down horribly, better off reducing batchsize), and changes LR Scheduler to a properly decaying one * Update zero3.json fix something	2023-10-27 21:57:02 -04:00
Aleksa Gordić	2e71ff03a6	Add docker advanced instruction to README (#792 )	2023-10-27 09:24:04 -04:00
chanvichetvong	facc49f32b	GitBook: No commit message	2023-10-26 15:11:00 +00:00
Casper	e50ab072e2	Create preprocess CLI (#785 ) * Create preprocess CLI * Print prompt template if debugging * Add print for unsupported prompters * Formatting * Formatting * Refactor variables * Formatting * Formatting * Formatting * Formatting	2023-10-26 09:35:42 -04:00
Casper	05bd6f1122	Threaded MultipackDistributedDataloader with prefetched samples (#759 ) * Multithreading implementation [WIP] * Added benchmarking * 35% increased throughput * Memory pinning * Start threads in init * Correct print of samples * Sleep if queue is full * Remove pin_memory (worse) * Simplify logic to one thread * Remove benchmark * Use deque for constant speed * Formatting * Formatting * Formatting * Formatting * Rollback to use queue * Fix multi-epoch training * Add num epochs arg * Start thread in __iter__ * Formatting * Use is_alive correctly * Simplify loading thread	2023-10-26 07:49:52 +02:00
NanoCode012	20aa4b57d2	chore(readme): Improve documentation on conversation field (#782 ) * chore(readme): Improve documentation on conversation field * fix: clarify where the option is	2023-10-24 12:52:32 +09:00
NanoCode012	11d1d607db	chore: refactor truthy check and fix mypy (#780 )	2023-10-24 12:28:40 +09:00
Wing Lian	6c81c61bc4	refactor setup trainer so we can add more hooks (#773 ) * refactor setup trainer so we can add more hooks * Remove stray comma	2023-10-23 17:38:41 -04:00
Wing Lian	9b43e7ea15	disable eval table w sample packing in examples (#778 )	2023-10-23 09:18:44 -04:00
Wing Lian	2d8def68dc	simplify by removing duplicate base_model_config (#772 )	2023-10-23 01:42:38 -04:00
NanoCode012	44c9d0151a	Fix: Warn when fullfinetune without adapter (#770 )	2023-10-22 15:41:43 -04:00
Wing Lian	ca84cca2c0	convert exponential notation lr to floats (#771 )	2023-10-22 15:37:03 -04:00
Casper	32eeeb5b64	Hotfix for not saving correctly (#762 )	2023-10-22 13:22:32 -04:00
NanoCode012	afedc470bd	Fix: Cannot tokenize with bf16 and on cpu (#766 )	2023-10-23 01:32:26 +09:00
NanoCode012	9923b72649	Fix: eval table conflict with eval_sample_packing (#769 )	2023-10-23 01:18:12 +09:00
Wing Lian	21cf09b608	remove lora fused packing test (#758 )	2023-10-21 22:59:35 -04:00
Casper	15d3a654bf	Implement fused modules (#747 ) * MLP: Memory saving * Remove RMSNorm restrictions * Map packed weights to original * FusedAttention module * Simplify code * Move fused modules * Fix critical typo * Split inplace * Add FFT config * Add validation of fused arguments * Add fused arguments to config * Update docs * Fix validation logic * Add fused modules to flash attn * Only fuse during training * Remove timing * Formatting * Formatting * Formatting * chore: lint * chore: lint * add e2e tests for fused llama * no lora for tests --------- Co-authored-by: Wing Lian <wing.lian@gmail.com>	2023-10-21 16:08:25 -04:00
Wing Lian	a21935f07a	add to docs (#703 )	2023-10-19 21:32:30 -04:00
NanoCode012	8966a6f566	chore: bump transformers to v4.34.1 to fix tokenizer issue (#745 )	2023-10-19 20:18:22 -04:00
Motoki Wu	e4d1585c4e	Fix DeepSpeed Zero 3 Saving (#709 ) * Update train.py * add zero3 check * chore: lint --------- Co-authored-by: Wing Lian <wing.lian@gmail.com>	2023-10-19 19:18:24 -04:00
Wing Lian	70157ccb8f	add a latest tag for regular axolotl image, cleanup extraneous print statement (#746 )	2023-10-19 12:28:29 -04:00
seungduk.kim.2304	3a99495b05	improve: Enhance code readability of prompt_tokenizers.py (#707 )	2023-10-19 08:12:17 -04:00
NanoCode012	440c3ab527	Fix(model): Linear detected and added to target module with rope linear (#738 ) * Fix(model): Linear detected and added to target module with rope linear * fix: exclude layer instead	2023-10-18 22:13:20 -04:00
Napuh	992d57f20a	catch ConnectionError when checking dataset from HuggingFace (#743 )	2023-10-18 22:11:54 -04:00
mhenrichsen	91a016f410	badge (#739 ) * badge * fixed text	2023-10-18 10:21:34 -04:00
Casper	a045db0214	Mistral: Sliding Window Attention with Flash Attention and Sample Packing (#732 ) * Implement Mistral FA + SWA + Sample Packing * Handle unbroadcastable tensor * chore: lint * Simplify _prepare_decoder_attention_mask * Uncomment window size * Upgrade flash-attn to minimum of 2.3.0 to support SWA * Add original condition to avoid error during inference * chore: lint * use torchscript to prevent oom * chore: pylint --------- Co-authored-by: Wing Lian <wing.lian@gmail.com>	2023-10-16 15:13:46 -04:00
Casper	e1b214c62b	Clarify custom format example (#729 ) * Clarify custom prompt format * Simplify format	2023-10-14 09:28:12 -04:00
Wing Lian	3553172e3c	fixes for alpaca w chatml, and don't include attention_mask w mistral for flash attention (#728 )	2023-10-14 09:27:07 -04:00
Wing Lian	7f2027d93f	tweak for xformers install w pytorch 2.1.0 (#727 )	2023-10-13 15:21:17 -04:00
Wing Lian	8d288a2ad4	workaround for installing xformers w torch 2.1.0 (#725 )	2023-10-13 11:19:30 -04:00
Wing Lian	f30afe4544	misc sharegpt fixes (#723 ) * support for sharegpt with assistant talking first, better masking of assistant token, allow remap of roles from dataset * invalid role is actually not possible * update tokenized fixture for corrected labels	2023-10-13 11:04:39 -04:00
Wing Lian	bfbdba8614	pin xformers >= 0.0.22 (#724 )	2023-10-13 10:27:56 -04:00
Maxime	3bd9528390	add noisy embedding (#721 ) * add noisy embedding * fix format * Update README.md * Update README.md * linter issues * caseus fixes --------- Co-authored-by: Maxime <maxime@nope.no>	2023-10-13 10:00:42 -04:00
Wing Lian	2aa1f71464	fix pytorch 2.1.0 build, add multipack docs (#722 )	2023-10-13 08:57:28 -04:00
Wing Lian	1c412c7e9d	improve handling of the prepared ds path and other cfg defaults (#701 )	2023-10-13 07:46:07 -04:00
Jan Philipp Harries	490923fb78	Save Axolotl config as WandB artifact (#716 )	2023-10-11 07:28:12 -04:00
NanoCode012	5855dded3d	fix(doc): update default doc according to arg (#714 )	2023-10-10 21:51:56 +09:00
atgctg	ace70b33c6	Fix: lowercase `True` values in config (#713 ) * Fix: lowercase `True` values in config * Fix: lowercase `True` values in config	2023-10-10 21:32:20 +09:00

1047 Commits