axolotl

Author	SHA1	Message	Date
Wing Lian	1a6309c8a6	cleanup the old multipack dataloader (#841 )	2023-11-12 05:39:09 -05:00
Bryan Thornbury	105d0b350b	Pin optimum package (#838 )	2023-11-09 22:36:15 -05:00
Wing Lian	f544ab2bed	don't compile deepspeed or bitsandbytes from source (#837 )	2023-11-08 19:49:55 -05:00
Wing Lian	641e6f7e51	multipack w batch sampler (#795 ) * test batch sampler w varying batch lens * wip * multipack batchsampler wip * wip * fix for prepare data loader to get correct # of steps based on gpues * lint and clean up * calculate len estimate * fix total num steps calc * add options for dataloader_num_workers and dataloader_pin_memory * remove gitbook * support prefetch_factor for dataloader optimization * fix the kwarg	2023-11-07 20:27:40 -05:00
Wing Lian	6dc68a653f	use temp_dir kwarg instead	2023-11-06 18:33:01 -05:00
Wing Lian	7de6a5639c	missing dunder-init	2023-11-06 18:33:01 -05:00
Wing Lian	c74f045ba7	chore: lint	2023-11-06 18:33:01 -05:00
Wing Lian	0402d19759	make sure to cleanup tmp output_dir for e2e tests	2023-11-06 18:33:01 -05:00
Wing Lian	b2430ce670	use accelerate logging for zero/main loggin only	2023-11-06 18:32:26 -05:00
Wing Lian	4c834bf25d	cleanup verbosity a bit	2023-11-06 18:32:26 -05:00
Fabian Preiß	8056ecd30e	add deepspeed-kernels dependency for deepspeed>=0.12.0 (#827 )	2023-11-05 07:52:56 -05:00
Jason Stillerman	738a057674	Feat: Added Gradio support (#812 ) * Added gradio support * queuing and title * pre-commit run	2023-11-04 23:59:22 -04:00
Wing Lian	cdc71f73c8	update table for rwkv4 support, fix process count for dataset (#822 )	2023-11-04 23:45:44 -04:00
NanoCode012	6459ac7357	fix: pin autogptq (#818 )	2023-11-03 10:14:55 -04:00
Wing Lian	964d858da0	fix model parallel (#816 )	2023-11-02 21:34:22 -04:00
NanoCode012	10388a8daf	fix(tokenizer): update log order after update (#806 )	2023-10-31 13:21:20 +09:00
NanoCode012	9f7e8a971d	feat(doc): add dummyoptim faq fix (#802 )	2023-10-29 23:06:06 +09:00
NanoCode012	637ed095a0	fix(config): Set eos/bos to tokenizer if different (#801 ) * fix(config): Set eos/bos to tokenizer if different * chore: fix lint	2023-10-29 21:32:37 +09:00
Wing Lian	827ec3d274	refactor neft patch to be more re-usable similar to trl's impl (#796 )	2023-10-29 04:33:13 -04:00
Wing Lian	8b79ff0e94	fix eval_steps to be a sane default (#797 ) * fix eval_steps to be a sane default * update docs for fractional eval_steps	2023-10-27 22:36:30 -04:00
MilesQLi	0800885e2f	Update to adapt to sharegpt datasets with "assistant" rather than "gp… (#774 ) * Update to adapt to sharegpt datasets with "assistant" rather than "gpt" as the machine answers. * use a strict option for hanedling incorrect turn data * chore: lint --------- Co-authored-by: Wing Lian <wing.lian@gmail.com>	2023-10-27 22:00:16 -04:00
Teknium	d3193beac3	Fix Deepspeed Zero3 Config (#791 ) * Update zero3.json Take away CPU Offload by default (Slows things down horribly, better off reducing batchsize), and changes LR Scheduler to a properly decaying one * Update zero3.json fix something	2023-10-27 21:57:02 -04:00
Aleksa Gordić	2e71ff03a6	Add docker advanced instruction to README (#792 )	2023-10-27 09:24:04 -04:00
chanvichetvong	facc49f32b	GitBook: No commit message	2023-10-26 15:11:00 +00:00
Casper	e50ab072e2	Create preprocess CLI (#785 ) * Create preprocess CLI * Print prompt template if debugging * Add print for unsupported prompters * Formatting * Formatting * Refactor variables * Formatting * Formatting * Formatting * Formatting	2023-10-26 09:35:42 -04:00
Casper	05bd6f1122	Threaded MultipackDistributedDataloader with prefetched samples (#759 ) * Multithreading implementation [WIP] * Added benchmarking * 35% increased throughput * Memory pinning * Start threads in init * Correct print of samples * Sleep if queue is full * Remove pin_memory (worse) * Simplify logic to one thread * Remove benchmark * Use deque for constant speed * Formatting * Formatting * Formatting * Formatting * Rollback to use queue * Fix multi-epoch training * Add num epochs arg * Start thread in __iter__ * Formatting * Use is_alive correctly * Simplify loading thread	2023-10-26 07:49:52 +02:00
NanoCode012	20aa4b57d2	chore(readme): Improve documentation on conversation field (#782 ) * chore(readme): Improve documentation on conversation field * fix: clarify where the option is	2023-10-24 12:52:32 +09:00
NanoCode012	11d1d607db	chore: refactor truthy check and fix mypy (#780 )	2023-10-24 12:28:40 +09:00
Wing Lian	6c81c61bc4	refactor setup trainer so we can add more hooks (#773 ) * refactor setup trainer so we can add more hooks * Remove stray comma	2023-10-23 17:38:41 -04:00
Wing Lian	9b43e7ea15	disable eval table w sample packing in examples (#778 )	2023-10-23 09:18:44 -04:00
Wing Lian	2d8def68dc	simplify by removing duplicate base_model_config (#772 )	2023-10-23 01:42:38 -04:00
NanoCode012	44c9d0151a	Fix: Warn when fullfinetune without adapter (#770 )	2023-10-22 15:41:43 -04:00
Wing Lian	ca84cca2c0	convert exponential notation lr to floats (#771 )	2023-10-22 15:37:03 -04:00
Casper	32eeeb5b64	Hotfix for not saving correctly (#762 )	2023-10-22 13:22:32 -04:00
NanoCode012	afedc470bd	Fix: Cannot tokenize with bf16 and on cpu (#766 )	2023-10-23 01:32:26 +09:00
NanoCode012	9923b72649	Fix: eval table conflict with eval_sample_packing (#769 )	2023-10-23 01:18:12 +09:00
Wing Lian	21cf09b608	remove lora fused packing test (#758 )	2023-10-21 22:59:35 -04:00
Casper	15d3a654bf	Implement fused modules (#747 ) * MLP: Memory saving * Remove RMSNorm restrictions * Map packed weights to original * FusedAttention module * Simplify code * Move fused modules * Fix critical typo * Split inplace * Add FFT config * Add validation of fused arguments * Add fused arguments to config * Update docs * Fix validation logic * Add fused modules to flash attn * Only fuse during training * Remove timing * Formatting * Formatting * Formatting * chore: lint * chore: lint * add e2e tests for fused llama * no lora for tests --------- Co-authored-by: Wing Lian <wing.lian@gmail.com>	2023-10-21 16:08:25 -04:00
Wing Lian	a21935f07a	add to docs (#703 )	2023-10-19 21:32:30 -04:00
NanoCode012	8966a6f566	chore: bump transformers to v4.34.1 to fix tokenizer issue (#745 )	2023-10-19 20:18:22 -04:00
Motoki Wu	e4d1585c4e	Fix DeepSpeed Zero 3 Saving (#709 ) * Update train.py * add zero3 check * chore: lint --------- Co-authored-by: Wing Lian <wing.lian@gmail.com>	2023-10-19 19:18:24 -04:00
Wing Lian	70157ccb8f	add a latest tag for regular axolotl image, cleanup extraneous print statement (#746 )	2023-10-19 12:28:29 -04:00
seungduk.kim.2304	3a99495b05	improve: Enhance code readability of prompt_tokenizers.py (#707 )	2023-10-19 08:12:17 -04:00
NanoCode012	440c3ab527	Fix(model): Linear detected and added to target module with rope linear (#738 ) * Fix(model): Linear detected and added to target module with rope linear * fix: exclude layer instead	2023-10-18 22:13:20 -04:00
Napuh	992d57f20a	catch ConnectionError when checking dataset from HuggingFace (#743 )	2023-10-18 22:11:54 -04:00
mhenrichsen	91a016f410	badge (#739 ) * badge * fixed text	2023-10-18 10:21:34 -04:00
Casper	a045db0214	Mistral: Sliding Window Attention with Flash Attention and Sample Packing (#732 ) * Implement Mistral FA + SWA + Sample Packing * Handle unbroadcastable tensor * chore: lint * Simplify _prepare_decoder_attention_mask * Uncomment window size * Upgrade flash-attn to minimum of 2.3.0 to support SWA * Add original condition to avoid error during inference * chore: lint * use torchscript to prevent oom * chore: pylint --------- Co-authored-by: Wing Lian <wing.lian@gmail.com>	2023-10-16 15:13:46 -04:00
Casper	e1b214c62b	Clarify custom format example (#729 ) * Clarify custom prompt format * Simplify format	2023-10-14 09:28:12 -04:00
Wing Lian	3553172e3c	fixes for alpaca w chatml, and don't include attention_mask w mistral for flash attention (#728 )	2023-10-14 09:27:07 -04:00
Wing Lian	7f2027d93f	tweak for xformers install w pytorch 2.1.0 (#727 )	2023-10-13 15:21:17 -04:00

1056 Commits