axolotl

Author	SHA1	Message	Date
NanoCode012	41ecb451c2	Feat(doc): Add max_steps to readme (#389)	2023-08-15 00:34:22 +09:00
NanoCode012	73a0b6ead5	Feat(config): Add hub_strategy (#386)	2023-08-14 07:12:55 -04:00
NanoCode012	729c299256	Feat(doc): Improve sharegpt doc (#378) * Feat(doc): Improve sharegpt doc * Fix typo	2023-08-14 00:36:00 +09:00
Wing Lian	2bb0b78975	Attention mask and position id fixes for packing (#285) * fix attetion mask with packing * set position ids and use block diagonal attn mask * fix expand mask for multiple batch items, make sure we pad position_ids * don't move masks to cpu * use multi pack dataloader w random sampler * add position_ids back * more fixes for dataloader integration * est total tokens, fix field loop * more fixes, position_ids seems broken * more fixes for sample packing * use distributed sampler, avoid accelerate prepare * use accelerator prepare for dataloader * fix for position_ids w packing * Update src/axolotl/utils/dataloader.py * validation for sample packing and doc * more fixes for 4k and optimizations * optimized expand mask fn * better handling of variance in multipack dataloader length and trainer hanging when it runs out of data * fix rounding of len of batches to int * better handling so that all devices have the same dataloader len * fix step calc for packing * pass sample packing efficiency to training args * add a test for the mask expansion for sequence packing * only process eval dataset for packing if not None * don't split batches when packing * weighted CE losses * weighted CEL fixes * limit packing to sequences of max seq len * seq_len_multiple for packing * make sure the chunk size is an int * sample_packing_seq_len_multiplier config * use cumulative seq len with var len flash attn v2 w packing * properly calculate max len * fix flash-attn, xformers, packing, support chatml * fix chatml system prompt for openorca, legacy tokenizer opts * add chatml * add unit tests for cum seq lens, add ability to build cu_seq_lens from positional ids, fix prompt test * fix test and pylint checks * more packing and dataset optimizations and fixes * filter w multiple cpus * more fixes and optimizations * fixes and go back to distributed sampler since batch sampler won't work * fix counts by accounting for num devices * fix steps calculation * previous accelerate is still most performant * add numba to requirements. * use custom distributed checks * fix sampler to prevent overfit w new epochs * let's not cleanup the cached datasets * calculate cum seq lens with pos_ids instead of mask, simplify packing params, fix distributed barrier * speed optimizations and set accelerate fsdp env vars * optimize dataset concatenation? * more optimizations for dataset handling * fix import for annotation * manual pre-commit fixes * another sum optimization and bug fix for calc steps * fix packing estimations * fix formatting * pylint problems * add back flash attention branch for handling unpacked sequences seperately * Address PR feedback * add optional sample packing config params to readme	2023-08-12 15:14:56 -04:00
Morgan McGuire	7019509daa	Add wandb_entity to wandb options, update example configs, update README (#361) * Update wandb_entity and add wandb descriptions * add wandb to config section * remove trailing whitespace for pre-commit hook * remove trailing whitespace for pre-commit hook --------- Co-authored-by: Morgan McGuire <morganmcguire@Morgans-MacBook-Pro.local> Co-authored-by: Wing Lian <wing.lian@gmail.com>	2023-08-12 12:17:11 -04:00
NanoCode012	b5212068ac	Feat: Add rope scaling (#343) * Feat: Add rope scaling * fix: move rope config	2023-08-13 00:50:15 +09:00
NanoCode012	fae6ed8092	Update README.md on pretraining_dataset (#360) * Update README.md on pretraining_dataset * Fix message	2023-08-11 12:17:07 +09:00
NanoCode012	94d03c8402	Clarify pre-tokenize before multigpu (#359)	2023-08-11 11:27:42 +09:00
Aman Karmani	b4d1d22782	note pattern when using groups	2023-08-07 16:18:42 -07:00
Aman Karmani	9f99104038	update comment for group_by_length	2023-08-07 01:04:56 -07:00
Aman Karmani	58d665943e	python 3.10 and 3.11 both work fine, as does pytorch 2.1.0.dev	2023-08-03 16:47:25 +00:00
Aman Karmani	cc7e80026e	there is no configs folder	2023-08-03 16:31:37 +00:00
Wing Lian	41a4d15d43	update README for updated docker images (#328) * update README for updated docker images * update readme from pr feedback	2023-07-28 16:50:03 -04:00
Wing Lian	dcdec44347	Merge pull request #306 from ethanhs/xgen Add XGen info to README and example config	2023-07-22 04:10:18 -04:00
Wing Lian	1066751358	don't resize embeddings to multiples of 32x by default	2023-07-22 01:52:38 -04:00
Ethan Smith	38811434e6	Add XGen info to README and example config	2023-07-21 00:44:50 -07:00
NanoCode012	165907fddb	Fix(readme): Improve wording for push model	2023-07-21 11:28:35 +09:00
NanoCode012	b64f411849	fix(readme): remove accelerate config	2023-07-18 01:31:02 +09:00
Wing Lian	469c08c9ba	Merge pull request #279 from NanoCode012/feat/multi-gpu-readme Feat(readme): improve docs on multi-gpu	2023-07-16 16:08:37 -04:00
Charles Goddard	3cdd8e4122	Add dataset name to all yaml options in README	2023-07-15 13:17:37 -07:00
NanoCode012	cf5ae6b649	Feat(readme): improve docs on multi-gpu	2023-07-16 01:07:27 +09:00
Charles Goddard	46032a1a1f	Fix formatting mistake	2023-07-14 20:57:27 -07:00
Charles Goddard	8bba64258e	Add example of dataset with configuration name to README	2023-07-14 20:46:21 -07:00
NanoCode012	231031a0e1	Merge pull request #275 from NanoCode012/feat/safetensors Feat: Add save_safetensors	2023-07-14 23:07:26 +09:00
NanoCode012	5491278a79	Feat: Add save_safetensors	2023-07-14 13:21:47 +09:00
NanoCode012	896c1aebcf	Feat(docs): Add model_revision arg	2023-07-14 12:56:07 +09:00
NanoCode012	41da98b982	Fix for linter	2023-07-06 23:20:11 +09:00
NanoCode012	9e64f42e0f	Fix local path loading and custom strategy type	2023-07-06 23:08:09 +09:00
NanoCode012	e79c8e617e	Fix future deprecation push_to_hub_model_id	2023-07-03 12:44:29 +09:00
Wing Lian	78a1e1fa12	open orca support	2023-07-01 00:19:41 -04:00
NanoCode012	c146880a75	Update README.md	2023-06-30 11:33:53 +09:00
Wing Lian	47d601fa23	optionally define whether to use_fast tokenizer	2023-06-25 10:19:49 -04:00
Wing Lian	c969f0a9dc	add docs	2023-06-15 08:43:20 -04:00
Wing Lian	d7635b7148	hint to what AMP means	2023-06-15 02:06:27 -04:00
Wing Lian	88e17ffc50	add float16 docs and tweak typehints	2023-06-15 02:05:31 -04:00
Wing Lian	16bb6276a5	Merge pull request #92 from OpenAccess-AI-Collective/flash-optimum add support for opimum bettertransformers	2023-06-14 07:50:15 -04:00
NanoCode012	3513885f43	Fix sharegpt type	2023-06-14 01:10:58 +09:00
PocketDoc Labs	5ff547dc70	Update README.md to include a community showcase	2023-06-12 22:38:10 -07:00
mhenrichsen	34ae69989f	fix inference	2023-06-12 21:39:19 +02:00
Wing Lian	fd2c9814c9	Merge branch 'main' into flash-optimum	2023-06-12 13:12:15 -04:00
Wing Lian	74ef5cc083	Merge pull request #192 from OpenAccess-AI-Collective/sharegpt-custom-prompt misc fixes	2023-06-12 08:26:38 -04:00
NanoCode012	52cde69288	Fix config path after config moved	2023-06-12 17:06:15 +09:00
Wing Lian	aac4b7691e	add new sharegpt, refactor prompt so it can be customized later, add exception if no data is processed	2023-06-11 19:42:25 -04:00
NanoCode012	4cd1deeef2	Add save_steps and eval_steps to Readme	2023-06-12 02:44:46 +09:00
Wing Lian	336aa3fd48	gptq lora llama is obviously good	2023-06-11 11:05:29 -04:00
Wing Lian	d0d7eaa4f3	update openllama and clean up paths	2023-06-11 11:03:31 -04:00
Wing Lian	a6ebf57e82	fix table formatting	2023-06-11 10:55:32 -04:00
Wing Lian	280832cec2	more matrix updates	2023-06-11 10:52:36 -04:00
Wing Lian	a43bae9ff0	update the support matrix	2023-06-11 10:44:03 -04:00
Wing Lian	c4e4f8115c	pass a prompt in from stdin for inference	2023-06-10 15:07:40 -04:00

146 Commits