Commit Graph

762 Commits

Author SHA1 Message Date
Wing Lian
b2f7bc7ccd use cumulative seq len with var len flash attn v2 w packing 2023-08-07 09:38:04 -04:00
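Context: flash-attn v2's varlen kernels take a cu_seqlens tensor marking where each packed sub-sequence starts and ends. A minimal sketch of deriving it, assuming the packed attention mask labels sub-sequences 1..n with 0 for padding (hypothetical helper, not necessarily this commit's code):

```python
import torch

def get_cu_seqlens(attention_mask: torch.Tensor) -> torch.Tensor:
    """Cumulative sequence lengths for flash-attn's varlen API.

    attention_mask: (batch, seq_len) where each packed sub-sequence is
    labeled 1, 2, 3, ... and padding is 0.
    Returns an int32 tensor [0, len_0, len_0 + len_1, ...].
    """
    seqlens = []
    for row in attention_mask:
        # count tokens per packed sub-sequence, skipping padding
        _, counts = row[row != 0].unique_consecutive(return_counts=True)
        seqlens.extend(counts.tolist())
    cu_seqlens = torch.zeros(len(seqlens) + 1, dtype=torch.int32)
    cu_seqlens[1:] = torch.tensor(seqlens, dtype=torch.int32).cumsum(0)
    return cu_seqlens
```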
Wing Lian
b8905e2a91 sample_packing_seq_len_multiplier config 2023-08-07 09:38:04 -04:00
Wing Lian
7e1edc662a make sure the chunk size is an int 2023-08-07 09:38:04 -04:00
Wing Lian
98c9bc69de seq_len_multiple for packing 2023-08-07 09:38:04 -04:00
Wing Lian
8378335dc9 limit packing to sequences of max seq len 2023-08-07 09:38:04 -04:00
Wing Lian
bdd34c7400 weighted CEL fixes 2023-08-07 09:38:04 -04:00
Wing Lian
c6cc54c7d9 weighted CE losses 2023-08-07 09:38:04 -04:00
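Context: a per-token weighted cross-entropy can be built from the unreduced loss; a minimal sketch, assuming per-token weights are supplied alongside the labels (illustrative only, not the commit's implementation):

```python
import torch.nn.functional as F

def weighted_ce_loss(logits, labels, weights):
    """Cross-entropy with a weight per token.

    logits:  (batch, seq_len, vocab_size)
    labels:  (batch, seq_len), -100 marks ignored tokens
    weights: (batch, seq_len) per-token loss weights
    """
    per_token = F.cross_entropy(
        logits.view(-1, logits.size(-1)),
        labels.view(-1),
        reduction="none",
        ignore_index=-100,
    )
    w = weights.view(-1) * (labels.view(-1) != -100).float()
    # normalize by total weight so the scale matches unweighted CE
    return (per_token * w).sum() / w.sum().clamp(min=1e-8)
```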
Wing Lian
83f7362480 don't split batches when packing 2023-08-07 09:38:04 -04:00
Wing Lian
958d423e7c only process eval dataset for packing if not None 2023-08-07 09:38:04 -04:00
Wing Lian
e74eab6e73 add a test for the mask expansion for sequence packing 2023-08-07 09:38:04 -04:00
Wing Lian
487abfc769 pass sample packing efficiency to training args 2023-08-07 09:38:04 -04:00
Wing Lian
2bee646e85 fix step calc for packing 2023-08-07 09:38:04 -04:00
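Context: with sample packing, each max_seq_len window holds several examples, so the step count has to be recomputed from token totals rather than example counts. A back-of-the-envelope sketch (names are illustrative, not the trainer's actual fields):

```python
import math

def estimate_total_steps(total_tokens, max_seq_len, micro_batch_size,
                         gradient_accumulation, world_size,
                         packing_efficiency=0.95):
    # tokens consumed per optimizer step, discounted by how densely
    # the packer fills each max_seq_len window
    tokens_per_step = (max_seq_len * micro_batch_size *
                       gradient_accumulation * world_size *
                       packing_efficiency)
    return math.ceil(total_tokens / tokens_per_step)
```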
Wing Lian
945f2e5029 better handling so that all devices have the same dataloader len 2023-08-07 09:38:04 -04:00
Wing Lian
daed942fe9 fix rounding of len of batches to int 2023-08-07 09:38:04 -04:00
Wing Lian
df3eb645da better handling of variance in multipack dataloader length and trainer hanging when it runs out of data 2023-08-07 09:38:04 -04:00
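Context: if ranks disagree on how many batches their dataloaders yield, the rank that runs out first leaves the others blocked in a collective op. One common remedy is to agree on the minimum length across the world, sketched here (illustrative, not the repo's exact handling):

```python
import torch
import torch.distributed as dist

def synced_dataloader_len(local_len: int) -> int:
    """Take the minimum dataloader length across all ranks so every
    device iterates the same number of batches."""
    if not (dist.is_available() and dist.is_initialized()):
        return local_len
    device = "cuda" if torch.cuda.is_available() else "cpu"
    t = torch.tensor(local_len, dtype=torch.int64, device=device)
    dist.all_reduce(t, op=dist.ReduceOp.MIN)
    return int(t.item())
```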
Wing Lian
32fed7039d optimized expand mask fn 2023-08-07 09:38:04 -04:00
Wing Lian
7d7b5ebd71 more fixes for 4k and optimizations 2023-08-07 09:38:03 -04:00
Wing Lian
4b7ad9927f validation for sample packing and doc 2023-08-07 09:38:03 -04:00
Wing Lian
fedcf5a089 Update src/axolotl/utils/dataloader.py 2023-08-07 09:38:03 -04:00
Wing Lian
2f2974196d fix for position_ids w packing 2023-08-07 09:38:03 -04:00
Wing Lian
2e295c9f94 use accelerator prepare for dataloader 2023-08-07 09:38:03 -04:00
Wing Lian
4ab9ab79fd use distributed sampler, avoid accelerate prepare 2023-08-07 09:38:03 -04:00
Wing Lian
b02484a83e more fixes for sample packing 2023-08-07 09:38:03 -04:00
Wing Lian
58045f0816 more fixes, position_ids seems broken 2023-08-07 09:38:03 -04:00
Wing Lian
66774011c4 est total tokens, fix field loop 2023-08-07 09:38:03 -04:00
Wing Lian
41d4992029 more fixes for dataloader integration 2023-08-07 09:38:03 -04:00
Wing Lian
762f1b08db add position_ids back 2023-08-07 09:38:03 -04:00
Wing Lian
3aba4c5d7c use multi pack dataloader w random sampler 2023-08-07 09:38:03 -04:00
Wing Lian
ffd96839cf don't move masks to cpu 2023-08-07 09:38:03 -04:00
Wing Lian
ef9bf7ad73 fix expand mask for multiple batch items, make sure we pad position_ids 2023-08-07 09:38:03 -04:00
Wing Lian
4964b0d345 set position ids and use block diagonal attn mask 2023-08-07 09:38:03 -04:00
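Context: for packed batches, tokens must attend only within their own sub-sequence, which a block-diagonal (and causal) mask enforces. A sketch assuming sub-sequences are labeled 1..n in the attention mask (hypothetical; the repo's expand-mask function may differ):

```python
import torch

def expand_packed_mask(attention_mask: torch.Tensor) -> torch.Tensor:
    """Expand (batch, seq_len) sub-sequence ids into a block-diagonal,
    causal boolean mask of shape (batch, 1, seq_len, seq_len)."""
    # attention is allowed only between tokens sharing a nonzero id
    same_seq = attention_mask.unsqueeze(1) == attention_mask.unsqueeze(2)
    not_pad = (attention_mask != 0).unsqueeze(1) & (attention_mask != 0).unsqueeze(2)
    seq_len = attention_mask.size(1)
    causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool,
                                   device=attention_mask.device))
    return (same_seq & not_pad & causal).unsqueeze(1)
```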
Wing Lian
36b0e30a9d fix attention mask with packing 2023-08-07 09:38:03 -04:00
Wing Lian
176b888a63 ensure enable_input_require_grads is called on model before getting the peft model (#345) 2023-08-06 18:13:10 -04:00
Jan Philipp Harries
3392270544 experimental llama 2 chat support (#296)
* experimental llama 2 chat support

* few small fixes

* llama2_chat

* small fix to follow original implementation

* small fixes and added fixtures/tests

* fix: mixed-up inference and finetuning conversations

* args - small fix

* small fix

* small adjustment and warning

* fix with pre-commit

---------

Co-authored-by: Jan Philipp Harries <jpdus@users.noreply.github.com>
2023-08-06 17:40:52 -04:00
Wing Lian
bb53a165f5 add a basic ds zero3 config (#347)
better defaults for ds
2023-08-06 17:19:51 -04:00
ssmi153
10405b9995 Update XFormers Attention Monkeypatch to handle Llama-2 70B (GQA) (#339)
* Fix XFormers attention for Llama-2 70B (GQA)

Updated XFormers MonkeyPatch to handle GQA as used in Llama-2 70B. All the updated code is taken directly from the Transformers library (commit 07360b6c9c) from their modeling_llama.py file.

* Catch configs without pretraining_tp

* Whitespace bug fix

Command had accidentally been moved out of if-else block.

* pre-commit formatting fixes

Thanks to @winglian
2023-08-06 11:09:04 -04:00
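Context: GQA in Llama-2 70B shares each key/value head across several query heads; the Transformers fix referenced above hinges on a repeat_kv helper that expands KV heads before attention, roughly:

```python
import torch

def repeat_kv(hidden_states: torch.Tensor, n_rep: int) -> torch.Tensor:
    """Expand (batch, num_kv_heads, seq_len, head_dim) to
    (batch, num_kv_heads * n_rep, seq_len, head_dim) so grouped
    key/value heads line up with the query heads."""
    batch, num_kv_heads, slen, head_dim = hidden_states.shape
    if n_rep == 1:
        return hidden_states
    hidden_states = hidden_states[:, :, None, :, :].expand(
        batch, num_kv_heads, n_rep, slen, head_dim
    )
    return hidden_states.reshape(batch, num_kv_heads * n_rep, slen, head_dim)
```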
Jan Philipp Harries
c93655c0a3 Added Orca Mini prompt strategy (#263)
* added Orca Mini prompt strategy

* maybe this fixed precommit errors?

* pre-commits passing

---------

Co-authored-by: Jan Philipp Harries <jpdus@users.noreply.github.com>
2023-08-06 03:16:41 +09:00
Wing Lian
fe285430bc optimize the iteration when tokenizing large datasets (#332) 2023-08-04 12:12:05 -04:00
Aman Gupta Karmani
0d2e34f056 Merge pull request #336 from tmm1/flash-attn
Fix flash-attn + qlora not working with llama models
2023-08-03 16:25:30 -07:00
Aman Gupta Karmani
b56a6c0101 Merge pull request #337 from tmm1/readme-fix
update README
2023-08-03 15:14:17 -07:00
Aman Karmani
2eda9e02a9 fix typo 2023-08-03 21:04:12 +00:00
Aman Karmani
78b9efb7f4 scope flash-attn+qlora fix correctly, scope to llama, add comment 2023-08-03 19:19:39 +00:00
Aman Karmani
312a9fad07 move flash-attn monkey patch alongside the others 2023-08-03 17:20:49 +00:00
Aman Karmani
58d665943e python 3.10 and 3.11 both work fine, as does pytorch 2.1.0.dev 2023-08-03 16:47:25 +00:00
Aman Karmani
cc7e80026e there is no configs folder 2023-08-03 16:31:37 +00:00
mhenrichsen
dc71d8872a feat/llama-2 examples (#319)
* qlora llama-2

* qlora llama-2

* linting

* readme

* lora added

* linting

* change group_by_length

* 13b fitting on 24gb

* grouped lengths true

* add pad token

* change out dir

---------

Co-authored-by: Mads Henrichsen <mads@Bærbar-tilhørende-Mads.local>
2023-08-03 19:22:48 +09:00
Aman Karmani
248bf90f89 ensure flash-attn fixes happen in both adapter/lora modes, and use torch_dtype 2023-08-02 20:15:03 +00:00
Wing Lian
77085ea24e qlora w flash attention fixes (#333) 2023-08-01 23:26:16 -04:00
Wing Lian
db2a3586f3 add peft install back since it doesn't get installed by setup.py (#331) 2023-07-31 16:31:53 -04:00
Wing Lian
6c9a87c8ee pin accelerate so it works with llama2 (#330) 2023-07-30 22:20:06 -04:00