Commit Graph

762 Commits

Author SHA1 Message Date
Wing Lian
b2f7bc7ccd use cumulative seq len with var len flash attn v2 w packing 2023-08-07 09:38:04 -04:00
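Context: flash-attn v2's varlen kernels take a cu_seqlens tensor marking where each packed sub-sequence starts and ends. A minimal sketch of deriving it, assuming the packed attention mask labels sub-sequences 1..n with 0 for padding (hypothetical helper, not necessarily this commit's code):

```python
import torch

def get_cu_seqlens(attention_mask: torch.Tensor) -> torch.Tensor:
    """Cumulative sequence lengths for flash-attn's varlen API.

    attention_mask: (batch, seq_len) where each packed sub-sequence is
    labeled 1, 2, 3, ... and padding is 0.
    Returns an int32 tensor [0, len_0, len_0 + len_1, ...].
    """
    seqlens = []
    for row in attention_mask:
        # count tokens per packed sub-sequence, skipping padding
        _, counts = row[row != 0].unique_consecutive(return_counts=True)
        seqlens.extend(counts.tolist())
    cu_seqlens = torch.zeros(len(seqlens) + 1, dtype=torch.int32)
    cu_seqlens[1:] = torch.tensor(seqlens, dtype=torch.int32).cumsum(0)
    return cu_seqlens
```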
Wing Lian
b8905e2a91 sample_packing_seq_len_multiplier config 2023-08-07 09:38:04 -04:00
Wing Lian
7e1edc662a make sure the chunk size is an int 2023-08-07 09:38:04 -04:00
Wing Lian
98c9bc69de seq_len_multiple for packing 2023-08-07 09:38:04 -04:00
Wing Lian
8378335dc9 limit packing to sequences of max seq len 2023-08-07 09:38:04 -04:00
Wing Lian
bdd34c7400 weighted CEL fixes 2023-08-07 09:38:04 -04:00
Wing Lian
c6cc54c7d9 weighted CE losses 2023-08-07 09:38:04 -04:00
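Context: a per-token weighted cross-entropy can be built from the unreduced loss; a minimal sketch, assuming per-token weights are supplied alongside the labels (illustrative only, not the commit's implementation):

```python
import torch.nn.functional as F

def weighted_ce_loss(logits, labels, weights):
    """Cross-entropy with a weight per token.

    logits:  (batch, seq_len, vocab_size)
    labels:  (batch, seq_len), -100 marks ignored tokens
    weights: (batch, seq_len) per-token loss weights
    """
    per_token = F.cross_entropy(
        logits.view(-1, logits.size(-1)),
        labels.view(-1),
        reduction="none",
        ignore_index=-100,
    )
    w = weights.view(-1) * (labels.view(-1) != -100).float()
    # normalize by total weight so the scale matches unweighted CE
    return (per_token * w).sum() / w.sum().clamp(min=1e-8)
```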
Wing Lian
83f7362480 don't split batches when packing 2023-08-07 09:38:04 -04:00
Wing Lian
958d423e7c only process eval dataset for packing if not None 2023-08-07 09:38:04 -04:00
Wing Lian
e74eab6e73 add a test for the mask expansion for sequence packing 2023-08-07 09:38:04 -04:00
Wing Lian
487abfc769 pass sample packing efficiency to training args 2023-08-07 09:38:04 -04:00
Wing Lian
2bee646e85 fix step calc for packing 2023-08-07 09:38:04 -04:00
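Context: with sample packing, each max_seq_len window holds several examples, so the step count has to be recomputed from token totals rather than example counts. A back-of-the-envelope sketch (names are illustrative, not the trainer's actual fields):

```python
import math

def estimate_total_steps(total_tokens, max_seq_len, micro_batch_size,
                         gradient_accumulation, world_size,
                         packing_efficiency=0.95):
    # tokens consumed per optimizer step, discounted by how densely
    # the packer fills each max_seq_len window
    tokens_per_step = (max_seq_len * micro_batch_size *
                       gradient_accumulation * world_size *
                       packing_efficiency)
    return math.ceil(total_tokens / tokens_per_step)
```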
Wing Lian
945f2e5029 better handling so that all devices have the same dataloader len 2023-08-07 09:38:04 -04:00
Wing Lian
daed942fe9 fix rounding of len of batches to int 2023-08-07 09:38:04 -04:00
Wing Lian
df3eb645da better handling of variance in multipack dataloader length and trainer hanging when it runs out of data 2023-08-07 09:38:04 -04:00
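Context: if ranks disagree on how many batches their dataloaders yield, the rank that runs out first leaves the others blocked in a collective op. One common remedy is to agree on the minimum length across the world, sketched here (illustrative, not the repo's exact handling):

```python
import torch
import torch.distributed as dist

def synced_dataloader_len(local_len: int) -> int:
    """Take the minimum dataloader length across all ranks so every
    device iterates the same number of batches."""
    if not (dist.is_available() and dist.is_initialized()):
        return local_len
    device = "cuda" if torch.cuda.is_available() else "cpu"
    t = torch.tensor(local_len, dtype=torch.int64, device=device)
    dist.all_reduce(t, op=dist.ReduceOp.MIN)
    return int(t.item())
```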
Wing Lian
32fed7039d optimized expand mask fn 2023-08-07 09:38:04 -04:00
Wing Lian
7d7b5ebd71 more fixes for 4k and optimizations 2023-08-07 09:38:03 -04:00
Wing Lian
4b7ad9927f validation for sample packing and doc 2023-08-07 09:38:03 -04:00
Wing Lian
fedcf5a089 Update src/axolotl/utils/dataloader.py 2023-08-07 09:38:03 -04:00
Wing Lian
2f2974196d fix for position_ids w packing 2023-08-07 09:38:03 -04:00
Wing Lian
2e295c9f94 use accelerator prepare for dataloader 2023-08-07 09:38:03 -04:00
Wing Lian
4ab9ab79fd use distributed sampler, avoid accelerate prepare 2023-08-07 09:38:03 -04:00
Wing Lian
b02484a83e more fixes for sample packing 2023-08-07 09:38:03 -04:00
Wing Lian
58045f0816 more fixes, position_ids seems broken 2023-08-07 09:38:03 -04:00
Wing Lian
66774011c4 est total tokens, fix field loop 2023-08-07 09:38:03 -04:00
Wing Lian
41d4992029 more fixes for dataloader integration 2023-08-07 09:38:03 -04:00
Wing Lian
762f1b08db add position_ids back 2023-08-07 09:38:03 -04:00
Wing Lian
3aba4c5d7c use multi pack dataloader w random sampler 2023-08-07 09:38:03 -04:00
Wing Lian
ffd96839cf don't move masks to cpu 2023-08-07 09:38:03 -04:00
Wing Lian
ef9bf7ad73 fix expand mask for multiple batch items, make sure we pad position_ids 2023-08-07 09:38:03 -04:00
Wing Lian
4964b0d345 set position ids and use block diagonal attn mask 2023-08-07 09:38:03 -04:00
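Context: for packed batches, tokens must attend only within their own sub-sequence, which a block-diagonal (and causal) mask enforces. A sketch assuming sub-sequences are labeled 1..n in the attention mask (hypothetical; the repo's expand-mask function may differ):

```python
import torch

def expand_packed_mask(attention_mask: torch.Tensor) -> torch.Tensor:
    """Expand (batch, seq_len) sub-sequence ids into a block-diagonal,
    causal boolean mask of shape (batch, 1, seq_len, seq_len)."""
    # attention is allowed only between tokens sharing a nonzero id
    same_seq = attention_mask.unsqueeze(1) == attention_mask.unsqueeze(2)
    not_pad = (attention_mask != 0).unsqueeze(1) & (attention_mask != 0).unsqueeze(2)
    seq_len = attention_mask.size(1)
    causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool,
                                   device=attention_mask.device))
    return (same_seq & not_pad & causal).unsqueeze(1)
```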
Wing Lian
36b0e30a9d fix attention mask with packing 2023-08-07 09:38:03 -04:00
Wing Lian
176b888a63 ensure enable_input_require_grads is called on model before getting the peft model (#345) 2023-08-06 18:13:10 -04:00
Jan Philipp Harries
3392270544 experimental llama 2 chat support (#296)
* experimental llama 2 chat support

* few small fixes

* llama2_chat

* small fix to follow original implementation

* small fixes and added fixtures/tests

* fix: mixed-up inference and finetuning conversations

* args - small fix

* small fix

* small adjustment and warning

* fix with pre-commit

---------

Co-authored-by: Jan Philipp Harries <jpdus@users.noreply.github.com>
2023-08-06 17:40:52 -04:00
Wing Lian
bb53a165f5 add a basic ds zero3 config (#347)
better defaults for ds
2023-08-06 17:19:51 -04:00
ssmi153
10405b9995 Update XFormers Attention Monkeypatch to handle Llama-2 70B (GQA) (#339)
* Fix XFormers attention for Llama-2 70B (GQA)

Updated XFormers MonkeyPatch to handle GQA as used in Llama-2 70B. All the updated code is taken directly from the Transformers library (commit 07360b6c9c) from their modeling_llama.py file.

* Catch configs without pretraining_tp

* Whitespace bug fix

Command had accidentally been moved out of if-else block.

* pre-commit formatting fixes

Thanks to @winglian
2023-08-06 11:09:04 -04:00
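Context: GQA in Llama-2 70B shares each key/value head across several query heads; the Transformers fix referenced above hinges on a repeat_kv helper that expands KV heads before attention, roughly:

```python
import torch

def repeat_kv(hidden_states: torch.Tensor, n_rep: int) -> torch.Tensor:
    """Expand (batch, num_kv_heads, seq_len, head_dim) to
    (batch, num_kv_heads * n_rep, seq_len, head_dim) so grouped
    key/value heads line up with the query heads."""
    batch, num_kv_heads, slen, head_dim = hidden_states.shape
    if n_rep == 1:
        return hidden_states
    hidden_states = hidden_states[:, :, None, :, :].expand(
        batch, num_kv_heads, n_rep, slen, head_dim
    )
    return hidden_states.reshape(batch, num_kv_heads * n_rep, slen, head_dim)
```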
Jan Philipp Harries
c93655c0a3 Added Orca Mini prompt strategy (#263)
* added Orca Mini prompt strategy

* maybe this fixed precommit errors?

* pre-commits passing

---------

Co-authored-by: Jan Philipp Harries <jpdus@users.noreply.github.com>
2023-08-06 03:16:41 +09:00
Wing Lian
fe285430bc optimize the iteration when tokenizing large datasets (#332) 2023-08-04 12:12:05 -04:00
Aman Gupta Karmani
0d2e34f056 Merge pull request #336 from tmm1/flash-attn
Fix flash-attn + qlora not working with llama models
2023-08-03 16:25:30 -07:00
Aman Gupta Karmani
b56a6c0101 Merge pull request #337 from tmm1/readme-fix
update README
2023-08-03 15:14:17 -07:00
Aman Karmani
2eda9e02a9 fix typo 2023-08-03 21:04:12 +00:00
Aman Karmani
78b9efb7f4 scope flash-attn+qlora fix correctly, scope to llama, add comment 2023-08-03 19:19:39 +00:00
Aman Karmani
312a9fad07 move flash-attn monkey patch alongside the others 2023-08-03 17:20:49 +00:00
Aman Karmani
58d665943e python 3.10 and 3.11 both work fine, as does pytorch 2.1.0.dev 2023-08-03 16:47:25 +00:00
Aman Karmani
cc7e80026e there is no configs folder 2023-08-03 16:31:37 +00:00
mhenrichsen
dc71d8872a feat/llama-2 examples (#319)
* qlora llama-2

* qlora llama-2

* linting

* readme

* lora added

* linting

* change group_by_length

* 13b fitting on 24gb

* grouped lengths true

* add pad token

* change out dir

---------

Co-authored-by: Mads Henrichsen <mads@Bærbar-tilhørende-Mads.local>
2023-08-03 19:22:48 +09:00
Aman Karmani
248bf90f89 ensure flash-attn fixes happen in both adapter/lora modes, and use torch_dtype 2023-08-02 20:15:03 +00:00
Wing Lian
77085ea24e qlora w flash attention fixes (#333) 2023-08-01 23:26:16 -04:00
Wing Lian
db2a3586f3 add peft install back since it doesn't get installed by setup.py (#331) 2023-07-31 16:31:53 -04:00
Wing Lian
6c9a87c8ee pin accelerate so it works with llama2 (#330) 2023-07-30 22:20:06 -04:00