Wing Lian | 6b5cf8b5ea | optimize length reducer from 9m -> <5sec | 2023-08-11 08:30:30 -04:00
Wing Lian | 79500f358a | need to pass total num tokens to trainer too | 2023-08-10 19:08:23 -04:00
Wing Lian | 7e977a9b68 | optimization if total_num_tokens is already known | 2023-08-10 19:02:28 -04:00
Wing Lian | ac4b700daa | optimization if total_num_tokens is already known | 2023-08-10 19:01:17 -04:00
Wing Lian | 2565c2f259 | async batching for multipack | 2023-08-10 18:28:15 -04:00
Wing Lian | a07f432d9c | calculate cum seq lens with pos_ids instead of mask, simplify packing params, fix distributed barrier | 2023-08-10 17:16:01 -04:00
Wing Lian | 57d9bf711c | let's not cleanup the cached datasets | 2023-08-08 21:27:55 -04:00
Wing Lian | 26983a1974 | fix sampler to prevent overfit w new epochs | 2023-08-08 15:34:18 -04:00
Wing Lian | 1b8747e319 | use custom distributed checks | 2023-08-08 13:35:04 -04:00
Wing Lian | 035b3c760c | add numba to requirements. | 2023-08-08 10:55:29 -04:00
Wing Lian | 17abbd59e1 | previous accelerate is still most performant | 2023-08-08 09:46:01 -04:00
Wing Lian | 6ec76ddb4c | fix steps calculation | 2023-08-08 05:13:21 -04:00
Wing Lian | 21d307b15b | fix counts by accounting for num devices | 2023-08-08 04:13:10 -04:00
Wing Lian | 58e9dee204 | fixes and go back to distributed sampler since batch sampler won't work | 2023-08-08 03:49:29 -04:00
Wing Lian | 4f7c04bae0 | more fixes and optimizations | 2023-08-08 03:16:00 -04:00
Wing Lian | 1162b93b6b | filter w multiple cpus | 2023-08-08 00:50:56 -04:00
Wing Lian | 21f445d763 | more packing and dataset optimizations and fixes | 2023-08-08 00:45:24 -04:00
Wing Lian | 229b9165aa | fix test and pylint checks | 2023-08-07 09:38:05 -04:00
Wing Lian | 394a65f11f | add unit tests for cum seq lens, add ability to build cu_seq_lens from positional ids, fix prompt test | 2023-08-07 09:38:04 -04:00
Wing Lian | c70dae63cc | add chatml | 2023-08-07 09:38:04 -04:00
Wing Lian | 7712955b35 | fix chatml system prompt for openorca, legacy tokenizer opts | 2023-08-07 09:38:04 -04:00
Wing Lian | f93f0017cd | fix flash-attn, xformers, packing, support chatml | 2023-08-07 09:38:04 -04:00
Wing Lian | 0b01da0713 | properly calculate max len | 2023-08-07 09:38:04 -04:00
Wing Lian | b2f7bc7ccd | use cumulative seq len with var len flash attn v2 w packing | 2023-08-07 09:38:04 -04:00
Wing Lian | b8905e2a91 | sample_packing_seq_len_multiplier config | 2023-08-07 09:38:04 -04:00
Wing Lian | 7e1edc662a | make sure the chunk size is an int | 2023-08-07 09:38:04 -04:00
Wing Lian | 98c9bc69de | seq_len_multiple for packing | 2023-08-07 09:38:04 -04:00
Wing Lian | 8378335dc9 | limit packing to sequences of max seq len | 2023-08-07 09:38:04 -04:00
Wing Lian | bdd34c7400 | weighted CEL fixes | 2023-08-07 09:38:04 -04:00
Wing Lian | c6cc54c7d9 | weighted CE losses | 2023-08-07 09:38:04 -04:00
Wing Lian | 83f7362480 | don't split batches when packing | 2023-08-07 09:38:04 -04:00
Wing Lian | 958d423e7c | only process eval dataset for packing if not None | 2023-08-07 09:38:04 -04:00
Wing Lian | e74eab6e73 | add a test for the mask expansion for sequence packing | 2023-08-07 09:38:04 -04:00
Wing Lian | 487abfc769 | pass sample packing efficiency to training args | 2023-08-07 09:38:04 -04:00
Wing Lian | 2bee646e85 | fix step calc for packing | 2023-08-07 09:38:04 -04:00
Wing Lian | 945f2e5029 | better handling so that all devices have the same dataloader len | 2023-08-07 09:38:04 -04:00
Wing Lian | daed942fe9 | fix rounding of len of batches to int | 2023-08-07 09:38:04 -04:00
Wing Lian | df3eb645da | better handling of variance in multipack dataloader length and trainer hanging when it runs out of data | 2023-08-07 09:38:04 -04:00
Wing Lian | 32fed7039d | optimized expand mask fn | 2023-08-07 09:38:04 -04:00
Wing Lian | 7d7b5ebd71 | more fixes for 4k and optimizations | 2023-08-07 09:38:03 -04:00
Wing Lian | 4b7ad9927f | validation for sample packing and doc | 2023-08-07 09:38:03 -04:00
Wing Lian | fedcf5a089 | Update src/axolotl/utils/dataloader.py | 2023-08-07 09:38:03 -04:00
Wing Lian | 2f2974196d | fix for position_ids w packing | 2023-08-07 09:38:03 -04:00
Wing Lian | 2e295c9f94 | use accelerator prepare for dataloader | 2023-08-07 09:38:03 -04:00
Wing Lian | 4ab9ab79fd | use distributed sampler, avoid accelerate prepare | 2023-08-07 09:38:03 -04:00
Wing Lian | b02484a83e | more fixes for sample packing | 2023-08-07 09:38:03 -04:00
Wing Lian | 58045f0816 | more fixes, position_ids seems broken | 2023-08-07 09:38:03 -04:00
Wing Lian | 66774011c4 | est total tokens, fix field loop | 2023-08-07 09:38:03 -04:00
Wing Lian | 41d4992029 | more fixes for dataloader integration | 2023-08-07 09:38:03 -04:00
Wing Lian | 762f1b08db | add position_ids back | 2023-08-07 09:38:03 -04:00