Wing Lian
21d307b15b
fix counts by accounting for num devices
2023-08-08 04:13:10 -04:00
Wing Lian
58e9dee204
fixes and go back to distributed sampler since batch sampler won't work
2023-08-08 03:49:29 -04:00
Wing Lian
4f7c04bae0
more fixes and optimizations
2023-08-08 03:16:00 -04:00
Wing Lian
1162b93b6b
filter w multiple cpus
2023-08-08 00:50:56 -04:00
Wing Lian
21f445d763
more packing and dataset optimizations and fixes
2023-08-08 00:45:24 -04:00
Wing Lian
229b9165aa
fix test and pylint checks
2023-08-07 09:38:05 -04:00
Wing Lian
394a65f11f
add unit tests for cum seq lens, add ability to build cu_seq_lens from positional ids, fix prompt test
2023-08-07 09:38:04 -04:00
Wing Lian
c70dae63cc
add chatml
2023-08-07 09:38:04 -04:00
Wing Lian
7712955b35
fix chatml system prompt for openorca, legacy tokenizer opts
2023-08-07 09:38:04 -04:00
Wing Lian
f93f0017cd
fix flash-attn, xformers, packing, support chatml
2023-08-07 09:38:04 -04:00
Wing Lian
0b01da0713
properly calculate max len
2023-08-07 09:38:04 -04:00
Wing Lian
b2f7bc7ccd
use cumulative seq len with var len flash attn v2 w packing
2023-08-07 09:38:04 -04:00
Wing Lian
b8905e2a91
sample_packing_seq_len_multiplier config
2023-08-07 09:38:04 -04:00
Wing Lian
7e1edc662a
make sure the chunk size is an int
2023-08-07 09:38:04 -04:00
Wing Lian
98c9bc69de
seq_len_multiple for packing
2023-08-07 09:38:04 -04:00
Wing Lian
8378335dc9
limit packing to sequences of max seq len
2023-08-07 09:38:04 -04:00
Wing Lian
bdd34c7400
weighted CEL fixes
2023-08-07 09:38:04 -04:00
Wing Lian
c6cc54c7d9
weighted CE losses
2023-08-07 09:38:04 -04:00
Wing Lian
83f7362480
don't split batches when packing
2023-08-07 09:38:04 -04:00
Wing Lian
958d423e7c
only process eval dataset for packing if not None
2023-08-07 09:38:04 -04:00
Wing Lian
e74eab6e73
add a test for the mask expansion for sequence packing
2023-08-07 09:38:04 -04:00
Wing Lian
487abfc769
pass sample packing efficiency to training args
2023-08-07 09:38:04 -04:00
Wing Lian
2bee646e85
fix step calc for packing
2023-08-07 09:38:04 -04:00
Wing Lian
945f2e5029
better handling so that all devices have the same dataloader len
2023-08-07 09:38:04 -04:00
Wing Lian
daed942fe9
fix rounding of len of batches to int
2023-08-07 09:38:04 -04:00
Wing Lian
df3eb645da
better handling of variance in multipack dataloader length and trainer hanging when it runs out of data
2023-08-07 09:38:04 -04:00
Wing Lian
32fed7039d
optimized expand mask fn
2023-08-07 09:38:04 -04:00
Wing Lian
7d7b5ebd71
more fixes for 4k and optimizations
2023-08-07 09:38:03 -04:00
Wing Lian
4b7ad9927f
validation for sample packing and doc
2023-08-07 09:38:03 -04:00
Wing Lian
fedcf5a089
Update src/axolotl/utils/dataloader.py
2023-08-07 09:38:03 -04:00
Wing Lian
2f2974196d
fix for position_ids w packing
2023-08-07 09:38:03 -04:00
Wing Lian
2e295c9f94
use accelerator prepare for dataloader
2023-08-07 09:38:03 -04:00
Wing Lian
4ab9ab79fd
use distributed sampler, avoid accelerate prepare
2023-08-07 09:38:03 -04:00
Wing Lian
b02484a83e
more fixes for sample packing
2023-08-07 09:38:03 -04:00
Wing Lian
58045f0816
more fixes, position_ids seems broken
2023-08-07 09:38:03 -04:00
Wing Lian
66774011c4
est total tokens, fix field loop
2023-08-07 09:38:03 -04:00
Wing Lian
41d4992029
more fixes for dataloader integration
2023-08-07 09:38:03 -04:00
Wing Lian
762f1b08db
add position_ids back
2023-08-07 09:38:03 -04:00
Wing Lian
3aba4c5d7c
use multi pack dataloader w random sampler
2023-08-07 09:38:03 -04:00
Wing Lian
ffd96839cf
don't move masks to cpu
2023-08-07 09:38:03 -04:00
Wing Lian
ef9bf7ad73
fix expand mask for multiple batch items, make sure we pad position_ids
2023-08-07 09:38:03 -04:00
Wing Lian
4964b0d345
set position ids and use block diagonal attn mask
2023-08-07 09:38:03 -04:00
Wing Lian
36b0e30a9d
fix attention mask with packing
2023-08-07 09:38:03 -04:00
Wing Lian
176b888a63
ensure enable_input_require_grads is called on model before getting the peft model (#345)
2023-08-06 18:13:10 -04:00
Jan Philipp Harries
3392270544
experimental llama 2 chat support (#296)
* experimental llama 2 chat support
* few small fixes
* llama2_chat
* small fix to follow original implementation
* small fixes and added fixtures/tests
* fix mixed-up inference and finetuning conversations
* args - small fix
* small fix
* small adjustment and warning
* fix with pre-commit
---------
Co-authored-by: Jan Philipp Harries <jpdus@users.noreply.github.com>
2023-08-06 17:40:52 -04:00
Wing Lian
bb53a165f5
add a basic ds zero3 config (#347)
better defaults for ds
2023-08-06 17:19:51 -04:00
ssmi153
10405b9995
Update XFormers Attention Monkeypatch to handle Llama-2 70B (GQA) (#339)
* Fix XFormers attention for Llama-2 70B (GQA)
Updated XFormers MonkeyPatch to handle GQA as used in Llama-2 70B. All the updated code is taken directly from the Transformers library: 07360b6c9c (diff-06392bad3b9e97be9ade60d4ac46f73b6809388f4d507c2ba1384ab872711c51) from their llama_modeling.py file.
* Catch configs without pretraining_tp
* Whitespace bug fix
Command had accidentally been moved out of if-else block.
* pre-commit formatting fixes
Thanks to @winglian
2023-08-06 11:09:04 -04:00
Jan Philipp Harries
c93655c0a3
Added Orca Mini prompt strategy (#263)
* added Orca Mini prompt strategy
* maybe this fixed precommit errors?
* pre-commits passing
---------
Co-authored-by: Jan Philipp Harries <jpdus@users.noreply.github.com>
2023-08-06 03:16:41 +09:00
Wing Lian
fe285430bc
optimize the iteration when tokenizing large datasets (#332)
2023-08-04 12:12:05 -04:00
Aman Gupta Karmani
0d2e34f056
Merge pull request #336 from tmm1/flash-attn
Fix flash-attn + qlora not working with llama models
2023-08-03 16:25:30 -07:00