945f2e5029
better handling so that all devices have the same dataloader len
Wing Lian
2023-07-25 22:18:34 -04:00
daed942fe9
fix rounding of len of batches to int
Wing Lian
2023-07-25 10:29:49 -04:00
df3eb645da
better handling of variance in multipack dataloader length and trainer hanging when it runs out of data
Wing Lian
2023-07-25 10:22:05 -04:00
32fed7039d
optimized expand mask fn
Wing Lian
2023-07-24 17:11:02 -04:00
7d7b5ebd71
more fixes for 4k and optimizations
Wing Lian
2023-07-23 23:05:02 -04:00
4b7ad9927f
validation for sample packing and doc
Wing Lian
2023-07-22 03:35:06 -04:00
fedcf5a089
Update src/axolotl/utils/dataloader.py
Wing Lian
2023-07-22 03:11:20 -04:00
2f2974196d
fix for position_ids w packing
Wing Lian
2023-07-21 20:31:54 -04:00
2e295c9f94
use accelerator prepare for dataloader
Wing Lian
2023-07-19 22:58:16 -04:00
4ab9ab79fd
use distributed sampler, avoid accelerate prepare
Wing Lian
2023-07-19 12:16:19 -04:00
b02484a83e
more fixes for sample packing
Wing Lian
2023-07-18 22:27:37 -04:00
58045f0816
more fixes, position_ids seems broken
Wing Lian
2023-07-18 16:47:08 -04:00
66774011c4
est total tokens, fix field loop
Wing Lian
2023-07-18 11:30:07 -04:00
41d4992029
more fixes for dataloader integration
Wing Lian
2023-07-18 10:50:40 -04:00
762f1b08db
add position_ids back
Wing Lian
2023-07-18 01:50:41 -04:00
3aba4c5d7c
use multi pack dataloader w random sampler
Wing Lian
2023-07-17 23:44:14 -04:00
ffd96839cf
don't move masks to cpu
Wing Lian
2023-07-17 11:08:43 -04:00
ef9bf7ad73
fix expand mask for multiple batch items, make sure we pad position_ids
Wing Lian
2023-07-17 06:17:28 -04:00
4964b0d345
set position ids and use block diagonal attn mask
Wing Lian
2023-07-17 01:56:32 -04:00
36b0e30a9d
fix attention mask with packing
Wing Lian
2023-07-15 10:38:01 -04:00
9f99104038
update comment for group_by_length
Aman Karmani
2023-08-07 01:04:56 -07:00
36fefcf94b
set group_by_length to false in examples
Aman Karmani
2023-08-06 23:59:09 -07:00
176b888a63
ensure enable_input_require_grads is called on model before getting the peft model (#345)
Wing Lian
2023-08-06 18:13:10 -04:00
3392270544
experimental llama 2 chat support (#296)
Jan Philipp Harries
2023-08-06 23:40:52 +02:00
bb53a165f5
add a basic ds zero3 config (#347)
Wing Lian
2023-08-06 17:19:51 -04:00
10405b9995
Update XFormers Attention Monkeypatch to handle Llama-2 70B (GQA) (#339)
ssmi153
2023-08-07 03:09:04 +12:00
9793faf6dc
pre-commit formatting fixes
ssmi-main
Wing Lian
2023-08-05 22:46:02 -04:00
c93655c0a3
Added Orca Mini prompt strategy (#263)
Jan Philipp Harries
2023-08-05 20:16:41 +02:00
64852ae15a
Whitespace bug fix
ssmi153
2023-08-05 15:08:44 +12:00
1fed74b1d9
Catch configs without pretraining_tp
ssmi153
2023-08-05 11:45:12 +12:00
a300a4db1d
Fix XFormers attention for Llama-2 70B (GQA)
ssmi153
2023-08-05 11:01:44 +12:00
fe285430bc
optimize the iteration when tokenizing large datasets (#332)
Wing Lian
2023-08-04 12:12:05 -04:00
0d2e34f056
Merge pull request #336 from tmm1/flash-attn
Aman Gupta Karmani
2023-08-03 16:25:30 -07:00
b56a6c0101
Merge pull request #337 from tmm1/readme-fix
Aman Gupta Karmani
2023-08-03 15:14:17 -07:00
2eda9e02a9
fix typo
Aman Karmani
2023-08-03 21:04:12 +00:00
78b9efb7f4
scope flash-attn+qlora fix correctly, scope to llama, add comment
Aman Karmani
2023-08-03 19:19:39 +00:00
312a9fad07
move flash-attn monkey patch alongside the others
Aman Karmani
2023-08-03 17:20:49 +00:00
58d665943e
python 3.10 and 3.11 both work fine, as does pytorch 2.1.0.dev
Aman Karmani
2023-08-03 16:47:25 +00:00
cc7e80026e
there is no configs folder
Aman Karmani
2023-08-03 16:31:37 +00:00
dc71d8872a
feat/llama-2 examples (#319)
mhenrichsen
2023-08-03 12:22:48 +02:00
248bf90f89
ensure flash-attn fixes happen in both adapter/lora modes, and use torch_dtype
Aman Karmani
2023-08-02 20:15:03 +00:00
77085ea24e
qlora w flash attention fixes (#333)
Wing Lian
2023-08-01 23:26:16 -04:00
db2a3586f3
add peft install back since it doesn't get installed by setup.py (#331)
Wing Lian
2023-07-31 16:31:53 -04:00
6c9a87c8ee
pin accelerate so it works with llama2 (#330)
Wing Lian
2023-07-30 22:20:06 -04:00
894cba09f3
fix FSDP save of final model (#329)
Wing Lian
2023-07-30 21:46:44 -04:00
41a4d15d43
update README for updated docker images (#328)
Wing Lian
2023-07-28 16:50:03 -04:00
2c37bf6c21
Prune cuda117 (#327)
Wing Lian
2023-07-26 16:27:49 -04:00
9f69c4d8c1
latest HEAD of accelerate causes 0 loss immediately w FSDP (#321)
Wing Lian
2023-07-24 11:23:56 -04:00
3d4984b9a5
update prompts for open orca to match the paper (#317)
Wing Lian
2023-07-22 13:49:11 -04:00
ff7f18d1ed
disable gh cache for first step of docker builds too
Wing Lian
2023-07-22 11:46:37 -04:00
cf62cfd661
add runpod envs to .bashrc, fix bnb env (#316)
Wing Lian
2023-07-22 10:09:38 -04:00
c5df969262
don't use the gha cache w docker
Wing Lian
2023-07-22 08:46:21 -04:00
40a53ff181
Merge pull request #307 from OpenAccess-AI-Collective/xgen-user-sharegpt-tokens
Wing Lian
2023-07-22 04:10:38 -04:00
dcdec44347
Merge pull request #306 from ethanhs/xgen
Wing Lian
2023-07-22 04:10:18 -04:00
3ffb018a4c
Merge pull request #313 from OpenAccess-AI-Collective/tokenizer-llama2-embeddings
Wing Lian
2023-07-22 04:09:59 -04:00
a94f2eecb1
Merge pull request #299 from OpenAccess-AI-Collective/flash-attention-2
Wing Lian
2023-07-22 04:07:48 -04:00
1066751358
don't resize embeddings to multiples of 32x by default
Wing Lian
2023-07-22 01:52:38 -04:00
1b63bf13bc
Merge pull request #308 from OpenAccess-AI-Collective/apache2-license
Wing Lian
2023-07-21 09:50:14 -04:00
5cce2a42ff
add apache 2.0 license
Wing Lian
2023-07-21 09:49:29 -04:00
2a428e8014
better handling since xgen tokenizer breaks with convert_tokens_to_ids
Wing Lian
2023-07-21 09:24:11 -04:00
cdf85fdbd5
pin flash attention 2 to the fix for backwards pass
Wing Lian
2023-07-21 08:18:53 -04:00
9b790d359b
flash attention 2
Wing Lian
2023-07-20 00:00:49 -04:00
38811434e6
Add XGen info to README and example config
Ethan Smith
2023-07-21 00:44:50 -07:00
06c61d6f13
Merge pull request #304 from OpenAccess-AI-Collective/NanoCode012-patch-1
NanoCode012
2023-07-21 13:39:45 +09:00
262dc29df2
Merge pull request #300 from OpenAccess-AI-Collective/pytorch-201
Wing Lian
2023-07-21 00:28:38 -04:00
165907fddb
Fix(readme): Improve wording for push model
NanoCode012
2023-07-21 11:28:35 +09:00
a032c9f452
fix sdp attention to use the flash/mem-efficient context manager
Wing Lian
2023-07-20 01:05:48 -04:00
b06d3e3645
explicitly pin flash attention 1 to v1.0.9
Wing Lian
2023-07-20 01:02:08 -04:00
c58034d48c
use pytorch 2.0.1
Wing Lian
2023-07-20 00:47:13 -04:00
28fd429bcf
Merge pull request #293 from NanoCode012/fix/tokenize-speed
NanoCode012
2023-07-19 11:02:04 +09:00
45ac7c4f88
feat: use multi-core
NanoCode012
2023-07-19 10:16:54 +09:00
edd6980dd9
Merge pull request #289 from OpenAccess-AI-Collective/hf_transfer
Wing Lian
2023-07-17 15:08:06 -04:00
dc6d25124d
Merge pull request #288 from OpenAccess-AI-Collective/NanoCode012-patch-1
Wing Lian
2023-07-17 14:46:43 -04:00
6dd2e7d671
add hf_transfer to requirements for faster hf upload
Wing Lian
2023-07-17 14:44:48 -04:00
b64f411849
fix(readme): remove accelerate config
NanoCode012
2023-07-18 01:31:02 +09:00
03a59c1ed4
Merge pull request #287 from OpenAccess-AI-Collective/dataclass-fix
Wing Lian
2023-07-17 06:09:23 -04:00
ebaec3c406
fix axolotl training args dataclass annotation
Wing Lian
2023-07-17 04:57:02 -04:00
73e70e3996
Merge pull request #286 from OpenAccess-AI-Collective/logging-docker-fixes
Wing Lian
2023-07-17 04:26:39 -04:00
d75adb9835
misc fixes
Wing Lian
2023-07-17 03:00:27 -04:00
02224668c3
Merge pull request #283 from OpenAccess-AI-Collective/docker-git-fetch
Wing Lian
2023-07-17 02:17:00 -04:00
f162f3c7cc
set transformers cache env var in docker image
Wing Lian
2023-07-16 23:03:54 -04:00
eca3531329
git fetch fix for docker
Wing Lian
2023-07-16 22:25:05 -04:00
6f16c4569d
Merge pull request #276 from theobjectivedad/logging_enhancement
Wing Lian
2023-07-16 17:04:52 -04:00
0bd09c077d
Merge pull request #280 from teknium1/main
Wing Lian
2023-07-16 16:08:58 -04:00
469c08c9ba
Merge pull request #279 from NanoCode012/feat/multi-gpu-readme
Wing Lian
2023-07-16 16:08:37 -04:00
334af625d0
Merge pull request #277 from cg123/dataset-name
Wing Lian
2023-07-16 16:08:15 -04:00
273b3a3aa7
Update requirements.txt
Teknium
2023-07-16 10:24:24 -07:00
3cdd8e4122
Add dataset name to all yaml options in README
Charles Goddard
2023-07-15 13:17:37 -07:00
cf5ae6b649
Feat(readme): improve docs on multi-gpu
NanoCode012
2023-07-16 01:07:27 +09:00
8028652b8f
fix attention mask with packing
openorca-fix-mask
Wing Lian
2023-07-15 10:38:01 -04:00
b1f4f7a34d
Fixed pre-commit problems, fixed small bug in logging_config to handle LOG_LEVEL env var
theobjectivedad
2023-07-15 12:29:35 +00:00
81d60e96f0
multipack sampler support from openchat
multipack
Wing Lian
2023-07-15 08:01:33 -04:00
83237b8445
Merge branch 'OpenAccess-AI-Collective:main' into logging_enhancement
The Objective Dad
2023-07-15 06:16:04 -05:00
46032a1a1f
Fix formatting mistake
Charles Goddard
2023-07-14 20:57:27 -07:00
8bba64258e
Add example of dataset with configuration name to README
Charles Goddard
2023-07-14 20:46:21 -07:00
88089e8b32
Add ability to pass 'name' argument to load_dataset
Charles Goddard
2023-07-14 16:46:39 -07:00
168a7a09cc
Merge pull request #274 from OpenAccess-AI-Collective/NanoCode012-patch-2
NanoCode012
2023-07-14 23:15:47 +09:00
231031a0e1
Merge pull request #275 from NanoCode012/feat/safetensors
NanoCode012
2023-07-14 23:07:26 +09:00
9234b75cb4
Update log message format, IMO this is easier to read.
theobjectivedad
2023-07-14 07:36:21 -05:00
553a86b52c
Adding logging enhancement
theobjectivedad
2023-07-14 07:26:19 -05:00