Sunny Liu
3f4fd3c1eb
remove padding self attention
2025-02-01 22:47:10 -05:00
Sunny Liu
48c3c47071
vanilla mask
2025-02-01 14:23:37 -05:00
Sunny Liu
3ed9c117fb
try vanilla mask
2025-02-01 14:09:13 -05:00
Sunny Liu
84960003ed
reset llama_patch_multipack.py
2025-01-30 14:40:18 -05:00
Sunny Liu
93a268e43d
--no-verify
...
fixes silly mistake
2025-01-30 14:08:26 -05:00
Sunny Liu
065f6d477e
flex batching WIP
2025-01-30 14:04:59 -05:00
Sunny Liu
96ad741cd5
flex batching WIP
2025-01-30 12:35:25 -05:00
bursteratom
ba88bc7840
wip flex block mask creation
2025-01-29 00:25:25 -05:00
Sung Ching Liu
b31796a681
Merge branch 'main' into flx_attn_support
2025-01-28 14:20:43 -05:00
Wing Lian
887513285d
support for custom lr groups for non-embedding modules (#2213)
...
* support for custom lr groups for non-embedding modules
invert name check for group modules
include lr_groups in training args
additional conditional for creating optimizer
fix regular params with weight decay
fix lookup and add docs
* address pr feedback
2025-01-24 12:56:28 -05:00
Wing Lian
20620771f1
Pretrain multipack (#2278)
...
* fix for pretrain with packing
* fix model name and loss expected
* make sure to check with micro batch size for pretraining
* change loss thresholds based on parametrization
* make tests smaller for CI
* fix pretrain packing
* fix pretrain packing test
* address pr feedback
2025-01-24 12:55:20 -05:00
NanoCode012
6086162488
chore(doc): improve explanation for *_steps and *_strategy (#2270)
2025-01-24 10:07:02 -05:00
mashdragon
b2774af66c
Take split param from config in all load_dataset instances (#2281)
2025-01-24 10:06:50 -05:00
NanoCode012
74f9782fc3
chore(doc): fix explanation on gcs creds retrieval (#2272)
2025-01-24 10:05:58 -05:00
Wing Lian
8a7a0b07dc
support for latest transformers release 4.48.1 (#2256)
2025-01-23 21:17:57 -05:00
Sunny Liu
5ca57cb55a
undo bool conversion
2025-01-23 17:56:13 -05:00
Sunny Liu
0149de7fb0
mask to bool
2025-01-23 15:30:08 -05:00
Sunny Liu
8c34c65181
dummy
2025-01-23 14:56:26 -05:00
Sunny Liu
555aa5772a
skip mask conversion if already 4d
2025-01-23 14:01:53 -05:00
Sunny Liu
e8b2789086
revert mask expand
2025-01-23 11:20:38 -05:00
Sunny Liu
85752cdfc9
mask expansion
2025-01-22 21:33:38 -05:00
Sunny Liu
f2f23c8041
mask expansion
2025-01-22 21:31:42 -05:00
Sunny Liu
8b3eec7f6e
mask expansion
2025-01-22 21:29:52 -05:00
Sunny Liu
bb9bea3110
mask expansion
2025-01-22 21:27:25 -05:00
Sunny Liu
0dd18a3681
llama sdpa patching WIP - static class function import
2025-01-22 21:10:05 -05:00
Sunny Liu
152e988d3c
llama sdpa patching WIP - static class function import
2025-01-22 21:02:26 -05:00
Sunny Liu
27532825a9
llama sdpa patching WIP - static class function import
2025-01-22 21:00:34 -05:00
Sunny Liu
06f83a54a5
llama sdpa patching WIP - static class function import
2025-01-22 20:45:44 -05:00
Sunny Liu
d7b133dc1f
llama sdpa patching WIP - static class function import
2025-01-22 20:33:13 -05:00
Sunny Liu
f3bec17917
llama sdpa patching WIP - static class function import
2025-01-22 20:25:26 -05:00
Sunny Liu
b7deb5241c
llama sdpa patching WIP
2025-01-22 20:16:27 -05:00
Sunny Liu
cee310dcfa
llama sdpa patching WIP
2025-01-22 20:15:23 -05:00
Sunny Liu
d1be6e228d
llama sdpa patching WIP
2025-01-22 20:14:20 -05:00
Sunny Liu
5f9f77f384
llama patch
2025-01-22 11:29:28 -05:00
Wing Lian
8fb72cbc0b
use the extracted field_messages to parse the role fields (#2265)
2025-01-21 15:39:30 -05:00
Adithya Kamath
bb9d4102c4
Add 5000 line history limit to tmux for docker cloud (#2268)
2025-01-21 15:39:17 -05:00
bursteratom
b2a34380b3
sample packing doc mask creation WIP
2025-01-21 09:18:38 -05:00
Wing Lian
af727eedf7
option to not concatenate during pretraining (#2263)
...
* option to not concatenate during pretraining
* simplify conditional and add doc to config.qmd
2025-01-20 14:07:34 -05:00
Sunny Liu
80bfc50d1f
get seqlens from position ids for doc masking
2025-01-17 17:22:04 -05:00
Sunny Liu
a5360c172c
llama hijacking
2025-01-17 15:54:03 -05:00
Sunny Liu
013a9b73fc
fix transformers version for testing
2025-01-16 15:32:57 -05:00
Sunny
aad62428e0
not sure if this is necessary actually
2025-01-16 15:08:34 -05:00
Sunny
a6f2c5d583
flex sample packing WIP
2025-01-15 21:12:33 -05:00
jwongTensora
8606093921
fix for indexing error from token/embeddings mismatch (#2257)
...
Co-authored-by: jwong <jwongTensora@gmail.com>
2025-01-14 22:09:29 -05:00
NanoCode012
cba5a457d9
fix: use text_column even when not packing for pretraining (#2254)
...
* fix: use text_column even when not packing for pretraining
* feat: update test to check when not packing
* chore: lint
* Update src/axolotl/utils/data/pretraining.py
Co-authored-by: Wing Lian <wing.lian@gmail.com>
---------
Co-authored-by: Wing Lian <wing@axolotl.ai>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2025-01-14 22:08:56 -05:00
Wing Lian
19cd83d408
rename references to dpo dataset prep to pref data (#2258)
2025-01-14 22:07:55 -05:00
Sunny
dbcd11e533
revert seq len in multipack sampler
2025-01-14 11:45:35 -05:00
Sunny
c06a6be915
flex_attn sample packing WIP
2025-01-14 00:22:05 -05:00
Dan Saunders
1ed4de73b6
CLI cleanup and documentation (#2244)
...
* CLI init refactor
* fix
* cleanup and (partial) docs
* Adding documentation and continuing cleanup (in progress)
* remove finetune.py script
* continued cleanup and documentation
* pytest fixes
* review comments
* fix
* Fix
* typing fixes
* make sure the batch dataset patcher for multipack is always loaded when handling datasets
* review comments
* fix
---------
Co-authored-by: Dan Saunders <dan@axolotl.ai>
Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-01-13 17:55:29 +00:00
Wing Lian
f89e962119
skip over rows in pretraining dataset (#2223)
...
* skip over rows in pretraining dataset
* update docs
2025-01-13 10:44:45 -05:00