Commit Graph

1889 Commits

Author  SHA1  Message  Date
Sunny Liu  8e1adc154d  stuff  2025-02-02 20:36:14 -05:00
Sunny Liu  e5b36900e4  misc  2025-02-02 20:32:03 -05:00
Sunny Liu  9f6c89b12b  undo my stupidity  2025-02-02 20:25:53 -05:00
Sunny Liu  b0871c8d3b  attempt - mask padding  2025-02-02 20:18:49 -05:00
bursteratom  d3ea379a23  figure out slight diff from flash result  2025-02-02 01:45:54 -05:00
bursteratom  0ebab63309  test  2025-02-02 01:27:15 -05:00
bursteratom  e98581f6f5  BLOCK SIZE  2025-02-02 01:22:23 -05:00
bursteratom  b832b11c8f  stuff  2025-02-02 00:51:43 -05:00
bursteratom  b692d394b1  more test  2025-02-02 00:48:57 -05:00
bursteratom  2319e5276d  more test  2025-02-02 00:48:15 -05:00
bursteratom  9a43a0925d  more test  2025-02-02 00:45:30 -05:00
bursteratom  10de67e8ea  more test  2025-02-02 00:43:41 -05:00
bursteratom  fa7355404c  test  2025-02-02 00:38:35 -05:00
bursteratom  907424a2e8  stuff  2025-02-02 00:29:09 -05:00
Sunny Liu  3f4fd3c1eb  remove padding self attention  2025-02-01 22:47:10 -05:00
Sunny Liu  48c3c47071  vanilla mask  2025-02-01 14:23:37 -05:00
Sunny Liu  3ed9c117fb  try vanilla mask  2025-02-01 14:09:13 -05:00
Sunny Liu  84960003ed  reset llama_patch_multipack.py  2025-01-30 14:40:18 -05:00
Sunny Liu  93a268e43d  --no-verify  2025-01-30 14:08:26 -05:00
    fixes silly mistake
Sunny Liu  065f6d477e  flex batching WIP  2025-01-30 14:04:59 -05:00
Sunny Liu  96ad741cd5  flex batching WIP  2025-01-30 12:35:25 -05:00
bursteratom  ba88bc7840  wip flex block mask creation  2025-01-29 00:25:25 -05:00
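The "flex block mask creation" work above targets PyTorch's FlexAttention, where attention visibility is expressed as a predicate over query/key positions. A minimal sketch of that kind of predicate for packed sequences, written as plain Python so the logic is checkable in isolation (the helper name and the `doc_ids` layout are illustrative assumptions, not this branch's actual code):

```python
# Sketch of a FlexAttention-style mask predicate for packed sequences.
# In real FlexAttention usage this mask_mod would be handed to
# create_block_mask; here it is plain Python for clarity.

def make_document_mask(doc_ids):
    """doc_ids[i] is the packed-document id of token position i."""
    def mask_mod(b, h, q_idx, kv_idx):
        # Causal within a document; no attention across packing boundaries.
        same_doc = doc_ids[q_idx] == doc_ids[kv_idx]
        causal = q_idx >= kv_idx
        return same_doc and causal
    return mask_mod

# Two packed sequences: positions 0-2 are doc 0, positions 3-4 are doc 1.
mask = make_document_mask([0, 0, 0, 1, 1])
```

The point of the block-mask machinery is that FlexAttention can skip whole tiles where this predicate is uniformly false, which is what makes packed-sequence masking cheap.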
Sung Ching Liu  b31796a681  Merge branch 'main' into flx_attn_support  2025-01-28 14:20:43 -05:00
Wing Lian  887513285d  support for custom lr groups for non-embedding modules (#2213)  2025-01-24 12:56:28 -05:00
    * support for custom lr groups for non-embedding modules
      - invert name check for group modules
      - include lr_groups in training args
      - additional conditional for creating optimizer
      - fix regular params with weight decay
      - fix lookup and add docs
    * address pr feedback
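The custom-lr-groups PR above amounts to partitioning named parameters into optimizer groups by module-name match, with unmatched parameters falling back to the base learning rate. A hedged sketch of that partitioning (the helper name and the `lr_groups` schema are assumptions for illustration, not the PR's actual API):

```python
# Illustrative sketch: split (name, param) pairs into per-lr optimizer
# groups by substring match on module names; the resulting list of dicts
# is the shape torch.optim optimizers accept as param groups.

def build_lr_groups(named_params, base_lr, lr_groups):
    """named_params: iterable of (name, param);
    lr_groups: list of {"modules": [...], "lr": float}."""
    groups = [{"params": [], "lr": g["lr"]} for g in lr_groups]
    default = {"params": [], "lr": base_lr}
    for name, param in named_params:
        for g, out in zip(lr_groups, groups):
            if any(m in name for m in g["modules"]):
                out["params"].append(param)
                break
        else:
            default["params"].append(param)
    return groups + [default]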
Wing Lian  20620771f1  Pretrain multipack (#2278)  2025-01-24 12:55:20 -05:00
    * fix for pretrain with packing
    * fix model name and loss expected
    * make sure to check with micro batch size for pretraining
    * change loss thresholds based on parametrization
    * make tests smaller for CI
    * fix pretrain packing
    * fix pretrain packing test
    * address pr feedback
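The multipack idea behind the PR above is to pack variable-length samples into fixed-size context windows so pretraining batches waste less space on padding. A first-fit sketch of that packing (the function name and the greedy strategy are illustrative assumptions, not axolotl's actual implementation):

```python
# Illustrative first-fit packing: group sample indices into bins whose
# total token length fits within max_seq_len.

def pack_samples(lengths, max_seq_len):
    """Return bins (lists of sample indices) whose lengths sum to at most max_seq_len."""
    bins, bin_lengths = [], []
    for idx, length in enumerate(lengths):
        for b, used in enumerate(bin_lengths):
            if used + length <= max_seq_len:
                bins[b].append(idx)
                bin_lengths[b] += length
                break
        else:
            # No existing bin has room; open a new one.
            bins.append([idx])
            bin_lengths.append(length)
    return bins
```

The "check with micro batch size" bullet points at the accounting consequence: once samples are packed, the tokens seen per optimizer step depend on the micro batch size times the packed window, so expected-loss checks have to use the packed counts.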
NanoCode012  6086162488  chore(doc): improve explanation for *_steps and *_strategy (#2270)  2025-01-24 10:07:02 -05:00
mashdragon  b2774af66c  Take split param from config in all load_dataset instances (#2281)  2025-01-24 10:06:50 -05:00
NanoCode012  74f9782fc3  chore(doc): fix explanation on gcs creds retrieval (#2272)  2025-01-24 10:05:58 -05:00
Wing Lian  8a7a0b07dc  support for latest transformers release 4.48.1 (#2256)  2025-01-23 21:17:57 -05:00
Sunny Liu  5ca57cb55a  undo bool conversion  2025-01-23 17:56:13 -05:00
Sunny Liu  0149de7fb0  mask to bool  2025-01-23 15:30:08 -05:00
Sunny Liu  8c34c65181  dummy  2025-01-23 14:56:26 -05:00
Sunny Liu  555aa5772a  skip mask conversion if already 4d  2025-01-23 14:01:53 -05:00
Sunny Liu  e8b2789086  revert mask expand  2025-01-23 11:20:38 -05:00
Sunny Liu  85752cdfc9  mask expansion  2025-01-22 21:33:38 -05:00
Sunny Liu  f2f23c8041  mask expansion  2025-01-22 21:31:42 -05:00
Sunny Liu  8b3eec7f6e  mask expansion  2025-01-22 21:29:52 -05:00
Sunny Liu  bb9bea3110  mask expansion  2025-01-22 21:27:25 -05:00
Sunny Liu  0dd18a3681  llama sdpa patching WIP - static class function import  2025-01-22 21:10:05 -05:00
Sunny Liu  152e988d3c  llama sdpa patching WIP - static class function import  2025-01-22 21:02:26 -05:00
Sunny Liu  27532825a9  llama sdpa patching WIP - static class function import  2025-01-22 21:00:34 -05:00
Sunny Liu  06f83a54a5  llama sdpa patching WIP - static class function import  2025-01-22 20:45:44 -05:00
Sunny Liu  d7b133dc1f  llama sdpa patching WIP - static class function import  2025-01-22 20:33:13 -05:00
Sunny Liu  f3bec17917  llama sdpa patching WIP - static class function import  2025-01-22 20:25:26 -05:00
Sunny Liu  b7deb5241c  llama sdpa patching WIP  2025-01-22 20:16:27 -05:00
Sunny Liu  cee310dcfa  llama sdpa patching WIP  2025-01-22 20:15:23 -05:00
Sunny Liu  d1be6e228d  llama sdpa patching WIP  2025-01-22 20:14:20 -05:00
Sunny Liu  5f9f77f384  llama patch  2025-01-22 11:29:28 -05:00
Wing Lian  8fb72cbc0b  use the extracted field_messages to parse the role fields (#2265)  2025-01-21 15:39:30 -05:00
Adithya Kamath  bb9d4102c4  Add 5000 line history limit to tmux for docker cloud (#2268)  2025-01-21 15:39:17 -05:00