Sunny Liu
3f4fd3c1eb
remove padding self attention
2025-02-01 22:47:10 -05:00
Sunny Liu
48c3c47071
vanilla mask
2025-02-01 14:23:37 -05:00
Sunny Liu
3ed9c117fb
try vanilla mask
2025-02-01 14:09:13 -05:00
Sunny Liu
84960003ed
reset llama_patch_multipack.py
2025-01-30 14:40:18 -05:00
Sunny Liu
93a268e43d
--no-verify
...
fixes silly mistake
2025-01-30 14:08:26 -05:00
Sunny Liu
065f6d477e
flex batching WIP
2025-01-30 14:04:59 -05:00
Sunny Liu
96ad741cd5
flex batching WIP
2025-01-30 12:35:25 -05:00
bursteratom
ba88bc7840
wip flex block mask creation
2025-01-29 00:25:25 -05:00
Sung Ching Liu
b31796a681
Merge branch 'main' into flx_attn_support
2025-01-28 14:20:43 -05:00
Wing Lian
887513285d
support for custom lr groups for non-embedding modules (#2213)
...
* support for custom lr groups for non-embedding modules
invert name check for group modules
include lr_groups in training args
additional conditional for creating optimizer
fix regular params with weight decay
fix lookup and add docs
* address pr feedback
2025-01-24 12:56:28 -05:00
Wing Lian
20620771f1
Pretrain multipack (#2278)
...
* fix for pretrain with packing
* fix model name and loss expected
* make sure to check with micro batch size for pretraining
* change loss thresholds based on parametrization
* make tests smaller for CI
* fix pretrain packing
* fix pretrain packing test
* address pr feedback
2025-01-24 12:55:20 -05:00
NanoCode012
6086162488
chore(doc): improve explanation for *_steps and *_strategy (#2270)
2025-01-24 10:07:02 -05:00
mashdragon
b2774af66c
Take split param from config in all load_dataset instances (#2281)
2025-01-24 10:06:50 -05:00
NanoCode012
74f9782fc3
chore(doc): fix explanation on gcs creds retrieval (#2272)
2025-01-24 10:05:58 -05:00
Wing Lian
8a7a0b07dc
support for latest transformers release 4.48.1 (#2256)
2025-01-23 21:17:57 -05:00
Sunny Liu
5ca57cb55a
undo bool conversion
2025-01-23 17:56:13 -05:00
Sunny Liu
0149de7fb0
mask to bool
2025-01-23 15:30:08 -05:00
Sunny Liu
8c34c65181
dummy
2025-01-23 14:56:26 -05:00
Sunny Liu
555aa5772a
skip mask conversion if already 4d
2025-01-23 14:01:53 -05:00
Sunny Liu
e8b2789086
revert mask expand
2025-01-23 11:20:38 -05:00
Sunny Liu
85752cdfc9
mask expansion
2025-01-22 21:33:38 -05:00
Sunny Liu
f2f23c8041
mask expansion
2025-01-22 21:31:42 -05:00
Sunny Liu
8b3eec7f6e
mask expansion
2025-01-22 21:29:52 -05:00
Sunny Liu
bb9bea3110
mask expansion
2025-01-22 21:27:25 -05:00
Sunny Liu
0dd18a3681
llama sdpa patching WIP - static class function import
2025-01-22 21:10:05 -05:00
Sunny Liu
152e988d3c
llama sdpa patching WIP - static class function import
2025-01-22 21:02:26 -05:00
Sunny Liu
27532825a9
llama sdpa patching WIP - static class function import
2025-01-22 21:00:34 -05:00
Sunny Liu
06f83a54a5
llama sdpa patching WIP - static class function import
2025-01-22 20:45:44 -05:00
Sunny Liu
d7b133dc1f
llama sdpa patching WIP - static class function import
2025-01-22 20:33:13 -05:00
Sunny Liu
f3bec17917
llama sdpa patching WIP - static class function import
2025-01-22 20:25:26 -05:00
Sunny Liu
b7deb5241c
llama sdpa patching WIP
2025-01-22 20:16:27 -05:00
Sunny Liu
cee310dcfa
llama sdpa patching WIP
2025-01-22 20:15:23 -05:00
Sunny Liu
d1be6e228d
llama sdpa patching WIP
2025-01-22 20:14:20 -05:00
Sunny Liu
5f9f77f384
llama patch
2025-01-22 11:29:28 -05:00
Wing Lian
8fb72cbc0b
use the extracted field_messages to parse the role fields (#2265)
2025-01-21 15:39:30 -05:00
Adithya Kamath
bb9d4102c4
Add 5000 line history limit to tmux for docker cloud (#2268)
2025-01-21 15:39:17 -05:00
bursteratom
b2a34380b3
sample packing doc mask creation WIP
2025-01-21 09:18:38 -05:00
Wing Lian
af727eedf7
option to not concatenate during pretraining (#2263)
...
* option to not concatenate during pretraining
* simplify conditional and add doc to config.qmd
2025-01-20 14:07:34 -05:00
Sunny Liu
80bfc50d1f
get seqlens from position ids for doc masking
2025-01-17 17:22:04 -05:00
Sunny Liu
a5360c172c
llama hijacking
2025-01-17 15:54:03 -05:00
Sunny Liu
013a9b73fc
fix transformers version for testing
2025-01-16 15:32:57 -05:00
Sunny
aad62428e0
not sure if this is necessary actually
2025-01-16 15:08:34 -05:00
Sunny
a6f2c5d583
flex sample packing WIP
2025-01-15 21:12:33 -05:00
jwongTensora
8606093921
fix for indexing error from token/embeddings mismatch (#2257)
...
Co-authored-by: jwong <jwongTensora@gmail.com>
2025-01-14 22:09:29 -05:00
NanoCode012
cba5a457d9
fix: use text_column even when not packing for pretraining (#2254)
...
* fix: use text_column even when not packing for pretraining
* feat: update test to check when not packing
* chore: lint
* Update src/axolotl/utils/data/pretraining.py
Co-authored-by: Wing Lian <wing.lian@gmail.com>
---------
Co-authored-by: Wing Lian <wing@axolotl.ai>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2025-01-14 22:08:56 -05:00
Wing Lian
19cd83d408
rename references to dpo dataset prep to pref data (#2258)
2025-01-14 22:07:55 -05:00
Sunny
dbcd11e533
revert seq len in multipack sampler
2025-01-14 11:45:35 -05:00
Sunny
c06a6be915
flex_attn sample packing WIP
2025-01-14 00:22:05 -05:00
Dan Saunders
1ed4de73b6
CLI cleanup and documentation (#2244)
...
* CLI init refactor
* fix
* cleanup and (partial) docs
* Adding documentation and continuing cleanup (in progress)
* remove finetune.py script
* continued cleanup and documentation
* pytest fixes
* review comments
* fix
* Fix
* typing fixes
* make sure the batch dataset patcher for multipack is always loaded when handling datasets
* review comments
* fix
---------
Co-authored-by: Dan Saunders <dan@axolotl.ai>
Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-01-13 17:55:29 +00:00
Wing Lian
f89e962119
skip over rows in pretraining dataset (#2223)
...
* skip over rows in pretraining dataset
* update docs
2025-01-13 10:44:45 -05:00