Commit Graph

1950 Commits

Author SHA1 Message Date
Sunny Liu
152e988d3c llama sdpa patching WIP - static class function import 2025-01-22 21:02:26 -05:00
Sunny Liu
27532825a9 llama sdpa patching WIP - static class function import 2025-01-22 21:00:34 -05:00
Sunny Liu
06f83a54a5 llama sdpa patching WIP - static class function import 2025-01-22 20:45:44 -05:00
Sunny Liu
d7b133dc1f llama sdpa patching WIP - static class function import 2025-01-22 20:33:13 -05:00
Sunny Liu
f3bec17917 llama sdpa patching WIP - static class function import 2025-01-22 20:25:26 -05:00
Sunny Liu
b7deb5241c llama sdpa patching WIP 2025-01-22 20:16:27 -05:00
Sunny Liu
cee310dcfa llama sdpa patching WIP 2025-01-22 20:15:23 -05:00
Sunny Liu
d1be6e228d llama sdpa patching WIP 2025-01-22 20:14:20 -05:00
Sunny Liu
5f9f77f384 llama patch 2025-01-22 11:29:28 -05:00
Wing Lian
8fb72cbc0b use the extracted field_messages to parse the role fields (#2265) 2025-01-21 15:39:30 -05:00
Adithya Kamath
bb9d4102c4 Add 5000 line history limit to tmux for docker cloud (#2268) 2025-01-21 15:39:17 -05:00
bursteratom
b2a34380b3 sample packing doc mask creation WIP 2025-01-21 09:18:38 -05:00
Wing Lian
af727eedf7 option to not concatenate during pretraining (#2263)
* option to not concatenate during pretraining

* simplify conditional and add doc to config.qmd
2025-01-20 14:07:34 -05:00
Sunny Liu
80bfc50d1f get seqlens from position ids for foc masking 2025-01-17 17:22:04 -05:00
Sunny Liu
a5360c172c llama hijacking 2025-01-17 15:54:03 -05:00
Sunny Liu
013a9b73fc fix transformers version for testing 2025-01-16 15:32:57 -05:00
Sunny
aad62428e0 not sure if this is necessary actually 2025-01-16 15:08:34 -05:00
Sunny
a6f2c5d583 flex sample packing WIP 2025-01-15 21:12:33 -05:00
jwongTensora
8606093921 fix for indexing error from token/embeddings mismatch (#2257)
Co-authored-by: jwong <jwongTensora@gmail.com>
2025-01-14 22:09:29 -05:00
NanoCode012
cba5a457d9 fix: use text_column even when not packing for pretraining (#2254)
* fix: use text_column even when not packing for pretraining

* feat: update test to check when not packing

* chore: lint

* Update src/axolotl/utils/data/pretraining.py

Co-authored-by: Wing Lian <wing.lian@gmail.com>

---------

Co-authored-by: Wing Lian <wing@axolotl.ai>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2025-01-14 22:08:56 -05:00
Wing Lian
19cd83d408 rename references to dpo dataset prep to pref data (#2258) 2025-01-14 22:07:55 -05:00
Sunny
dbcd11e533 revert seq len in multipack sampler 2025-01-14 11:45:35 -05:00
Sunny
c06a6be915 flex_attn sample packing WIP 2025-01-14 00:22:05 -05:00
Dan Saunders
1ed4de73b6 CLI cleanup and documentation (#2244)
* CLI init refactor

* fix

* cleanup and (partial) docs

* Adding documentation and continuing cleanup (in progress)

* remove finetune.py script

* continued cleanup and documentation

* pytest fixes

* review comments

* fix

* Fix

* typing fixes

* make sure the batch dataset patcher for multipack is always loaded when handling datasets

* review comments

* fix

---------

Co-authored-by: Dan Saunders <dan@axolotl.ai>
Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-01-13 17:55:29 +00:00
Wing Lian
f89e962119 skip over rows in pretraining dataset (#2223)
* skip over rows in pretraining dataset

* update docs
2025-01-13 10:44:45 -05:00
Wing Lian
bc1c9c20e3 assume empty lora dropout means 0.0 and add tests (#2243)
* assume empty lora dropout means 0.0 and add tests

* remove un-necessary arg

* refactor based on pr feedback:

* chore: lint
2025-01-13 10:44:11 -05:00
Wing Lian
dd26cc3c0f add helper to verify the correct model output file exists (#2245)
* add helper to verify the correct model output file exists

* more checks using helper

* chore: lint

* fix import and relora model check

* workaround for trl trainer saves

* remove stray print
2025-01-13 10:43:29 -05:00
bursteratom
d3a0cb5edb transformers version 2025-01-13 10:33:00 -05:00
bursteratom
8b47e456b0 revert to transformers 4.47.1 2025-01-13 10:29:27 -05:00
Sunny Liu
2319ac729c Merge branch 'main' into flx_attn_support 2025-01-13 09:42:58 -05:00
Sunny
f99cae0e7b llama test 2025-01-12 17:30:19 -05:00
Wing Lian
888cd9407f use 2.5.1 docker images as latest tag as it seems stable (#2198) 2025-01-12 13:34:17 -05:00
Wing Lian
bd62d6e10a rename liger test so it properly runs in ci (#2246) 2025-01-12 13:34:17 -05:00
NanoCode012
5eae134110 feat: add support for data_files in pretraining (#2238) 2025-01-12 13:34:17 -05:00
Wing Lian
b7d27bdfa4 update upstream HF deps (#2239)
* bump axolotl contribs for upstream main conflicts:

* bump datasets, tokenizer, trl

* remove log workarounds in trl

* bump lm-eval

* remove unsloth_ import from critical path

* remove llama fa2 from conftest

* unsloth breaks with latest upstream
2025-01-12 13:34:17 -05:00
Vincenzo di Cicco
da97a21bdc Use SequentialSampler if curriculum_sampling is enabled with sample_packing (#2235) 2025-01-12 13:34:17 -05:00
Wing Lian
e0d4b88598 update modal version for ci (#2242) 2025-01-12 13:34:17 -05:00
NanoCode012
fac059a209 fix: mistral nemo does not recognize token_type_ids in forward (#2233) 2025-01-12 13:34:17 -05:00
Wing Lian
9c9ac1cf0b add hf cache caching for GHA (#2247)
* add hf cache caching for GHA

* use modal volume to cache hf data

* make sure to update the cache as we add new fixtures in conftest
2025-01-12 13:34:17 -05:00
Wing Lian
2346f21b2b Merge group queue (#2248)
* add support for merge groups

* also lint merge groups
2025-01-12 13:34:17 -05:00
salman
0b47281f51 Fixing OSX installation (#2231)
* bumping version, removing non-osx compatible deps

* updating pylintrc

* fixing linters

* reverting changes
2025-01-12 13:34:17 -05:00
Wing Lian
d8b4027200 use 2.5.1 docker images as latest tag as it seems stable (#2198) 2025-01-10 08:35:25 -05:00
Wing Lian
fb3352e21c rename liger test so it properly runs in ci (#2246) 2025-01-09 17:31:43 -05:00
Sunny
543daaf46f llama test 2025-01-09 16:08:24 -05:00
NanoCode012
ed77e7001e feat: add support for data_files in pretraining (#2238) 2025-01-09 21:04:13 +00:00
Wing Lian
7669a03fb4 update upstream HF deps (#2239)
* bump axolotl contribs for upstream main conflicts:

* bump datasets, tokenizer, trl

* remove log workarounds in trl

* bump lm-eval

* remove unsloth_ import from critical path

* remove llama fa2 from conftest

* unsloth breaks with latest upstream
2025-01-09 21:01:59 +00:00
Vincenzo di Cicco
6553683170 Use SequentialSampler if curriculum_sampling is enabled with sample_packing (#2235) 2025-01-09 21:01:22 +00:00
Wing Lian
5e0124e2ab update modal version for ci (#2242) 2025-01-09 21:01:02 +00:00
NanoCode012
2e8d7c1adb fix: mistral nemo does not recognize token_type_ids in forward (#2233) 2025-01-09 21:00:36 +00:00
Wing Lian
3c1921e400 add hf cache caching for GHA (#2247)
* add hf cache caching for GHA

* use modal volume to cache hf data

* make sure to update the cache as we add new fixtures in conftest
2025-01-09 20:59:54 +00:00