Wing Lian
d8d817eaed
don't use triton for now
2025-01-14 22:47:42 -05:00
Wing Lian
c0757e8a20
fix kwarg
2025-01-14 22:47:42 -05:00
Wing Lian
e565694914
v3
2025-01-14 22:47:42 -05:00
Wing Lian
081928e55b
no torch.tensor
2025-01-14 22:47:42 -05:00
Wing Lian
dc90c93894
no log etc
2025-01-14 22:47:41 -05:00
Wing Lian
18a46c338a
no torch.exp inside triton kernel
2025-01-14 22:47:41 -05:00
Wing Lian
119d586cf4
v2 trial
2025-01-14 22:47:41 -05:00
Wing Lian
c73acd7de0
no where support
2025-01-14 22:47:41 -05:00
Wing Lian
0b59a242d4
triton wip
2025-01-14 22:47:41 -05:00
Wing Lian
ed490517da
chore: lint
2025-01-14 22:47:41 -05:00
Wing Lian
00ce77e7ef
make sure to multiply against the correct loss
2025-01-14 22:47:41 -05:00
Wing Lian
ae545e0165
cross entropy loss coefficient during KD
2025-01-14 22:47:40 -05:00
Wing Lian
b592c05b93
flipped the slice
2025-01-14 22:47:40 -05:00
Wing Lian
7fe0ad088b
make it work
2025-01-14 22:47:40 -05:00
Wing Lian
ddcf5c68b3
handle padding/collation for KD datasets
2025-01-14 22:47:40 -05:00
Wing Lian
e633a12dbe
make batch smaller
2025-01-14 22:47:40 -05:00
Wing Lian
d584354ee4
filter bad rows
2025-01-14 22:47:40 -05:00
Wing Lian
303cfa71aa
KD dataset loading and KD with logprobs
2025-01-14 22:47:40 -05:00
Wing Lian
88b3198894
refactor trainer to prevent circular dependencies later
...
fix loader default
2025-01-14 22:47:39 -05:00
jwongTensora
8606093921
fix for indexing error from token/embeddings mismatch (#2257)
...
Co-authored-by: jwong <jwongTensora@gmail.com>
2025-01-14 22:09:29 -05:00
NanoCode012
cba5a457d9
fix: use text_column even when not packing for pretraining (#2254)
...
* fix: use text_column even when not packing for pretraining
* feat: update test to check when not packing
* chore: lint
* Update src/axolotl/utils/data/pretraining.py
Co-authored-by: Wing Lian <wing.lian@gmail.com>
---------
Co-authored-by: Wing Lian <wing@axolotl.ai>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2025-01-14 22:08:56 -05:00
Wing Lian
19cd83d408
rename references to dpo dataset prep to pref data (#2258)
2025-01-14 22:07:55 -05:00
Dan Saunders
1ed4de73b6
CLI cleanup and documentation (#2244)
...
* CLI init refactor
* fix
* cleanup and (partial) docs
* Adding documentation and continuing cleanup (in progress)
* remove finetune.py script
* continued cleanup and documentation
* pytest fixes
* review comments
* fix
* Fix
* typing fixes
* make sure the batch dataset patcher for multipack is always loaded when handling datasets
* review comments
* fix
---------
Co-authored-by: Dan Saunders <dan@axolotl.ai>
Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-01-13 17:55:29 +00:00
Wing Lian
f89e962119
skip over rows in pretraining dataset (#2223)
...
* skip over rows in pretraining dataset
* update docs
2025-01-13 10:44:45 -05:00
Wing Lian
bc1c9c20e3
assume empty lora dropout means 0.0 and add tests (#2243)
...
* assume empty lora dropout means 0.0 and add tests
* remove unnecessary arg
* refactor based on pr feedback:
* chore: lint
2025-01-13 10:44:11 -05:00
Wing Lian
dd26cc3c0f
add helper to verify the correct model output file exists (#2245)
...
* add helper to verify the correct model output file exists
* more checks using helper
* chore: lint
* fix import and relora model check
* workaround for trl trainer saves
* remove stray print
2025-01-13 10:43:29 -05:00
Wing Lian
d8b4027200
use 2.5.1 docker images as latest tag as it seems stable (#2198)
2025-01-10 08:35:25 -05:00
Wing Lian
fb3352e21c
rename liger test so it properly runs in ci (#2246)
2025-01-09 17:31:43 -05:00
NanoCode012
ed77e7001e
feat: add support for data_files in pretraining (#2238)
2025-01-09 21:04:13 +00:00
Wing Lian
7669a03fb4
update upstream HF deps (#2239)
...
* bump axolotl contribs for upstream main conflicts:
* bump datasets, tokenizer, trl
* remove log workarounds in trl
* bump lm-eval
* remove unsloth_ import from critical path
* remove llama fa2 from conftest
* unsloth breaks with latest upstream
2025-01-09 21:01:59 +00:00
Vincenzo di Cicco
6553683170
Use SequentialSampler if curriculum_sampling is enabled with sample_packing (#2235)
2025-01-09 21:01:22 +00:00
Wing Lian
5e0124e2ab
update modal version for ci (#2242)
2025-01-09 21:01:02 +00:00
NanoCode012
2e8d7c1adb
fix: mistral nemo does not recognize token_type_ids in forward (#2233)
2025-01-09 21:00:36 +00:00
Wing Lian
3c1921e400
add hf cache caching for GHA (#2247)
...
* add hf cache caching for GHA
* use modal volume to cache hf data
* make sure to update the cache as we add new fixtures in conftest
2025-01-09 20:59:54 +00:00
Wing Lian
7faf2b6e8e
Merge group queue (#2248)
...
* add support for merge groups
* also lint merge groups
2025-01-09 15:49:00 -05:00
salman
c1b920f291
Fixing OSX installation (#2231)
...
* bumping version, removing non-osx compatible deps
* updating pylintrc
* fixing linters
* reverting changes
2025-01-07 13:42:01 +00:00
Wing Lian
3915abee4c
make sure padding is labeled as -100 for pretraining (#2227)
2024-12-31 15:22:18 -05:00
NJordan72
7a38dbe674
fix: allow trainer builder to use custom jinja chat template (#2219)
...
* fix: allow trainer builder to use custom jinja chat template
* chore: use get_chat_template_from_config
Co-authored-by: Chirag Jain <jain.chirag925@gmail.com>
* fix: swap imports
---------
Co-authored-by: Chirag Jain <jain.chirag925@gmail.com>
2024-12-24 16:18:50 -05:00
Wing Lian
e0a2eb2ebd
fix untrained tokens if specified explicitly from a list (#2210)
2024-12-23 09:08:28 -05:00
Wing Lian
d852d7af7a
inference - don't default w accelerate, fix base model (#2216) [skip ci]
2024-12-23 07:48:41 -05:00
Wing Lian
3742deb1de
add deepspeed example with torch compile enabled (#2212) [skip ci]
2024-12-22 12:11:39 -05:00
Wing Lian
2312caaa98
GC every n steps (#2209)
2024-12-21 17:38:33 -05:00
Wing Lian
307cf7c685
move the dataset loading from remote/disk to a shared function so we can re-use for RL (#2204)
2024-12-20 21:43:52 -05:00
Dan Saunders
70541145f1
adding test_datasets compat with pretraining_dataset (streaming) (#2206) [skip ci]
2024-12-20 21:43:33 -05:00
Wing Lian
42bd32a233
add outputs (symlink) to gitignore [skip ci] (#2205)
2024-12-19 20:14:43 -05:00
Dan Saunders
5b8fb5e939
remove cicd pytest xdist args (#2201)
...
* remove cicd pytest xdist args
* Delete outputs
2024-12-19 11:44:53 -05:00
Wing Lian
bd2a594b89
use DataCollatorWithFlattening when not sample packing (#2167)
2024-12-17 17:46:44 -05:00
Wing Lian
3798229d85
handle torch_compile set to auto (#2172) [skip ci]
...
* handle torch_compile set to auto
* update docs [skip ci]
* add tests
2024-12-17 16:42:41 -05:00
NanoCode012
10cfecf02e
fix: use apply_chat_template to find turn boundaries and allow tool_calling field (#2179) [skip ci]
...
* fix: use apply_chat_template to find turn boundaries and allow tool_calling field
* fix: keys to include in turn
* feat(doc): explicitly recommend setting train_on_eos and roles_to_train
* fix: eos not being masked for tool due to template padding
* chore: clear up docs
* fix: default messages format, train_on_eos: turn, and train on all assistant msg
* fix: properly warn if empty content
* feat: parametrize chat_template tests to test different tokenizers
* fix: set proper default for message key
* fix: update defaults to match load function
* fix: change defaults to use new
* feat: add tool_calling dataset
* feat: add tool_calling test
* fix: add handling of edge case of mistral tokenizer with only system prompt
* feat: refactor all test to follow source code
* fix: remove unnecessary eos_token from phi35
* fix test for phi3.5 since eos was dropped from chat_template
---------
Co-authored-by: Wing Lian <wing@axolotl.ai>
2024-12-17 16:42:21 -05:00
Wing Lian
339f3c67e2
dataset tags don't support https uris (#2195)
2024-12-17 13:58:53 -05:00