Wing Lian
d3c2b7ce9d
kd sample packing
2025-01-14 22:47:45 -05:00
Wing Lian
93dfff92f1
be a bit pickier about loading dynamic prompt strategies
2025-01-14 22:47:45 -05:00
Wing Lian
6e409d2d88
more info on preprocess for kd and fix import
2025-01-14 22:47:45 -05:00
Wing Lian
d5bc214300
remove duplicate code
2025-01-14 22:47:45 -05:00
Wing Lian
92c6c1087e
add copyrights
2025-01-14 22:47:45 -05:00
Wing Lian
feed96f95e
increase logging around loading plugins
2025-01-14 22:47:44 -05:00
Wing Lian
cba6165ae1
make plugin setup concise
2025-01-14 22:47:44 -05:00
Wing Lian
cdfcd69afa
remove moved class from import
2025-01-14 22:47:44 -05:00
Wing Lian
885653d52e
move more things to kd plugin
2025-01-14 22:47:44 -05:00
Wing Lian
27faacbf5a
refactor kd chat template loader
2025-01-14 22:47:44 -05:00
Wing Lian
c51b0337c1
support for custom trainer classes from plugins
2025-01-14 22:47:44 -05:00
Wing Lian
fa055f9f69
handle token/logprob shifting
2025-01-14 22:47:43 -05:00
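The "handle token/logprob shifting" commit (and the later causal-KD fix) point at the standard causal-LM alignment: the distribution produced at position i is a prediction for the token at position i + 1. A minimal sketch of that shift, with hypothetical names since the actual diff is not shown here:

```python
def shift_for_causal_lm(input_ids, teacher_logprobs):
    # In a causal LM, the logits/logprobs at position i predict the
    # token at position i + 1, so labels shift left by one and the
    # final prediction (which has no target) is dropped.
    shift_labels = input_ids[1:]
    shift_logprobs = teacher_logprobs[:-1]
    return shift_labels, shift_logprobs
```

Skipping this shift pairs each teacher distribution with the token it was conditioned on rather than the one it predicts, which is consistent with the "fixes repeating tokens" symptom mentioned below.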
Wing Lian
f60c623af0
remove references to triton kd for now
2025-01-14 22:47:43 -05:00
Wing Lian
746891eb5c
add license block
2025-01-14 22:47:43 -05:00
Wing Lian
f09b5da60b
refactor so we can easily add new loss functions
2025-01-14 22:47:43 -05:00
Wing Lian
689e1c10ba
chore: lint
2025-01-14 22:47:43 -05:00
Wing Lian
a5c085e003
var naming and add todo
2025-01-14 22:47:43 -05:00
Wing Lian
63146300b7
fix kd loss so it's causal (fixes repeating tokens)
2025-01-14 22:47:43 -05:00
Wing Lian
ca5e397fc5
use kd_alpha in the correct loss method
2025-01-14 22:47:42 -05:00
Wing Lian
3416302b0d
hash for temperature too
2025-01-14 22:47:42 -05:00
Wing Lian
7366efc4ca
better rescaling for temperatures
2025-01-14 22:47:42 -05:00
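"better rescaling for temperatures" likely refers to the standard distillation convention of multiplying the softened-KL term by T² so gradient magnitudes stay comparable as the temperature changes. A self-contained sketch of that convention (not the project's actual implementation):

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-softened softmax; higher T flattens the distribution.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_kl_loss(student_logits, teacher_logits, temperature=1.0):
    # KL(teacher || student) over temperature-softened distributions,
    # rescaled by T^2 so the gradient scale is roughly independent of
    # the chosen temperature (the usual Hinton-style convention).
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q))
    return kl * temperature ** 2
```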
Wing Lian
d8d817eaed
don't use triton for now
2025-01-14 22:47:42 -05:00
Wing Lian
c0757e8a20
fix kwarg
2025-01-14 22:47:42 -05:00
Wing Lian
e565694914
v3
2025-01-14 22:47:42 -05:00
Wing Lian
081928e55b
no torch.tensor
2025-01-14 22:47:42 -05:00
Wing Lian
dc90c93894
no log etc
2025-01-14 22:47:41 -05:00
Wing Lian
18a46c338a
no torch.exp inside triton kernel
2025-01-14 22:47:41 -05:00
Wing Lian
119d586cf4
v2 trial
2025-01-14 22:47:41 -05:00
Wing Lian
c73acd7de0
no where support
2025-01-14 22:47:41 -05:00
Wing Lian
0b59a242d4
triton wip
2025-01-14 22:47:41 -05:00
Wing Lian
ed490517da
chore: lint
2025-01-14 22:47:41 -05:00
Wing Lian
00ce77e7ef
make sure to multiply against the correct loss
2025-01-14 22:47:41 -05:00
Wing Lian
ae545e0165
cross entropy loss coefficient during KD
2025-01-14 22:47:40 -05:00
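The `kd_alpha` and "cross entropy loss coefficient" commits suggest the usual convex blend of hard-label cross-entropy with the distillation term. A hypothetical sketch of that blend (the real code's signature may differ; only the `kd_alpha` name appears in the commits):

```python
def combined_kd_loss(ce_loss, kd_loss, kd_alpha):
    # Convex combination: kd_alpha weights the distillation term,
    # (1 - kd_alpha) weights the ordinary hard-label cross-entropy.
    return (1.0 - kd_alpha) * ce_loss + kd_alpha * kd_loss
```

With kd_alpha = 0 this degenerates to plain supervised fine-tuning, and with kd_alpha = 1 to pure distillation, which is why applying the coefficient to the wrong term (as the fixes above address) silently changes what is being trained.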
Wing Lian
b592c05b93
flipped the slice
2025-01-14 22:47:40 -05:00
Wing Lian
7fe0ad088b
make it work
2025-01-14 22:47:40 -05:00
Wing Lian
ddcf5c68b3
handle padding/collation for KD datasets
2025-01-14 22:47:40 -05:00
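KD datasets carry per-token teacher logprobs alongside the token ids, so the collator must pad every field to a common length. A minimal sketch of what "handle padding/collation for KD datasets" plausibly involves, with hypothetical field names (`target_logprobs` is an assumption):

```python
def collate_kd_batch(rows, pad_token_id=0, ignore_index=-100):
    # Right-pad each field to the longest sequence in the batch:
    # input_ids with the pad token, labels with ignore_index so padded
    # positions are masked from the loss, teacher logprobs with 0.0.
    max_len = max(len(r["input_ids"]) for r in rows)
    batch = {"input_ids": [], "labels": [], "target_logprobs": []}
    for r in rows:
        pad = max_len - len(r["input_ids"])
        batch["input_ids"].append(r["input_ids"] + [pad_token_id] * pad)
        batch["labels"].append(r["input_ids"] + [ignore_index] * pad)
        batch["target_logprobs"].append(r["target_logprobs"] + [0.0] * pad)
    return batch
```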
Wing Lian
e633a12dbe
make batch smaller
2025-01-14 22:47:40 -05:00
Wing Lian
d584354ee4
filter bad rows
2025-01-14 22:47:40 -05:00
Wing Lian
303cfa71aa
KD dataset loading and KD with logprobs
2025-01-14 22:47:40 -05:00
Wing Lian
88b3198894
refactor trainer to prevent circular dependencies later
fix loader default
2025-01-14 22:47:39 -05:00
jwongTensora
8606093921
fix for indexing error from token/embeddings mismatch (#2257)
Co-authored-by: jwong <jwongTensora@gmail.com>
2025-01-14 22:09:29 -05:00
NanoCode012
cba5a457d9
fix: use text_column even when not packing for pretraining (#2254)
* fix: use text_column even when not packing for pretraining
* feat: update test to check when not packing
* chore: lint
* Update src/axolotl/utils/data/pretraining.py
Co-authored-by: Wing Lian <wing.lian@gmail.com>
---------
Co-authored-by: Wing Lian <wing@axolotl.ai>
Co-authored-by: Wing Lian <wing.lian@gmail.com>
2025-01-14 22:08:56 -05:00
Wing Lian
19cd83d408
rename references to dpo dataset prep to pref data (#2258)
2025-01-14 22:07:55 -05:00
Dan Saunders
1ed4de73b6
CLI cleanup and documentation (#2244)
* CLI init refactor
* fix
* cleanup and (partial) docs
* Adding documentation and continuing cleanup (in progress)
* remove finetune.py script
* continued cleanup and documentation
* pytest fixes
* review comments
* fix
* Fix
* typing fixes
* make sure the batch dataset patcher for multipack is always loaded when handling datasets
* review comments
* fix
---------
Co-authored-by: Dan Saunders <dan@axolotl.ai>
Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-01-13 17:55:29 +00:00
Wing Lian
f89e962119
skip over rows in pretraining dataset (#2223)
* skip over rows in pretraining dataset
* update docs
2025-01-13 10:44:45 -05:00
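For streamed pretraining corpora, skipping rows has to stay lazy rather than materialize the dataset. A sketch of the idea using the standard library (the `skip_rows` helper is hypothetical; Hugging Face `datasets` exposes `IterableDataset.skip` for the same purpose):

```python
from itertools import islice

def skip_rows(dataset_iter, n):
    # Lazily skip the first n rows of a streamed dataset, e.g. to
    # resume partway through a pretraining corpus without loading
    # the whole thing into memory.
    return islice(dataset_iter, n, None)
```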
Wing Lian
bc1c9c20e3
assume empty lora dropout means 0.0 and add tests (#2243)
* assume empty lora dropout means 0.0 and add tests
* remove unnecessary arg
* refactor based on PR feedback
* chore: lint
2025-01-13 10:44:11 -05:00
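The lora dropout commit describes a small config-normalization behavior: an unset or empty `lora_dropout` is treated as 0.0 instead of failing validation. A hypothetical sketch of that rule (the real validation lives in axolotl's config schema, not shown here):

```python
def normalize_lora_dropout(value):
    # Treat a missing/empty lora_dropout as 0.0 rather than erroring,
    # and coerce anything else to float.
    return 0.0 if value in (None, "") else float(value)
```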
Wing Lian
dd26cc3c0f
add helper to verify the correct model output file exists (#2245)
* add helper to verify the correct model output file exists
* more checks using helper
* chore: lint
* fix import and relora model check
* workaround for trl trainer saves
* remove stray print
2025-01-13 10:43:29 -05:00
Wing Lian
d8b4027200
use 2.5.1 docker images as latest tag as it seems stable (#2198)
2025-01-10 08:35:25 -05:00
Wing Lian
fb3352e21c
rename liger test so it properly runs in ci (#2246)
2025-01-09 17:31:43 -05:00
NanoCode012
ed77e7001e
feat: add support for data_files in pretraining (#2238)
2025-01-09 21:04:13 +00:00