Wing Lian
287d2ca8d5
kd sample packing
2025-01-13 13:41:36 -05:00
Wing Lian
03b86df506
be a bit pickier about loading dynamic prompt strategies
2025-01-13 13:41:36 -05:00
Wing Lian
2ed4246949
more info on preprocess for kd and fix import
2025-01-13 13:41:35 -05:00
Wing Lian
35bc2e2d3f
remove duplicate code
2025-01-13 13:41:35 -05:00
Wing Lian
94f1094805
add copyrights
2025-01-13 13:41:35 -05:00
Wing Lian
a0070bf94e
increase logging around loading plugins
2025-01-13 13:41:35 -05:00
Wing Lian
2ee2ffd834
make plugin setup concise
2025-01-13 13:41:35 -05:00
Wing Lian
723b0a2dee
remove moved class from import
2025-01-13 13:41:35 -05:00
Wing Lian
327739c9e3
move more things to kd plugin
2025-01-13 13:41:35 -05:00
Wing Lian
8aafe142f2
refactor kd chat template loader
2025-01-13 13:41:35 -05:00
Wing Lian
a0d6d8895e
support for custom trainer classes from plugins
2025-01-13 13:41:34 -05:00
Wing Lian
55b33cc44d
handle token/logprob shifting
2025-01-13 13:41:34 -05:00
Wing Lian
69ed25e82c
remove references to triton kd for now
2025-01-13 13:41:34 -05:00
Wing Lian
2ea8b7e518
add license block
2025-01-13 13:41:34 -05:00
Wing Lian
aa081e0e76
refactor so we can easily add new loss functions
2025-01-13 13:41:34 -05:00
Wing Lian
3f97ec45fb
chore: lint
2025-01-13 13:41:34 -05:00
Wing Lian
7b5a24b0d2
var naming and add todo
2025-01-13 13:41:34 -05:00
Wing Lian
4ddd089d0a
fix kd loss so it's causal (fixes repeating tokens)
2025-01-13 13:41:34 -05:00
Wing Lian
b88128d067
use kd_alpha in the correct loss method
2025-01-13 13:41:32 -05:00
Wing Lian
2e6422a711
hash for temperature too
2025-01-13 13:40:19 -05:00
Wing Lian
6ad809287b
better rescaling for temperatures
2025-01-13 13:40:19 -05:00
Wing Lian
e376e00386
don't use triton for now
2025-01-13 13:40:19 -05:00
Wing Lian
23d7ae6caa
fix kwarg
2025-01-13 13:40:19 -05:00
Wing Lian
19638590d5
v3
2025-01-13 13:40:18 -05:00
Wing Lian
73f5b83431
no torch.tensor
2025-01-13 13:40:18 -05:00
Wing Lian
9b1164b841
no log etc
2025-01-13 13:40:18 -05:00
Wing Lian
5a7d6f6175
no torch.exp inside triton kernel
2025-01-13 13:40:18 -05:00
Wing Lian
a803c3d3ee
v2 trial
2025-01-13 13:40:18 -05:00
Wing Lian
48ccf55752
no where support
2025-01-13 13:40:18 -05:00
Wing Lian
bc3326a808
triton wip
2025-01-13 13:40:18 -05:00
Wing Lian
cf8174db75
chore: lint
2025-01-13 13:40:18 -05:00
Wing Lian
222dc27410
make sure to multiply against the correct loss
2025-01-13 13:40:18 -05:00
Wing Lian
1107f1f603
cross entropy loss coefficient during KD
2025-01-13 13:40:18 -05:00
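Several of the commits above (`use kd_alpha in the correct loss method`, `better rescaling for temperatures`, `cross entropy loss coefficient during KD`) revolve around the standard knowledge-distillation objective: a temperature-softened KL term blended with ordinary cross-entropy. A minimal sketch of that objective — not the repository's actual implementation; the function and parameter names (`kd_loss`, `kd_alpha`, `temperature`) are illustrative:

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits at a given temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, target_idx,
            kd_alpha=0.5, temperature=2.0):
    """Blend a distillation term with hard-label cross-entropy for one token."""
    # Forward KL between temperature-softened teacher and student distributions,
    # scaled by T^2 so gradient magnitudes stay comparable across temperatures.
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps)) for pt, ps in zip(p_t, p_s))
    kl *= temperature ** 2
    # Standard cross-entropy against the hard label, at temperature 1.
    ce = -math.log(softmax(student_logits)[target_idx])
    # kd_alpha weights the distillation term; (1 - kd_alpha) weights plain CE.
    return kd_alpha * kl + (1.0 - kd_alpha) * ce
```

In practice this would be computed per token over shifted logits (per the `handle token/logprob shifting` and causal-masking commits), but the scalar version shows how the two coefficients interact.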
Wing Lian
1c603da96a
flipped the slice
2025-01-13 13:40:17 -05:00
Wing Lian
283faf3909
make it work
2025-01-13 13:40:17 -05:00
Wing Lian
472f7048e5
handle padding/collation for KD datasets
2025-01-13 13:40:17 -05:00
Wing Lian
3d1e2dcef4
make batch smaller
2025-01-13 13:40:17 -05:00
Wing Lian
9e218fbcfd
filter bad rows
2025-01-13 13:40:17 -05:00
Wing Lian
11caf52529
KD dataset loading and KD with logprobs
2025-01-13 13:40:17 -05:00
Wing Lian
17ba9dcfdb
refactor trainer to prevent circular dependencies later
fix loader default
2025-01-13 13:40:17 -05:00
Dan Saunders
1ed4de73b6
CLI cleanup and documentation (#2244)
* CLI init refactor
* fix
* cleanup and (partial) docs
* Adding documentation and continuing cleanup (in progress)
* remove finetune.py script
* continued cleanup and documentation
* pytest fixes
* review comments
* fix
* Fix
* typing fixes
* make sure the batch dataset patcher for multipack is always loaded when handling datasets
* review comments
* fix
---------
Co-authored-by: Dan Saunders <dan@axolotl.ai>
Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-01-13 17:55:29 +00:00
Wing Lian
f89e962119
skip over rows in pretraining dataset (#2223)
* skip over rows in pretraining dataset
* update docs
2025-01-13 10:44:45 -05:00
Wing Lian
bc1c9c20e3
assume empty lora dropout means 0.0 and add tests (#2243)
* assume empty lora dropout means 0.0 and add tests
* remove un-necessary arg
* refactor based on pr feedback:
* chore: lint
2025-01-13 10:44:11 -05:00
Wing Lian
dd26cc3c0f
add helper to verify the correct model output file exists (#2245)
* add helper to verify the correct model output file exists
* more checks using helper
* chore: lint
* fix import and relora model check
* workaround for trl trainer saves
* remove stray print
2025-01-13 10:43:29 -05:00
Wing Lian
d8b4027200
use 2.5.1 docker images as latest tag as it seems stable (#2198)
2025-01-10 08:35:25 -05:00
Wing Lian
fb3352e21c
rename liger test so it properly runs in ci (#2246)
2025-01-09 17:31:43 -05:00
NanoCode012
ed77e7001e
feat: add support for data_files in pretraining (#2238)
2025-01-09 21:04:13 +00:00
Wing Lian
7669a03fb4
update upstream HF deps (#2239)
* bump axolotl contribs for upstream main conflicts:
* bump datasets, tokenizer, trl
* remove log workarounds in trl
* bump lm-eval
* remove unsloth_ import from critical path
* remove llama fa2 from conftest
* unsloth breaks with latest upstream
2025-01-09 21:01:59 +00:00
Vincenzo di Cicco
6553683170
Use SequentialSampler if curriculum_sampling is enabled with sample_packing (#2235)
2025-01-09 21:01:22 +00:00
Wing Lian
5e0124e2ab
update modal version for ci (#2242)
2025-01-09 21:01:02 +00:00