Commit Graph

1850 Commits

Author SHA1 Message Date
Wing Lian
287d2ca8d5 kd sample packing 2025-01-13 13:41:36 -05:00
Wing Lian
03b86df506 be a bit pickier about loading dynamic prompt strategies 2025-01-13 13:41:36 -05:00
Wing Lian
2ed4246949 more info on preprocess for kd and fix import 2025-01-13 13:41:35 -05:00
Wing Lian
35bc2e2d3f remove duplicate code 2025-01-13 13:41:35 -05:00
Wing Lian
94f1094805 add copyrights 2025-01-13 13:41:35 -05:00
Wing Lian
a0070bf94e increase logging around loading plugins 2025-01-13 13:41:35 -05:00
Wing Lian
2ee2ffd834 make plugin setup concise 2025-01-13 13:41:35 -05:00
Wing Lian
723b0a2dee remove moved class from import 2025-01-13 13:41:35 -05:00
Wing Lian
327739c9e3 move more things to kd plugin 2025-01-13 13:41:35 -05:00
Wing Lian
8aafe142f2 refactor kd chat template loader 2025-01-13 13:41:35 -05:00
Wing Lian
a0d6d8895e support for custom trainer classes from plugins 2025-01-13 13:41:34 -05:00
Wing Lian
55b33cc44d handle token/logprob shifting 2025-01-13 13:41:34 -05:00
Wing Lian
69ed25e82c remove references to triton kd for now 2025-01-13 13:41:34 -05:00
Wing Lian
2ea8b7e518 add license block 2025-01-13 13:41:34 -05:00
Wing Lian
aa081e0e76 refactor so we can easily add new loss functions 2025-01-13 13:41:34 -05:00
Wing Lian
3f97ec45fb chore: lint 2025-01-13 13:41:34 -05:00
Wing Lian
7b5a24b0d2 var naming and add todo 2025-01-13 13:41:34 -05:00
Wing Lian
4ddd089d0a fix kd loss so it's causal (fixes repeating tokens) 2025-01-13 13:41:34 -05:00
Wing Lian
b88128d067 use kd_alpha in the correct loss method 2025-01-13 13:41:32 -05:00
Wing Lian
2e6422a711 hash for temperature too 2025-01-13 13:40:19 -05:00
Wing Lian
6ad809287b better rescaling for temperatures 2025-01-13 13:40:19 -05:00
Wing Lian
e376e00386 don't use triton for now 2025-01-13 13:40:19 -05:00
Wing Lian
23d7ae6caa fix kwarg 2025-01-13 13:40:19 -05:00
Wing Lian
19638590d5 v3 2025-01-13 13:40:18 -05:00
Wing Lian
73f5b83431 no torch.tensor 2025-01-13 13:40:18 -05:00
Wing Lian
9b1164b841 no log etc 2025-01-13 13:40:18 -05:00
Wing Lian
5a7d6f6175 no torch.exp inside triton kernel 2025-01-13 13:40:18 -05:00
Wing Lian
a803c3d3ee v2 trial 2025-01-13 13:40:18 -05:00
Wing Lian
48ccf55752 no where support 2025-01-13 13:40:18 -05:00
Wing Lian
bc3326a808 triton wip 2025-01-13 13:40:18 -05:00
Wing Lian
cf8174db75 chore: lint 2025-01-13 13:40:18 -05:00
Wing Lian
222dc27410 make sure to multiply against the correct loss 2025-01-13 13:40:18 -05:00
Wing Lian
1107f1f603 cross entropy loss coefficient during KD 2025-01-13 13:40:18 -05:00
Wing Lian
1c603da96a flipped the slice 2025-01-13 13:40:17 -05:00
Wing Lian
283faf3909 make it work 2025-01-13 13:40:17 -05:00
Wing Lian
472f7048e5 handle padding/collation for KD datasets 2025-01-13 13:40:17 -05:00
Wing Lian
3d1e2dcef4 make batch smaller 2025-01-13 13:40:17 -05:00
Wing Lian
9e218fbcfd filter bad rows 2025-01-13 13:40:17 -05:00
Wing Lian
11caf52529 KD dataset loading and KD with logprobs 2025-01-13 13:40:17 -05:00
Wing Lian
17ba9dcfdb refactor trainer to prevent circular dependencies later
fix loader default
2025-01-13 13:40:17 -05:00
Dan Saunders
1ed4de73b6 CLI cleanup and documentation (#2244)
* CLI init refactor

* fix

* cleanup and (partial) docs

* Adding documentation and continuing cleanup (in progress)

* remove finetune.py script

* continued cleanup and documentation

* pytest fixes

* review comments

* fix

* Fix

* typing fixes

* make sure the batch dataset patcher for multipack is always loaded when handling datasets

* review comments

* fix

---------

Co-authored-by: Dan Saunders <dan@axolotl.ai>
Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-01-13 17:55:29 +00:00
Wing Lian
f89e962119 skip over rows in pretraining dataset (#2223)
* skip over rows in pretraining dataset

* update docs
2025-01-13 10:44:45 -05:00
Wing Lian
bc1c9c20e3 assume empty lora dropout means 0.0 and add tests (#2243)
* assume empty lora dropout means 0.0 and add tests

* remove un-necessary arg

* refactor based on pr feedback:

* chore: lint
2025-01-13 10:44:11 -05:00
Wing Lian
dd26cc3c0f add helper to verify the correct model output file exists (#2245)
* add helper to verify the correct model output file exists

* more checks using helper

* chore: lint

* fix import and relora model check

* workaround for trl trainer saves

* remove stray print
2025-01-13 10:43:29 -05:00
Wing Lian
d8b4027200 use 2.5.1 docker images as latest tag as it seems stable (#2198) 2025-01-10 08:35:25 -05:00
Wing Lian
fb3352e21c rename liger test so it properly runs in ci (#2246) 2025-01-09 17:31:43 -05:00
NanoCode012
ed77e7001e feat: add support for data_files in pretraining (#2238) 2025-01-09 21:04:13 +00:00
Wing Lian
7669a03fb4 update upstream HF deps (#2239)
* bump axolotl contribs for upstream main conflicts:

* bump datasets, tokenizer, trl

* remove log workarounds in trl

* bump lm-eval

* remove unsloth_ import from critical path

* remove llama fa2 from conftest

* unsloth breaks with latest upstream
2025-01-09 21:01:59 +00:00
Vincenzo di Cicco
6553683170 Use SequentialSampler if curriculum_sampling is enabled with sample_packing (#2235) 2025-01-09 21:01:22 +00:00
Wing Lian
5e0124e2ab update modal version for ci (#2242) 2025-01-09 21:01:02 +00:00