Wing Lian
8aafe142f2
refactor kd chat template loader
2025-01-13 13:41:35 -05:00
Wing Lian
a0d6d8895e
support for custom trainer classes from plugins
2025-01-13 13:41:34 -05:00
Wing Lian
55b33cc44d
handle token/logprob shifting
2025-01-13 13:41:34 -05:00
Wing Lian
69ed25e82c
remove references to triton kd for now
2025-01-13 13:41:34 -05:00
Wing Lian
2ea8b7e518
add license block
2025-01-13 13:41:34 -05:00
Wing Lian
aa081e0e76
refactor so we can easily add new loss functions
2025-01-13 13:41:34 -05:00
Wing Lian
3f97ec45fb
chore: lint
2025-01-13 13:41:34 -05:00
Wing Lian
7b5a24b0d2
var naming and add todo
2025-01-13 13:41:34 -05:00
Wing Lian
4ddd089d0a
fix kd loss so it's causal (fixes repeating tokens)
2025-01-13 13:41:34 -05:00
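The commits "handle token/logprob shifting" and "fix kd loss so it's causal (fixes repeating tokens)" point at the classic off-by-one in LM distillation: the logits at position t predict token t+1, so student and teacher distributions must be shifted into alignment before comparing them. A minimal sketch of one plausible arrangement, assuming the teacher's top-k logprobs are stored aligned with input positions (the actual axolotl schema and loss may differ):

```python
import torch
import torch.nn.functional as F

def causal_kd_loss(student_logits, teacher_logprobs, teacher_ids):
    """Forward-KL distillation loss with the causal shift applied.

    student_logits:   (batch, seq, vocab) raw student logits
    teacher_logprobs: (batch, seq, top_k) teacher log-probs per position
    teacher_ids:      (batch, seq, top_k) vocab ids of those top-k entries
    """
    # causal shift: position t predicts token t+1, so drop the last
    # student position and the first teacher position
    student_logits = student_logits[:, :-1, :]
    teacher_logprobs = teacher_logprobs[:, 1:, :]
    teacher_ids = teacher_ids[:, 1:, :]

    student_logprobs = F.log_softmax(student_logits, dim=-1)
    # gather the student's log-probs at the teacher's top-k vocab ids
    student_topk = torch.gather(student_logprobs, dim=-1, index=teacher_ids)

    teacher_probs = teacher_logprobs.exp()
    # forward KL restricted to the teacher's top-k support
    kd = (teacher_probs * (teacher_logprobs - student_topk)).sum(-1)
    return kd.mean()
```

Without the shift, the student is trained to copy the teacher's distribution for the *current* token rather than the next one, which matches the "repeating tokens" symptom in the commit title.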
Wing Lian
b88128d067
use kd_alpha in the correct loss method
2025-01-13 13:41:32 -05:00
Wing Lian
2e6422a711
hash for temperature too
2025-01-13 13:40:19 -05:00
Wing Lian
6ad809287b
better rescaling for temperatures
2025-01-13 13:40:19 -05:00
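"better rescaling for temperatures" suggests re-tempering teacher log-probs that were stored at temperature 1. Since log-probs differ from logits only by a per-position constant, dividing by T and re-normalizing with logsumexp reproduces softmax(logits / T) exactly. A sketch of that identity (exact over a full distribution; for truncated top-k logprobs it is only an approximation):

```python
import torch

def rescale_logprobs(teacher_logprobs, temperature):
    """Re-temper stored teacher log-probs to a new temperature.

    log_softmax(logits)/T differs from logits/T by a constant per
    position, and logsumexp normalization cancels constants, so this
    equals log_softmax(logits / T) without needing the raw logits.
    """
    scaled = teacher_logprobs / temperature
    return scaled - torch.logsumexp(scaled, dim=-1, keepdim=True)
```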
Wing Lian
e376e00386
don't use triton for now
2025-01-13 13:40:19 -05:00
Wing Lian
23d7ae6caa
fix kwarg
2025-01-13 13:40:19 -05:00
Wing Lian
19638590d5
v3
2025-01-13 13:40:18 -05:00
Wing Lian
73f5b83431
no torch.tensor
2025-01-13 13:40:18 -05:00
Wing Lian
9b1164b841
no log etc
2025-01-13 13:40:18 -05:00
Wing Lian
5a7d6f6175
no torch.exp inside triton kernel
2025-01-13 13:40:18 -05:00
Wing Lian
a803c3d3ee
v2 trial
2025-01-13 13:40:18 -05:00
Wing Lian
48ccf55752
no where support
2025-01-13 13:40:18 -05:00
Wing Lian
bc3326a808
triton wip
2025-01-13 13:40:18 -05:00
Wing Lian
cf8174db75
chore: lint
2025-01-13 13:40:18 -05:00
Wing Lian
222dc27410
make sure to multiply against the correct loss
2025-01-13 13:40:18 -05:00
Wing Lian
1107f1f603
cross entropy loss coefficient during KD
2025-01-13 13:40:18 -05:00
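"use kd_alpha in the correct loss method", "make sure to multiply against the correct loss", and "cross entropy loss coefficient during KD" all concern how the distillation term is blended with the hard-label cross-entropy term. A common convention (assumed here, since the commits only name the `kd_alpha` knob) weights KD by alpha and CE by 1 - alpha:

```python
def blended_kd_loss(kd_loss, ce_loss, kd_alpha=0.5):
    """Blend the distillation loss with hard-label cross-entropy.

    kd_alpha = 1.0 trains purely on the teacher distribution,
    kd_alpha = 0.0 reduces to ordinary supervised fine-tuning.
    This weighting scheme is an assumption, not axolotl's exact code.
    """
    return kd_alpha * kd_loss + (1.0 - kd_alpha) * ce_loss
```

The bug class these commits fix is easy to reproduce: multiplying the coefficient against the wrong term silently changes the effective learning signal without erroring.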
Wing Lian
1c603da96a
flipped the slice
2025-01-13 13:40:17 -05:00
Wing Lian
283faf3909
make it work
2025-01-13 13:40:17 -05:00
Wing Lian
472f7048e5
handle padding/collation for KD datasets
2025-01-13 13:40:17 -05:00
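"handle padding/collation for KD datasets" hints at the extra work a KD collator must do: beyond padding `input_ids`, the per-token teacher logprob rows have to be padded too, with labels set to -100 (so the loss ignores them) and logprobs set very negative (so padded positions carry essentially zero teacher probability). A minimal sketch with illustrative field names, not axolotl's actual schema:

```python
def pad_kd_batch(features, pad_token_id=0, max_len=None):
    """Right-pad a batch of KD examples to a common length."""
    max_len = max_len or max(len(f["input_ids"]) for f in features)
    batch = {"input_ids": [], "labels": [], "target_logprobs": []}
    for f in features:
        pad = max_len - len(f["input_ids"])
        k = len(f["target_logprobs"][0])  # top-k width per token
        batch["input_ids"].append(f["input_ids"] + [pad_token_id] * pad)
        # -100 is the ignore_index for cross-entropy
        batch["labels"].append(f["labels"] + [-100] * pad)
        # very negative logprob => ~zero probability after exp()
        batch["target_logprobs"].append(
            f["target_logprobs"] + [[-1e9] * k] * pad
        )
    return batch
```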
Wing Lian
3d1e2dcef4
make batch smaller
2025-01-13 13:40:17 -05:00
Wing Lian
9e218fbcfd
filter bad rows
2025-01-13 13:40:17 -05:00
Wing Lian
11caf52529
KD dataset loading and KD with logprobs
2025-01-13 13:40:17 -05:00
Wing Lian
17ba9dcfdb
refactor trainer to prevent circular dependencies later
...
fix loader default
2025-01-13 13:40:17 -05:00
Dan Saunders
1ed4de73b6
CLI cleanup and documentation (#2244)
...
* CLI init refactor
* fix
* cleanup and (partial) docs
* Adding documentation and continuing cleanup (in progress)
* remove finetune.py script
* continued cleanup and documentation
* pytest fixes
* review comments
* fix
* Fix
* typing fixes
* make sure the batch dataset patcher for multipack is always loaded when handling datasets
* review comments
* fix
---------
Co-authored-by: Dan Saunders <dan@axolotl.ai>
Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-01-13 17:55:29 +00:00
Wing Lian
f89e962119
skip over rows in pretraining dataset (#2223)
...
* skip over rows in pretraining dataset
* update docs
2025-01-13 10:44:45 -05:00
Wing Lian
bc1c9c20e3
assume empty lora dropout means 0.0 and add tests (#2243)
...
* assume empty lora dropout means 0.0 and add tests
* remove unnecessary arg
* refactor based on PR feedback
* chore: lint
2025-01-13 10:44:11 -05:00
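"assume empty lora dropout means 0.0" (#2243) describes defensive config normalization: an unset or null `lora_dropout` should behave like an explicit 0.0 instead of failing downstream. A minimal stand-in for that validation step (the real check lives in axolotl's pydantic config models):

```python
def normalize_lora_dropout(cfg):
    """Treat a missing or null lora_dropout as 0.0 (dropout disabled)."""
    if cfg.get("lora_dropout") is None:
        cfg["lora_dropout"] = 0.0
    return cfg
```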
Wing Lian
dd26cc3c0f
add helper to verify the correct model output file exists (#2245)
...
* add helper to verify the correct model output file exists
* more checks using helper
* chore: lint
* fix import and relora model check
* workaround for trl trainer saves
* remove stray print
2025-01-13 10:43:29 -05:00
Wing Lian
d8b4027200
use 2.5.1 docker images as latest tag as it seems stable (#2198)
2025-01-10 08:35:25 -05:00
Wing Lian
fb3352e21c
rename liger test so it properly runs in ci (#2246)
2025-01-09 17:31:43 -05:00
NanoCode012
ed77e7001e
feat: add support for data_files in pretraining (#2238)
2025-01-09 21:04:13 +00:00
Wing Lian
7669a03fb4
update upstream HF deps (#2239)
...
* bump axolotl contribs for upstream main conflicts
* bump datasets, tokenizer, trl
* remove log workarounds in trl
* bump lm-eval
* remove unsloth_ import from critical path
* remove llama fa2 from conftest
* unsloth breaks with latest upstream
2025-01-09 21:01:59 +00:00
Vincenzo di Cicco
6553683170
Use SequentialSampler if curriculum_sampling is enabled with sample_packing (#2235)
2025-01-09 21:01:22 +00:00
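The fix in #2235 is about sampler selection: with `curriculum_sampling` the dataset order is meaningful, so the trainer should iterate sequentially even when sample packing would otherwise shuffle. A sketch of the selection logic, not axolotl's exact trainer code:

```python
from torch.utils.data import RandomSampler, SequentialSampler

def choose_sampler(dataset, curriculum_sampling=False):
    """Pick a sampler: preserve dataset order for curriculum learning,
    otherwise shuffle as usual."""
    if curriculum_sampling:
        return SequentialSampler(dataset)
    return RandomSampler(dataset)
```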
Wing Lian
5e0124e2ab
update modal version for ci (#2242)
2025-01-09 21:01:02 +00:00
NanoCode012
2e8d7c1adb
fix: mistral nemo does not recognize token_type_ids in forward (#2233)
2025-01-09 21:00:36 +00:00
Wing Lian
3c1921e400
add hf cache caching for GHA (#2247)
...
* add hf cache caching for GHA
* use modal volume to cache hf data
* make sure to update the cache as we add new fixtures in conftest
2025-01-09 20:59:54 +00:00
Wing Lian
7faf2b6e8e
Merge group queue (#2248)
...
* add support for merge groups
* also lint merge groups
2025-01-09 15:49:00 -05:00
salman
c1b920f291
Fixing OSX installation (#2231)
...
* bumping version, removing non-osx compatible deps
* updating pylintrc
* fixing linters
* reverting changes
2025-01-07 13:42:01 +00:00
Wing Lian
3915abee4c
make sure padding is labeled as -100 for pretraining (#2227)
2024-12-31 15:22:18 -05:00
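#2227 addresses a subtle pretraining bug: labels are typically a copy of `input_ids`, so unless padding positions are overwritten with -100 (PyTorch cross-entropy's `ignore_index`), the model is trained to emit pad tokens. A minimal sketch of the masking step:

```python
def mask_padding_labels(input_ids, pad_token_id):
    """Copy input_ids into labels, replacing pad positions with -100
    so cross-entropy ignores them instead of learning the pad token."""
    return [tok if tok != pad_token_id else -100 for tok in input_ids]
```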
NJordan72
7a38dbe674
fix: allow trainer builder to use custom jinja chat template (#2219)
...
* fix: allow trainer builder to use custom jinja chat template
* chore: use get_chat_template_from_config
Co-authored-by: Chirag Jain <jain.chirag925@gmail.com>
* fix: swap imports
---------
Co-authored-by: Chirag Jain <jain.chirag925@gmail.com>
2024-12-24 16:18:50 -05:00
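The custom Jinja chat templates from #2219 are ordinary Jinja2 templates rendered over a `messages` list, the same mechanism Hugging Face tokenizers use for `chat_template`. A minimal illustration of the rendering (the template string and role markers here are made up, not any model's real template, and the axolotl helper named in the commit, `get_chat_template_from_config`, handles loading rather than rendering):

```python
from jinja2 import Template

# illustrative template in the style HF tokenizers expect
CHAT_TEMPLATE = (
    "{% for m in messages %}"
    "<|{{ m['role'] }}|>{{ m['content'] }}\n"
    "{% endfor %}"
)

def render_chat(messages, template_str=CHAT_TEMPLATE):
    """Render a conversation into a flat prompt string."""
    return Template(template_str).render(messages=messages)
```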
Wing Lian
e0a2eb2ebd
fix untrained tokens if specified explicitly from a list (#2210)
2024-12-23 09:08:28 -05:00
Wing Lian
d852d7af7a
inference - don't default w/ accelerate, fix base model (#2216) [skip ci]
2024-12-23 07:48:41 -05:00
Wing Lian
3742deb1de
add deepspeed example with torch compile enabled (#2212) [skip ci]
2024-12-22 12:11:39 -05:00