| Author | Commit | Message | Date |
| --- | --- | --- | --- |
| Wing Lian | 06370b386a | move more things to kd plugin | 2025-01-09 18:57:26 -05:00 |
| Wing Lian | 3da6a652fa | refactor kd chat template loader | 2025-01-09 18:57:25 -05:00 |
| Wing Lian | 84547c724d | support for custom trainer classes from plugins | 2025-01-09 18:57:25 -05:00 |
| Wing Lian | 51547c656a | handle token/logprob shifting | 2025-01-09 18:57:25 -05:00 |
| Wing Lian | 7c4ae15942 | remove references to triton kd for now | 2025-01-09 18:57:25 -05:00 |
| Wing Lian | cdb167e7f7 | add license block | 2025-01-09 18:57:25 -05:00 |
| Wing Lian | 52f1d7aee2 | refactor so we can easily add new loss functions | 2025-01-09 18:57:25 -05:00 |
| Wing Lian | 319c3531e7 | chore: lint | 2025-01-09 18:57:25 -05:00 |
| Wing Lian | 87eb6a3324 | var naming and add todo | 2025-01-09 18:57:25 -05:00 |
| Wing Lian | f03fa703b7 | fix kd loss so it's causal (fixes repeating tokens) | 2025-01-09 18:57:25 -05:00 |
| Wing Lian | 53ec07d44c | use kd_alpha in the correct loss method | 2025-01-09 18:57:25 -05:00 |
| Wing Lian | 8d77dc385e | hash for temperature too | 2025-01-09 18:57:24 -05:00 |
| Wing Lian | 8b0104fa7c | better rescaling for temperatures | 2025-01-09 18:57:24 -05:00 |
| Wing Lian | 546ad007ec | don't use triton for now | 2025-01-09 18:57:24 -05:00 |
| Wing Lian | 868a49cb96 | fix kwarg | 2025-01-09 18:57:24 -05:00 |
| Wing Lian | 4a12b1b22e | v3 | 2025-01-09 18:57:24 -05:00 |
| Wing Lian | 973ed841cd | no torch.tensor | 2025-01-09 18:57:24 -05:00 |
| Wing Lian | 9c0470130b | no log etc | 2025-01-09 18:57:24 -05:00 |
| Wing Lian | 0da2b7c7cc | no torch.exp inside triton kernel | 2025-01-09 18:57:24 -05:00 |
| Wing Lian | 7c813a1d27 | v2 trial | 2025-01-09 18:57:24 -05:00 |
| Wing Lian | 0a08bb4f78 | no where support | 2025-01-09 18:57:24 -05:00 |
| Wing Lian | 8075a92a33 | triton wip | 2025-01-09 18:57:23 -05:00 |
| Wing Lian | ba6eacd167 | chore: lint | 2025-01-09 18:57:23 -05:00 |
| Wing Lian | e2fae47114 | make sure to multiply against the correct loss | 2025-01-09 18:57:23 -05:00 |
| Wing Lian | 7d281b71dc | cross entropy loss coefficient during KD | 2025-01-09 18:57:23 -05:00 |
| Wing Lian | b080c53afc | flipped the slice | 2025-01-09 18:57:23 -05:00 |
| Wing Lian | 1ea225129f | make it work | 2025-01-09 18:57:23 -05:00 |
| Wing Lian | e2aba41939 | handle padding/collation for KD datasets | 2025-01-09 18:57:23 -05:00 |
| Wing Lian | 21caaaa2e9 | make batch smaller | 2025-01-09 18:57:23 -05:00 |
| Wing Lian | 08d9f582e4 | filter bad rows | 2025-01-09 18:57:23 -05:00 |
| Wing Lian | 39daeb2c79 | KD dataset loading and KD with logprobs | 2025-01-09 18:57:22 -05:00 |
| Wing Lian | 02c9898a95 | refactor trainer to prevent circular dependencies later; fix loader default | 2025-01-09 18:57:19 -05:00 |
| Wing Lian | fb3352e21c | rename liger test so it properly runs in ci (#2246) | 2025-01-09 17:31:43 -05:00 |
| NanoCode012 | ed77e7001e | feat: add support for data_files in pretraining (#2238) | 2025-01-09 21:04:13 +00:00 |
| Wing Lian | 7669a03fb4 | update upstream HF deps (#2239): bump axolotl contribs for upstream main conflicts; bump datasets, tokenizer, trl; remove log workarounds in trl; bump lm-eval; remove unsloth_ import from critical path; remove llama fa2 from conftest; unsloth breaks with latest upstream | 2025-01-09 21:01:59 +00:00 |
| Vincenzo di Cicco | 6553683170 | Use SequentialSampler if curriculum_sampling is enabled with sample_packing (#2235) | 2025-01-09 21:01:22 +00:00 |
| Wing Lian | 5e0124e2ab | update modal version for ci (#2242) | 2025-01-09 21:01:02 +00:00 |
| NanoCode012 | 2e8d7c1adb | fix: mistral nemo does not recognize token_type_ids in forward (#2233) | 2025-01-09 21:00:36 +00:00 |
| Wing Lian | 3c1921e400 | add hf cache caching for GHA (#2247): use modal volume to cache hf data; make sure to update the cache as we add new fixtures in conftest | 2025-01-09 20:59:54 +00:00 |
| Wing Lian | 7faf2b6e8e | Merge group queue (#2248): add support for merge groups; also lint merge groups | 2025-01-09 15:49:00 -05:00 |
| salman | c1b920f291 | Fixing OSX installation (#2231): bumping version, removing non-osx compatible deps; updating pylintrc; fixing linters; reverting changes | 2025-01-07 13:42:01 +00:00 |
| Wing Lian | 3915abee4c | make sure padding is labeled as -100 for pretraining (#2227) | 2024-12-31 15:22:18 -05:00 |
| NJordan72 | 7a38dbe674 | fix: allow trainer builder to use custom jinja chat template (#2219); chore: use get_chat_template_from_config; fix: swap imports (Co-authored-by: Chirag Jain <jain.chirag925@gmail.com>) | 2024-12-24 16:18:50 -05:00 |
| Wing Lian | e0a2eb2ebd | fix untrained tokens if specified explicitly from a list (#2210) | 2024-12-23 09:08:28 -05:00 |
| Wing Lian | d852d7af7a | inference - don't default w accelerate, fix base model (#2216) [skip ci] | 2024-12-23 07:48:41 -05:00 |
| Wing Lian | 3742deb1de | add deepspeed example with torch compile enabled (#2212) [skip ci] | 2024-12-22 12:11:39 -05:00 |
| Wing Lian | 2312caaa98 | GC every n steps (#2209) | 2024-12-21 17:38:33 -05:00 |
| Wing Lian | 307cf7c685 | move the dataset loading from remote/disk to a shared function so we can re-use for RL (#2204) | 2024-12-20 21:43:52 -05:00 |
| Dan Saunders | 70541145f1 | adding test_datasets compat with pretraining_dataset (streaming) (#2206) [skip ci] | 2024-12-20 21:43:33 -05:00 |
| Wing Lian | 42bd32a233 | add outputs (symlink) to gitignore [skip ci] (#2205) | 2024-12-19 20:14:43 -05:00 |