Commit Graph

1837 Commits

Author SHA1 Message Date
Wing Lian
06370b386a move more things to kd plugin 2025-01-09 18:57:26 -05:00
Wing Lian
3da6a652fa refactor kd chat template loader 2025-01-09 18:57:25 -05:00
Wing Lian
84547c724d support for custom trainer classes from plugins 2025-01-09 18:57:25 -05:00
Wing Lian
51547c656a handle token/logprob shifting 2025-01-09 18:57:25 -05:00
Wing Lian
7c4ae15942 remove references to triton kd for now 2025-01-09 18:57:25 -05:00
Wing Lian
cdb167e7f7 add license block 2025-01-09 18:57:25 -05:00
Wing Lian
52f1d7aee2 refactor so we can easily add new loss functions 2025-01-09 18:57:25 -05:00
Wing Lian
319c3531e7 chore: lint 2025-01-09 18:57:25 -05:00
Wing Lian
87eb6a3324 var naming and add todo 2025-01-09 18:57:25 -05:00
Wing Lian
f03fa703b7 fix kd loss so it's causal (fixes repeating tokens) 2025-01-09 18:57:25 -05:00
Wing Lian
53ec07d44c use kd_alpha in the correct loss method 2025-01-09 18:57:25 -05:00
Wing Lian
8d77dc385e hash for temperature too 2025-01-09 18:57:24 -05:00
Wing Lian
8b0104fa7c better rescaling for temperatures 2025-01-09 18:57:24 -05:00
Wing Lian
546ad007ec don't use triton for now 2025-01-09 18:57:24 -05:00
Wing Lian
868a49cb96 fix kwarg 2025-01-09 18:57:24 -05:00
Wing Lian
4a12b1b22e v3 2025-01-09 18:57:24 -05:00
Wing Lian
973ed841cd no torch.tensor 2025-01-09 18:57:24 -05:00
Wing Lian
9c0470130b no log etc 2025-01-09 18:57:24 -05:00
Wing Lian
0da2b7c7cc no torch.exp inside triton kernel 2025-01-09 18:57:24 -05:00
Wing Lian
7c813a1d27 v2 trial 2025-01-09 18:57:24 -05:00
Wing Lian
0a08bb4f78 no where support 2025-01-09 18:57:24 -05:00
Wing Lian
8075a92a33 triton wip 2025-01-09 18:57:23 -05:00
Wing Lian
ba6eacd167 chore: lint 2025-01-09 18:57:23 -05:00
Wing Lian
e2fae47114 make sure to multiply against the correct loss 2025-01-09 18:57:23 -05:00
Wing Lian
7d281b71dc cross entropy loss coefficient during KD 2025-01-09 18:57:23 -05:00
Wing Lian
b080c53afc flipped the slice 2025-01-09 18:57:23 -05:00
Wing Lian
1ea225129f make it work 2025-01-09 18:57:23 -05:00
Wing Lian
e2aba41939 handle padding/collation for KD datasets 2025-01-09 18:57:23 -05:00
Wing Lian
21caaaa2e9 make batch smaller 2025-01-09 18:57:23 -05:00
Wing Lian
08d9f582e4 filter bad rows 2025-01-09 18:57:23 -05:00
Wing Lian
39daeb2c79 KD dataset loading and KD with logprobs 2025-01-09 18:57:22 -05:00
Wing Lian
02c9898a95 refactor trainer to prevent circular dependencies later 2025-01-09 18:57:19 -05:00
* fix loader default
Wing Lian
fb3352e21c rename liger test so it properly runs in ci (#2246) 2025-01-09 17:31:43 -05:00
NanoCode012
ed77e7001e feat: add support for data_files in pretraining (#2238) 2025-01-09 21:04:13 +00:00
Wing Lian
7669a03fb4 update upstream HF deps (#2239) 2025-01-09 21:01:59 +00:00
* bump axolotl contribs for upstream main conflicts
* bump datasets, tokenizer, trl
* remove log workarounds in trl
* bump lm-eval
* remove unsloth_ import from critical path
* remove llama fa2 from conftest
* unsloth breaks with latest upstream
Vincenzo di Cicco
6553683170 Use SequentialSampler if curriculum_sampling is enabled with sample_packing (#2235) 2025-01-09 21:01:22 +00:00
Wing Lian
5e0124e2ab update modal version for ci (#2242) 2025-01-09 21:01:02 +00:00
NanoCode012
2e8d7c1adb fix: mistral nemo does not recognize token_type_ids in forward (#2233) 2025-01-09 21:00:36 +00:00
Wing Lian
3c1921e400 add hf cache caching for GHA (#2247) 2025-01-09 20:59:54 +00:00
* add hf cache caching for GHA
* use modal volume to cache hf data
* make sure to update the cache as we add new fixtures in conftest
Wing Lian
7faf2b6e8e Merge group queue (#2248) 2025-01-09 15:49:00 -05:00
* add support for merge groups
* also lint merge groups
salman
c1b920f291 Fixing OSX installation (#2231) 2025-01-07 13:42:01 +00:00
* bumping version, removing non-osx compatible deps
* updating pylintrc
* fixing linters
* reverting changes
Wing Lian
3915abee4c make sure padding is labeled as -100 for pretraining (#2227) 2024-12-31 15:22:18 -05:00
NJordan72
7a38dbe674 fix: allow trainer builder to use custom jinja chat template (#2219) 2024-12-24 16:18:50 -05:00
* fix: allow trainer builder to use custom jinja chat template
* chore: use get_chat_template_from_config
* fix: swap imports
Co-authored-by: Chirag Jain <jain.chirag925@gmail.com>
Wing Lian
e0a2eb2ebd fix untrained tokens if specified explicitly from a list (#2210) 2024-12-23 09:08:28 -05:00
Wing Lian
d852d7af7a inference - don't default w accelerate, fix base model (#2216) [skip ci] 2024-12-23 07:48:41 -05:00
Wing Lian
3742deb1de add deepspeed example with torch compile enabled (#2212) [skip ci] 2024-12-22 12:11:39 -05:00
Wing Lian
2312caaa98 GC every n steps (#2209) 2024-12-21 17:38:33 -05:00
Wing Lian
307cf7c685 move the dataset loading from remote/disk to a shared function so we can re-use for RL (#2204) 2024-12-20 21:43:52 -05:00
Dan Saunders
70541145f1 adding test_datasets compat with pretraining_dataset (streaming) (#2206) [skip ci] 2024-12-20 21:43:33 -05:00
Wing Lian
42bd32a233 add outputs (symlink) to gitignore [skip ci] (#2205) 2024-12-19 20:14:43 -05:00