Commit Graph

  • 390cb5742e removing extra pytest xdist args Dan Saunders 2024-12-19 02:51:39 +00:00
  • 1d935f65c3 moving tests around for flash_attn install Dan Saunders 2024-12-18 19:36:23 +00:00
  • 66176b3e07 adding split_heads argument for retaining original (Q, K) dimensionality Dan Saunders 2024-12-18 05:56:29 +00:00
  • 505321ac95 isolating problematic test Dan Saunders 2024-12-18 03:30:35 +00:00
  • 0b382c88da fixes post-rebase Dan Saunders 2024-12-18 01:38:56 +00:00
  • ea07a7086e plugin implementation Dan Saunders 2024-12-18 01:26:41 +00:00
  • d22e1136bc convert-differential-transformer test coverage Dan Saunders 2024-12-17 20:46:19 +00:00
  • 63b8e42c6b duplicate code ignore Dan Saunders 2024-12-17 18:54:49 +00:00
  • bda1eed59e differential flash attention 2; cleanup Dan Saunders 2024-12-17 18:44:47 +00:00
  • 41ebd93158 moving monkeypatch Dan Saunders 2024-12-17 14:12:03 +00:00
  • 4c050ce807 pre-commit fix Dan Saunders 2024-12-17 13:52:34 +00:00
  • 6665acf63d fix model save / load logic Dan Saunders 2024-12-17 04:43:08 +00:00
  • 2f9fa4c465 various improvements Dan Saunders 2024-12-13 15:17:52 -05:00
  • 849bc94112 various improvements Dan Saunders 2024-12-13 15:03:45 -05:00
  • e484ec778d training fixes, patching, minor cleanup Dan Saunders 2024-12-13 00:06:22 -05:00
  • df1504ae14 adding CLI command for convert-diff-transformer Dan Saunders 2024-12-11 23:11:19 -05:00
  • 7be0d7496c Adding script for doing conversion; fixes and updates Dan Saunders 2024-12-11 21:35:47 -05:00
  • 13cdffa91f initial diff attn layer / model conversion implementation (support for llama arch) Dan Saunders 2024-12-11 14:51:53 -05:00
  • 7a4b296f60 Basic evaluate CLI command / codepath (#2188) Dan Saunders 2024-12-16 15:46:31 -05:00
  • 777db9b7e5 Built site for gh-pages Quarto GHA Workflow Runner 2025-01-10 13:36:17 +00:00
  • d8b4027200 use 2.5.1 docker images as latest tag as it seems stable (#2198) Wing Lian 2025-01-10 08:35:25 -05:00
  • ab491804e0 chore: lint kd-trainer-rebased Wing Lian 2025-01-10 02:18:55 -05:00
  • f7334a1719 make sure to use tensorboard to capture loss for checks Wing Lian 2025-01-08 22:30:13 -05:00
  • c45ab03487 fix adapter model check Wing Lian 2025-01-08 20:15:42 -05:00
  • 0da0cd02e5 make sure to use the correct tokenizer Wing Lian 2025-01-08 17:54:48 -05:00
  • dd48ce7365 make sure to set tokenizer from l3 70b and save safetensors Wing Lian 2025-01-08 14:05:49 -05:00
  • 6fbc35762b lower lr Wing Lian 2025-01-08 13:45:12 -05:00
  • 71cb5b98c9 set lora_dropout explicitly Wing Lian 2025-01-08 12:14:16 -05:00
  • 890d85f267 make the kd e2e fit in vram for ci and add lora version Wing Lian 2025-01-08 11:07:29 -05:00
  • 7dc137ed5b rename test files so it gets picked up Wing Lian 2025-01-08 09:21:17 -05:00
  • a31ec4d9b3 linting Wing Lian 2025-01-08 08:31:28 -05:00
  • 7e7762f40b add kd trainer e2e test Wing Lian 2025-01-08 08:19:10 -05:00
  • 1ffca753ca reward model doesn't work well with batched Wing Lian 2025-01-07 18:19:42 -05:00
  • 01d31587fe improve check for batched Wing Lian 2025-01-07 16:57:47 -05:00
  • 9b7d3894c0 fix reward trainer calls for tokenization Wing Lian 2025-01-07 15:41:40 -05:00
  • 1baffa54b1 reward can use same batch check Wing Lian 2025-01-07 15:11:07 -05:00
  • 2045ff2b7a tweak check for batched prompt data Wing Lian 2025-01-07 14:54:32 -05:00
  • 93903f4aa5 ensure that batch vs single is done properly Wing Lian 2025-01-07 14:03:49 -05:00
  • b5b3452b2b improve iterable support Wing Lian 2025-01-02 13:50:35 -05:00
  • 6bbe3ac641 support streaming for processing sft datasets? Wing Lian 2025-01-01 09:11:14 -05:00
  • 9ed455ef8c make loss torch script compat Wing Lian 2024-12-30 21:34:46 -05:00
  • 66823c113c kd sample packing Wing Lian 2024-12-30 20:10:47 -05:00
  • e976de4d8f be a bit pickier about loading dynamic prompt strategies Wing Lian 2024-12-30 16:52:41 -05:00
  • 8eb82bba40 more info on preprocess for kd and fix import Wing Lian 2024-12-30 15:58:02 -05:00
  • 9fe36db215 remove duplicate code Wing Lian 2024-12-30 14:16:33 -05:00
  • 9dcc879e04 add copyrights Wing Lian 2024-12-30 14:12:02 -05:00
  • 1e577a29a8 increase logging around loading plugins Wing Lian 2024-12-30 13:33:56 -05:00
  • 4037fdb43a make plugin setup concise Wing Lian 2024-12-30 13:25:25 -05:00
  • 385c60cd9b remove moved class from import Wing Lian 2024-12-30 13:17:11 -05:00
  • 06370b386a move more things to kd plugin Wing Lian 2024-12-30 13:15:28 -05:00
  • 3da6a652fa refactor kd chat template loader Wing Lian 2024-12-30 12:57:11 -05:00
  • 84547c724d support for custom trainer classes from plugins Wing Lian 2024-12-30 12:20:45 -05:00
  • 51547c656a handle token/logprob shifting Wing Lian 2024-12-30 11:21:19 -05:00
  • 7c4ae15942 remove references to triton kd for now Wing Lian 2024-12-30 10:40:05 -05:00
  • cdb167e7f7 add license block Wing Lian 2024-12-29 16:18:05 -05:00
  • 52f1d7aee2 refactor so we can easily add new loss functions Wing Lian 2024-12-29 16:15:47 -05:00
  • 319c3531e7 chore: lint Wing Lian 2024-12-28 16:02:06 -05:00
  • 87eb6a3324 var naming and add todo Wing Lian 2024-12-25 21:41:06 -05:00
  • f03fa703b7 fix kd loss so it's causal (fixes repeating tokens) Wing Lian 2024-12-25 18:59:30 -05:00
  • 53ec07d44c use kd_alpha in the correct loss method Wing Lian 2024-12-24 19:54:32 -05:00
  • 8d77dc385e hash for temperature too Wing Lian 2024-12-24 15:48:35 -05:00
  • 8b0104fa7c better rescaling for temperatures Wing Lian 2024-12-24 09:26:27 -05:00
  • 546ad007ec don't use triton for now Wing Lian 2024-12-21 16:47:11 -05:00
  • 868a49cb96 fix kwarg Wing Lian 2024-12-21 14:32:11 -05:00
  • 4a12b1b22e v3 Wing Lian 2024-12-21 14:17:30 -05:00
  • 973ed841cd no torch.tensor Wing Lian 2024-12-21 14:00:01 -05:00
  • 9c0470130b no log etc Wing Lian 2024-12-21 13:54:21 -05:00
  • 0da2b7c7cc no torch.exp inside triton kernel Wing Lian 2024-12-21 13:52:31 -05:00
  • 7c813a1d27 v2 trial Wing Lian 2024-12-21 13:43:48 -05:00
  • 0a08bb4f78 no where support Wing Lian 2024-12-21 13:21:54 -05:00
  • 8075a92a33 triton wip Wing Lian 2024-12-21 13:18:23 -05:00
  • ba6eacd167 chore: lint Wing Lian 2024-12-19 02:04:25 -05:00
  • e2fae47114 make sure to multiply against the correct loss Wing Lian 2024-12-19 01:42:57 -05:00
  • 7d281b71dc cross entropy loss coefficient during KD Wing Lian 2024-12-19 01:42:21 -05:00
  • b080c53afc flipped the slice Wing Lian 2024-12-19 01:21:48 -05:00
  • 1ea225129f make it work Wing Lian 2024-12-19 00:28:02 -05:00
  • e2aba41939 handle padding/collation for KD datasets Wing Lian 2024-12-18 18:07:27 -05:00
  • 21caaaa2e9 make batch smaller Wing Lian 2024-12-18 16:23:50 -05:00
  • 08d9f582e4 filter bad rows Wing Lian 2024-12-18 15:47:18 -05:00
  • 39daeb2c79 KD dataset loading and KD with logprobs Wing Lian 2024-12-18 15:16:45 -05:00
  • 02c9898a95 refactor trainer to prevent circular dependencies later Wing Lian 2024-12-16 14:16:36 -05:00
  • a319670c13 Built site for gh-pages Quarto GHA Workflow Runner 2025-01-09 22:32:40 +00:00
  • fb3352e21c rename liger test so it properly runs in ci (#2246) Wing Lian 2025-01-09 17:31:43 -05:00
  • 543daaf46f llama test Sunny 2025-01-09 16:08:24 -05:00
  • 610214db5c Built site for gh-pages Quarto GHA Workflow Runner 2025-01-09 21:05:22 +00:00
  • ed77e7001e feat: add support for data_files in pretraining (#2238) NanoCode012 2025-01-10 04:04:13 +07:00
  • 3ae2066e23 Built site for gh-pages Quarto GHA Workflow Runner 2025-01-09 21:02:35 +00:00
  • 7669a03fb4 update upstream HF deps (#2239) Wing Lian 2025-01-09 16:01:59 -05:00
  • 75b02acde8 Built site for gh-pages Quarto GHA Workflow Runner 2025-01-09 21:01:49 +00:00
  • 6553683170 Use SequentialSampler if curriculum_sampling is enabled with sample_packing (#2235) Vincenzo di Cicco 2025-01-09 22:01:22 +01:00
  • 5e0124e2ab update modal version for ci (#2242) Wing Lian 2025-01-09 16:01:02 -05:00
  • 2e8d7c1adb fix: mistral nemo does not recognize token_type_ids in forward (#2233) NanoCode012 2025-01-10 04:00:36 +07:00
  • 3c1921e400 add hf cache caching for GHA (#2247) Wing Lian 2025-01-09 15:59:54 -05:00
  • 83c01532f1 Built site for gh-pages Quarto GHA Workflow Runner 2025-01-09 20:49:57 +00:00
  • 7faf2b6e8e Merge group queue (#2248) Wing Lian 2025-01-09 15:49:00 -05:00
  • 59047ee6c4 dump snapshot location for caching debug-hf-home-cache Wing Lian 2025-01-09 11:26:33 -05:00
  • 5c226b600d pr feedback Wing Lian 2025-01-08 08:38:06 -05:00
  • af66f7c274 update link in README to include utm Wing Lian 2025-01-07 15:13:18 -05:00
  • 079f94ee99 include modal in requirements Wing Lian 2025-01-07 08:48:25 -05:00
  • 3ac34fbede Built site for gh-pages Quarto GHA Workflow Runner 2025-01-07 13:43:02 +00:00