Commit Graph

  • e67e4191d1 Built site for gh-pages Quarto GHA Workflow Runner 2025-01-29 04:24:41 +00:00
  • c071a530f7 removing 2.3.1 (#2294) salman 2025-01-29 04:23:44 +00:00
  • c015a76a23 Num epochs float (#2282) [skip ci] mashdragon 2025-01-29 04:23:26 +00:00
  • 067b442596 chore: refactor SaveModelCallback to stop handle fractional save_steps (#2291) [skip ci] NanoCode012 2025-01-29 11:22:10 +07:00
  • 0b52f06227 bump bnb to 0.45.1 (#2289) [skip ci] Wing Lian 2025-01-28 23:21:25 -05:00
  • 42d4732aaf kd loss needs to be calculated in full precision Wing Lian 2025-01-28 19:40:35 -05:00
  • b31796a681 Merge branch 'main' into flx_attn_support Sung Ching Liu 2025-01-28 14:20:43 -05:00
  • 4d1553e53f updates autodoc Dan Saunders 2025-01-27 15:43:51 -05:00
  • 1cfb8feb2d add iterable argument to preprocess-cli (branch: iterable-optional) Wing Lian 2025-01-27 14:31:12 -05:00
  • 2c9dfbed2e apply z-score scaling to kd Wing Lian 2025-01-27 14:27:35 -05:00
  • f866157b74 initial quartodoc changes Dan Saunders 2025-01-27 18:57:45 +00:00
  • 2daa94080c Merge branch 'main' into diff-transformer (branch: diff-transformer) Dan Saunders 2025-01-27 14:46:17 +00:00
  • 0e9bfa6dee small fixes, improvements Dan Saunders 2025-01-24 19:53:54 +00:00
  • 791c38dcc3 chore: lint (branch: relaxed-recursive-transformers) Wing Lian 2025-01-24 13:29:54 -05:00
  • 0af78a9882 rescale the norm for lora Wing Lian 2025-01-22 10:02:29 -05:00
  • fa5efbf235 don't scale delta before decomposing Wing Lian 2025-01-21 15:45:21 -05:00
  • 59a7ac427d make sure to scale too Wing Lian 2025-01-21 13:22:49 -05:00
  • e3393042e5 hopefully fix the lora/dora logic Wing Lian 2025-01-21 12:30:00 -05:00
  • 08a4e8a7fb refactor a bit Wing Lian 2025-01-21 10:14:16 -05:00
  • b582d340b0 save tokenizer too Wing Lian 2025-01-20 13:31:26 -05:00
  • 474ba1a1b8 chore: lint/formatting Wing Lian 2025-01-20 12:20:20 -05:00
  • de771fcb05 fix convert logger and registration Wing Lian 2025-01-20 12:18:41 -05:00
  • f32d429db5 fix import path to args Wing Lian 2025-01-20 12:12:23 -05:00
  • 82005f8eeb auto modeling for rrt Wing Lian 2025-01-20 11:59:23 -05:00
  • b439ed3345 support optional dora Wing Lian 2025-01-20 11:45:06 -05:00
  • 623eaca740 more fixes to conversion Wing Lian 2025-01-20 10:51:24 -05:00
  • 38dfd3fadb wip conversion cli Wing Lian 2025-01-19 22:19:33 -05:00
  • daa9408233 more wip Wing Lian 2025-01-19 20:11:26 -05:00
  • 257231ac46 wip rrt Wing Lian 2025-01-17 08:48:45 -05:00
  • ef38f10274 merging into main Dan Saunders 2025-01-24 18:03:27 +00:00
  • 033003e88b Built site for gh-pages Quarto GHA Workflow Runner 2025-01-24 17:57:18 +00:00
  • 887513285d support for custom lr groups for non-embedding modules (#2213) Wing Lian 2025-01-24 12:56:28 -05:00
  • 76c7834c00 Built site for gh-pages Quarto GHA Workflow Runner 2025-01-24 17:56:14 +00:00
  • 20620771f1 Pretrain multipack (#2278) Wing Lian 2025-01-24 12:55:20 -05:00
  • 66262c3092 moving out all diff attn code to plugin repo Dan Saunders 2025-01-24 17:46:11 +00:00
  • 6c49083d8b improve check for base case (branch: eos-hell) Wing Lian 2025-01-24 12:02:34 -05:00
  • 94c226edb3 fixes last eos token not in labels on basic use case Wing Lian 2025-01-24 12:00:06 -05:00
  • e57399e40c Built site for gh-pages Quarto GHA Workflow Runner 2025-01-24 15:08:03 +00:00
  • 6086162488 chore(doc): improve explanation for *_steps and *_strategy (#2270) NanoCode012 2025-01-24 22:07:02 +07:00
  • 9d42e2660a Built site for gh-pages Quarto GHA Workflow Runner 2025-01-24 15:06:53 +00:00
  • b2774af66c Take split param from config in all load_dataset instances (#2281) mashdragon 2025-01-24 15:06:50 +00:00
  • 74f9782fc3 chore(doc): fix explanation on gcs creds retrieval (#2272) NanoCode012 2025-01-24 22:05:58 +07:00
  • 14325ed158 Built site for gh-pages Quarto GHA Workflow Runner 2025-01-24 02:18:59 +00:00
  • 8a7a0b07dc support for latest transformers release 4.48.1 (#2256) Wing Lian 2025-01-23 21:17:57 -05:00
  • 5ca57cb55a undo bool conversion Sunny Liu 2025-01-23 17:56:13 -05:00
  • 016ba124e4 README update Dan Saunders 2025-01-23 22:11:35 +00:00
  • 7145d52d99 moving diff attn code to separate repo Dan Saunders 2025-01-23 21:33:53 +00:00
  • 0149de7fb0 mask to bool Sunny Liu 2025-01-23 15:30:08 -05:00
  • 8c34c65181 dummy Sunny Liu 2025-01-23 14:56:26 -05:00
  • 555aa5772a skip mask conversion if already 4d Sunny Liu 2025-01-23 14:01:53 -05:00
  • e8b2789086 revert mask expand Sunny Liu 2025-01-23 11:20:38 -05:00
  • 85752cdfc9 mask expansion Sunny Liu 2025-01-22 21:33:38 -05:00
  • f2f23c8041 mask expansion Sunny Liu 2025-01-22 21:31:42 -05:00
  • 8b3eec7f6e mask expansion Sunny Liu 2025-01-22 21:29:52 -05:00
  • bb9bea3110 mask expansion Sunny Liu 2025-01-22 21:27:25 -05:00
  • 0dd18a3681 llama sdpa patching WIP - static class function import Sunny Liu 2025-01-22 21:10:05 -05:00
  • 152e988d3c llama sdpa patching WIP - static class function import Sunny Liu 2025-01-22 21:02:26 -05:00
  • 27532825a9 llama sdpa patching WIP - static class function import Sunny Liu 2025-01-22 21:00:34 -05:00
  • 06f83a54a5 llama sdpa patching WIP - static class function import Sunny Liu 2025-01-22 20:45:44 -05:00
  • d7b133dc1f llama sdpa patching WIP - static class function import Sunny Liu 2025-01-22 20:33:13 -05:00
  • f3bec17917 llama sdpa patching WIP - static class function import Sunny Liu 2025-01-22 20:25:26 -05:00
  • b7deb5241c llama sdpa patching WIP Sunny Liu 2025-01-22 20:16:27 -05:00
  • cee310dcfa llama sdpa patching WIP Sunny Liu 2025-01-22 20:15:23 -05:00
  • d1be6e228d llama sdpa patching WIP Sunny Liu 2025-01-22 20:14:20 -05:00
  • 5f9f77f384 llama patch Sunny Liu 2025-01-22 11:29:28 -05:00
  • e54dba515a Built site for gh-pages Quarto GHA Workflow Runner 2025-01-21 20:40:43 +00:00
  • e6492e5826 Built site for gh-pages Quarto GHA Workflow Runner 2025-01-21 20:40:10 +00:00
  • 8fb72cbc0b use the extracted field_messages to parse the role fields (#2265) Wing Lian 2025-01-21 15:39:30 -05:00
  • bb9d4102c4 Add 5000 line history limit to tmux for docker cloud (#2268) Adithya Kamath 2025-01-22 02:09:17 +05:30
  • 4e4a16cd8a fix finding the top-k rather than assuming first position has the correct val Wing Lian 2025-01-21 13:09:20 -05:00
  • 5e8c492e3c trainer refactor testing for hf#35567 (branch: hf-trainer-refactor) Wing Lian 2025-01-21 11:27:10 -05:00
  • 67c1c8405e use iter instead of tuple Wing Lian 2025-01-21 11:23:38 -05:00
  • bded6df509 change up logic so we always truncate to top_k Wing Lian 2025-01-21 11:20:01 -05:00
  • bb5e6f4b72 make sure to truncate logprobs if there are more than top_k Wing Lian 2025-01-21 10:26:27 -05:00
  • 9a683536c8 upgrade accelerate also Wing Lian 2025-01-21 10:15:16 -05:00
  • faa61a9c3e use official hf release for 4.48.1 Wing Lian 2025-01-20 11:46:09 -05:00
  • 59cb36564d skip check for latest transformers Wing Lian 2025-01-16 10:38:40 -05:00
  • 50d4d727a0 use wip branch for expected 4.48.1 Wing Lian 2025-01-16 09:08:18 -05:00
  • 0714a49227 move relora test so it runs in a single test thread Wing Lian 2025-01-13 21:38:26 -05:00
  • b6daffb788 fix import from mv Wing Lian 2025-01-13 20:32:23 -05:00
  • d487e377fa move relora to the patched tests suite Wing Lian 2025-01-13 19:54:40 -05:00
  • 4cc89f73f0 fix patch Wing Lian 2025-01-13 14:02:27 -05:00
  • 5b5ba49c46 latest fixes needed for GA in latest transformers Wing Lian 2025-01-13 13:36:47 -05:00
  • 49b5501fc2 unsloth incompatible with latest transformers Wing Lian 2025-01-13 12:10:20 -05:00
  • 23389b38b7 bump to latest transformers release Wing Lian 2025-01-13 10:34:44 -05:00
  • b2a34380b3 sample packing doc mask creation WIP bursteratom 2025-01-21 09:18:38 -05:00
  • c5554d774f Built site for gh-pages Quarto GHA Workflow Runner 2025-01-20 19:08:25 +00:00
  • af727eedf7 option to not concatenate during pretraining (#2263) Wing Lian 2025-01-20 14:07:34 -05:00
  • 80bfc50d1f get seqlens from position ids for foc masking Sunny Liu 2025-01-17 17:22:04 -05:00
  • a5360c172c llama hijacking Sunny Liu 2025-01-17 15:54:03 -05:00
  • 013a9b73fc fix transformers version for testing Sunny Liu 2025-01-16 15:32:57 -05:00
  • aad62428e0 not sure if this is necessary actually Sunny 2025-01-16 15:08:34 -05:00
  • 7bf9741831 use the extracted field_messages to parse the role fields (branch: chat-dataset-tool) Wing Lian 2025-01-16 08:36:00 -05:00
  • 8c4f89745a fix softmax class check (branch: rala-v2) Wing Lian 2025-01-15 23:23:13 -05:00
  • 36b71f34d7 register rala Wing Lian 2025-01-15 23:21:22 -05:00
  • d28fee7609 use autoconfig w rala Wing Lian 2025-01-15 23:14:47 -05:00
  • c196776996 option to not concatenate during pretraining Wing Lian 2025-01-15 22:45:02 -05:00
  • 79ae776102 fixup logging layer Wing Lian 2025-01-15 21:36:14 -05:00
  • 145664d82c more fixups Wing Lian 2025-01-15 21:27:12 -05:00
  • a6f2c5d583 flex sample packing WIP Sunny 2025-01-15 21:12:33 -05:00