Commit Graph

  • 7cd0a317cb support streaming for processing sft datasts? Wing Lian 2025-01-01 09:11:14 -05:00
  • 1cc3a2d16c make loss torch script compat Wing Lian 2024-12-30 21:34:46 -05:00
  • 287d2ca8d5 kd sample packing Wing Lian 2024-12-30 20:10:47 -05:00
  • 03b86df506 be a bit pickier about loading dynamic prompt strategies Wing Lian 2024-12-30 16:52:41 -05:00
  • 2ed4246949 more info on preprocess for kd and fix import Wing Lian 2024-12-30 15:58:02 -05:00
  • 35bc2e2d3f remove duplicate code Wing Lian 2024-12-30 14:16:33 -05:00
  • 94f1094805 add copyrights Wing Lian 2024-12-30 14:12:02 -05:00
  • a0070bf94e increase logging around loading plugins Wing Lian 2024-12-30 13:33:56 -05:00
  • 2ee2ffd834 make plugin setup concise Wing Lian 2024-12-30 13:25:25 -05:00
  • 723b0a2dee remove moved class from import Wing Lian 2024-12-30 13:17:11 -05:00
  • 327739c9e3 move more things to kd plugin Wing Lian 2024-12-30 13:15:28 -05:00
  • 8aafe142f2 refactor kd chat template loader Wing Lian 2024-12-30 12:57:11 -05:00
  • a0d6d8895e support for custom trainer classes from plugins Wing Lian 2024-12-30 12:20:45 -05:00
  • 55b33cc44d handle token/logprob shifting Wing Lian 2024-12-30 11:21:19 -05:00
  • 69ed25e82c remove references to triton kd for now Wing Lian 2024-12-30 10:40:05 -05:00
  • 2ea8b7e518 add license block Wing Lian 2024-12-29 16:18:05 -05:00
  • aa081e0e76 refactor so we can easily add new loss functions Wing Lian 2024-12-29 16:15:47 -05:00
  • 3f97ec45fb chore: lint Wing Lian 2024-12-28 16:02:06 -05:00
  • 7b5a24b0d2 var naming and add todo Wing Lian 2024-12-25 21:41:06 -05:00
  • 4ddd089d0a fix kd loss so it's causal (fixes repeating tokens) Wing Lian 2024-12-25 18:59:30 -05:00
  • b88128d067 use kd_alpha in the correct loss method Wing Lian 2024-12-24 19:54:32 -05:00
  • 2e6422a711 hash for temperature too Wing Lian 2024-12-24 15:48:35 -05:00
  • 6ad809287b better rescaling for temperatures Wing Lian 2024-12-24 09:26:27 -05:00
  • e376e00386 don't use triton for now Wing Lian 2024-12-21 16:47:11 -05:00
  • 23d7ae6caa fix kwarg Wing Lian 2024-12-21 14:32:11 -05:00
  • 19638590d5 v3 Wing Lian 2024-12-21 14:17:30 -05:00
  • 73f5b83431 no torch.tensor Wing Lian 2024-12-21 14:00:01 -05:00
  • 9b1164b841 no log etc Wing Lian 2024-12-21 13:54:21 -05:00
  • 5a7d6f6175 no torch.exp inside triton kernel Wing Lian 2024-12-21 13:52:31 -05:00
  • a803c3d3ee v2 trial Wing Lian 2024-12-21 13:43:48 -05:00
  • 48ccf55752 no where support Wing Lian 2024-12-21 13:21:54 -05:00
  • bc3326a808 triton wip Wing Lian 2024-12-21 13:18:23 -05:00
  • cf8174db75 chore: lint Wing Lian 2024-12-19 02:04:25 -05:00
  • 222dc27410 make sure to multiply against the correct loss Wing Lian 2024-12-19 01:42:57 -05:00
  • 1107f1f603 cross entropy loss coefficient during KD Wing Lian 2024-12-19 01:42:21 -05:00
  • 1c603da96a flipped the slice Wing Lian 2024-12-19 01:21:48 -05:00
  • 283faf3909 make it work Wing Lian 2024-12-19 00:28:02 -05:00
  • 472f7048e5 handle padding/collation for KD datasets Wing Lian 2024-12-18 18:07:27 -05:00
  • 3d1e2dcef4 make batch smaller Wing Lian 2024-12-18 16:23:50 -05:00
  • 9e218fbcfd filter bad rows Wing Lian 2024-12-18 15:47:18 -05:00
  • 11caf52529 KD dataset loading and KD with logprobs Wing Lian 2024-12-18 15:16:45 -05:00
  • 17ba9dcfdb refactor trainer to prevent circular dependencies later Wing Lian 2024-12-16 14:16:36 -05:00
  • 716e133fb2 Built site for gh-pages Quarto GHA Workflow Runner 2025-01-13 18:12:54 +00:00
  • 1ed4de73b6 CLI cleanup and documentation (#2244) Dan Saunders 2025-01-13 12:55:29 -05:00
  • 385736fae1 fix linter issue from merge [fix-merge-lint-issue] Wing Lian 2025-01-13 12:55:03 -05:00
  • a030dad657 fix cli-refactor Dan Saunders 2025-01-13 17:25:12 +00:00
  • 3b82fc36ec review comments Dan Saunders 2025-01-13 17:05:21 +00:00
  • 18a36b31ef make sure the batch dataset patcher for multipack is always loaded when handling datasets Wing Lian 2025-01-10 18:42:28 -05:00
  • 705e7dc270 typing fixes Dan Saunders 2025-01-10 17:48:28 +00:00
  • c9e37496cb Fix Dan Saunders 2025-01-10 17:32:33 +00:00
  • 210c58a4db fix Dan Saunders 2025-01-10 17:29:03 +00:00
  • 5ff1322f32 review comments Dan Saunders 2025-01-10 17:27:03 +00:00
  • 2b7b37413d pytest fixes Dan Saunders 2025-01-08 20:55:10 +00:00
  • 6e72baf287 continued cleanup and documentation Dan Saunders 2025-01-08 19:15:03 +00:00
  • 929ee15cc3 remove finetune.py script Dan Saunders 2025-01-07 20:17:49 +00:00
  • 773c3b51cd Adding documentation and continuing cleanup (in progress) Dan Saunders 2025-01-07 20:16:39 +00:00
  • 324c533adb cleanup and (partial) docs Dan Saunders 2025-01-07 17:59:59 +00:00
  • 6f80d1d670 fix Dan Saunders 2024-12-07 11:09:43 -05:00
  • 541f9b39ff CLI init refactor Dan Saunders 2024-12-06 11:57:53 -05:00
  • 1b26b6a2f3 Built site for gh-pages Quarto GHA Workflow Runner 2025-01-13 15:46:05 +00:00
  • 34697fc683 Built site for gh-pages Quarto GHA Workflow Runner 2025-01-13 15:45:08 +00:00
  • f89e962119 skip over rows in pretraining dataset (#2223) Wing Lian 2025-01-13 10:44:45 -05:00
  • c2e1907095 Built site for gh-pages Quarto GHA Workflow Runner 2025-01-13 15:44:26 +00:00
  • bc1c9c20e3 assume empty lora dropout means 0.0 and add tests (#2243) Wing Lian 2025-01-13 10:44:11 -05:00
  • dd26cc3c0f add helper to verify the correct model output file exists (#2245) Wing Lian 2025-01-13 10:43:29 -05:00
  • d3a0cb5edb transformers version bursteratom 2025-01-13 10:33:00 -05:00
  • 8b47e456b0 revert to transformers 4.47.1 bursteratom 2025-01-13 10:29:03 -05:00
  • 2319ac729c Merge branch 'main' into flx_attn_support Sunny Liu 2025-01-13 09:42:58 -05:00
  • ee20600b9a use alternate math-hard repo [cli-cloud-modal-math-hard] Wing Lian 2025-01-13 08:46:35 -05:00
  • fd91de3ea6 apply chat template as arg Wing Lian 2025-01-12 17:38:32 -05:00
  • f99cae0e7b llama test Sunny 2025-01-12 17:30:19 -05:00
  • 888cd9407f use 2.5.1 docker images as latest tag as it seems stable (#2198) Wing Lian 2025-01-10 08:35:25 -05:00
  • bd62d6e10a rename liger test so it properly runs in ci (#2246) Wing Lian 2025-01-09 17:31:43 -05:00
  • 5eae134110 feat: add support for data_files in pretraining (#2238) NanoCode012 2025-01-10 04:04:13 +07:00
  • b7d27bdfa4 update upstream HF deps (#2239) Wing Lian 2025-01-09 16:01:59 -05:00
  • da97a21bdc Use SequentialSampler if curriculum_sampling is enabled with sample_packing (#2235) Vincenzo di Cicco 2025-01-09 22:01:22 +01:00
  • e0d4b88598 update modal version for ci (#2242) Wing Lian 2025-01-09 16:01:02 -05:00
  • fac059a209 fix: mistral nemo does not recognize token_type_ids in forward (#2233) NanoCode012 2025-01-10 04:00:36 +07:00
  • 9c9ac1cf0b add hf cache caching for GHA (#2247) Wing Lian 2025-01-09 15:59:54 -05:00
  • 2346f21b2b Merge group queue (#2248) Wing Lian 2025-01-09 15:49:00 -05:00
  • 0b47281f51 Fixing OSX installation (#2231) salman 2025-01-07 13:42:01 +00:00
  • 530bf77cf9 revision support Wing Lian 2025-01-12 05:17:03 -05:00
  • bfc91a91ca use chat template Wing Lian 2025-01-11 23:18:27 -05:00
  • 661d71a14b adding diff attn negative component warmup (in progress) Dan Saunders 2025-01-10 21:57:31 +00:00
  • 6dd47edcb8 fire CLI fixes Dan Saunders 2025-01-10 18:24:16 +00:00
  • 7aca08ff60 adding guard statements Dan Saunders 2025-01-10 16:39:21 +00:00
  • 4f804f6d88 adding diff attn callback, adding documentation Dan Saunders 2025-01-10 16:28:27 +00:00
  • 443327c585 CLI build_command bugfix Dan Saunders 2025-01-08 16:34:19 +00:00
  • 70c4e6fbe6 updates and cleanup Dan Saunders 2025-01-06 17:04:05 +00:00
  • 2a7f139ad2 pre-commit fix Dan Saunders 2024-12-28 01:14:08 +00:00
  • 332ce0ae85 fixes and cleanup Dan Saunders 2024-12-28 01:10:56 +00:00
  • e5fa842ff8 update Dan Saunders 2024-12-27 21:29:37 +00:00
  • 78e0ec0aa5 changes Dan Saunders 2024-12-27 21:24:16 +00:00
  • 3bc568eb27 adding registration function Dan Saunders 2024-12-27 11:17:52 -05:00
  • eb6611d55f progress on modeling code Dan Saunders 2024-12-24 05:30:46 +00:00
  • 4ff3328e66 updated custom modeling code Dan Saunders 2024-12-23 20:40:55 -05:00
  • a3fd5074a9 fix duplicate-code warnings Dan Saunders 2024-12-23 14:22:33 -05:00
  • 5b90da0be3 added modeling code; cleanup + refactor Dan Saunders 2024-12-23 14:14:51 -05:00
  • fcbfa86373 refactor and fixing test isolation issues Dan Saunders 2024-12-21 16:56:57 +00:00
  • 0d56582090 adding yaml dumper preserving input config format Dan Saunders 2024-12-20 20:39:40 +00:00