Commit Graph

  • 1334281d50 docker fix Dan Saunders 2025-08-30 00:12:09 -04:00
  • 98f230d864 cleanup Dan Saunders 2025-08-30 00:04:39 -04:00
  • 02f308351c fix Dan Saunders 2025-08-29 23:46:55 -04:00
  • 3b91e8174d fix Dan Saunders 2025-08-29 23:06:47 -04:00
  • 40d906fb33 lint Dan Saunders 2025-08-29 22:58:18 -04:00
  • 89d5323c13 fix Dan Saunders 2025-08-29 22:54:21 -04:00
  • df870f6a8f fix Dan Saunders 2025-08-29 22:40:52 -04:00
  • f500aaa490 fix Dan Saunders 2025-08-29 22:38:45 -04:00
  • 9ec33f52e3 wip Dan Saunders 2025-08-29 22:37:09 -04:00
  • b453562c01 fixes Dan Saunders 2025-08-29 22:18:27 -04:00
  • 367f7eb3a6 fix Dan Saunders 2025-08-29 22:10:10 -04:00
  • e888e38ce7 fix Dan Saunders 2025-08-29 22:03:38 -04:00
  • 400120af2d wip Dan Saunders 2025-08-29 21:58:35 -04:00
  • 459e5f9b16 lint Dan Saunders 2025-08-29 21:52:57 -04:00
  • 43f6f84269 wip Dan Saunders 2025-08-29 21:43:57 -04:00
  • 36c4ab11f9 wip Dan Saunders 2025-08-29 21:22:26 -04:00
  • 2f4e4ef604 wip Dan Saunders 2025-08-29 21:14:42 -04:00
  • aee03fc636 wip Dan Saunders 2025-08-29 20:12:57 -04:00
  • 255b818fbc rebase Dan Saunders 2025-08-29 17:08:20 -04:00
  • 332ee74f32 rebase Dan Saunders 2025-08-29 16:57:55 -04:00
  • 3b0d2ac5c0 rebase Dan Saunders 2025-08-29 16:56:26 -04:00
  • 9462a1bf79 wip Dan Saunders 2025-08-29 16:54:27 -04:00
  • cc665dadf9 Built site for gh-pages Quarto GHA Workflow Runner 2025-09-26 14:01:04 +00:00
  • 8e9386c799 go uv first Dan Saunders 2025-08-21 05:05:03 +00:00
  • 740d5a1d31 doc fix (#3187) Dan Saunders 2025-09-26 09:55:15 -04:00
  • f4f53c704c Built site for gh-pages Quarto GHA Workflow Runner 2025-09-26 09:32:53 +00:00
  • 850c1a5f8d Add FSDP v2 swap memory support + QLoRA compatibility fixes (#3167) Grant Holmes (Ren) 2025-09-26 04:23:59 -05:00
  • 27ba931e09 Built site for gh-pages Quarto GHA Workflow Runner 2025-09-26 05:17:25 +00:00
  • 7fa8ac40cd Feat(cce): add qwen3_vl, qwen3_vl_moe, granitemoeshared, granitemoehybrid, and upgraded all cce patches (#3178) NanoCode012 2025-09-26 12:11:29 +07:00
  • dd85358543 default mg vendor-moe Dan Saunders 2025-09-25 16:30:23 -04:00
  • 55d98db0d0 fix Dan Saunders 2025-09-25 16:08:35 -04:00
  • d90ade3b1b fix Dan Saunders 2025-09-25 15:55:08 -04:00
  • 824a641cee uniform routing default Dan Saunders 2025-09-25 15:47:23 -04:00
  • e003a05177 narrow sweep; compare both backends Dan Saunders 2025-09-25 14:54:03 -04:00
  • 91393c4dc8 allocator Dan Saunders 2025-09-23 18:20:57 -04:00
  • d578c53603 fix Dan Saunders 2025-09-23 18:13:53 -04:00
  • 4db7a21ff7 fix Dan Saunders 2025-09-23 18:03:41 -04:00
  • 3b2e05c563 update to new api Dan Saunders 2025-09-23 17:49:04 -04:00
  • 1037ca3a97 update to new api Dan Saunders 2025-09-23 16:44:26 -04:00
  • 6369dcd7b8 fix Dan Saunders 2025-09-23 20:21:22 +00:00
  • a81612305c fix? Dan Saunders 2025-09-23 16:21:00 -04:00
  • d0da67eb17 add mg kernel backend Dan Saunders 2025-09-23 15:43:16 -04:00
  • 8a1f5ae940 fix Dan Saunders 2025-09-23 17:51:14 +00:00
  • 146ca48cba vram Dan Saunders 2025-09-23 13:50:48 -04:00
  • fd312f6058 dtype Dan Saunders 2025-09-23 12:20:39 -04:00
  • ab8fa56b16 dtype Dan Saunders 2025-09-23 12:14:55 -04:00
  • 1640cd4006 delete config Dan Saunders 2025-09-23 15:34:07 +00:00
  • 3277d44d71 cfg value Dan Saunders 2025-09-23 11:29:41 -04:00
  • d3e1b0ef1a small deepseek script Dan Saunders 2025-09-22 23:13:45 -04:00
  • 5b97633faa Fix Dan Saunders 2025-09-22 22:48:11 -04:00
  • 94cbc6d42d log device, dtype Dan Saunders 2025-09-22 22:15:44 -04:00
  • 493616fc3d reprod tt table Dan Saunders 2025-09-22 17:00:58 -04:00
  • d2b25c7327 grid sweep Dan Saunders 2025-09-22 16:34:55 -04:00
  • b670c45276 fix Dan Saunders 2025-09-22 16:27:22 -04:00
  • 61faf4cbe4 fix Dan Saunders 2025-09-22 16:24:32 -04:00
  • 8d8fa834a2 sweep Dan Saunders 2025-09-22 16:21:50 -04:00
  • 9d69c6fb3e Fix Dan Saunders 2025-09-22 16:10:41 -04:00
  • 92f2f6e73c dtype fix Dan Saunders 2025-09-22 16:07:45 -04:00
  • e5d2aebe16 uniform routing: Dan Saunders 2025-09-22 16:03:38 -04:00
  • 4ab9e3f58b add logs Dan Saunders 2025-09-22 15:58:28 -04:00
  • 5788832812 simplify Dan Saunders 2025-09-22 19:53:36 +00:00
  • db782430f8 fix Dan Saunders 2025-09-22 15:54:44 -04:00
  • 5c74edeefe token shuffle kernel Dan Saunders 2025-09-21 16:46:46 -04:00
  • 18269ee6a9 fix Dan Saunders 2025-09-21 16:37:10 -04:00
  • 6a45d804f9 glue Dan Saunders 2025-09-21 16:23:23 -04:00
  • 95e607574a vendor torchtitan moe kernels Dan Saunders 2025-09-21 12:52:25 -04:00
  • 3299f182ba ungate lora with bias lora-fsdp2-doc Dan Saunders 2025-09-25 12:40:13 -04:00
  • 2fc430d365 update lora optims doc Dan Saunders 2025-09-25 12:24:25 -04:00
  • dd469bf2eb Built site for gh-pages Quarto GHA Workflow Runner 2025-09-25 16:09:35 +00:00
  • f9748c4dc5 Cp fix (#3182) Dan Saunders 2025-09-25 12:03:50 -04:00
  • 09725be990 add support for CP + torch SDPA cp-sdpa Dan Saunders 2025-09-25 12:03:43 -04:00
  • 67d6a3dcff Built site for gh-pages Quarto GHA Workflow Runner 2025-09-25 10:11:48 +00:00
  • 33975ce4bc feat(qwen3-next): Adds targeting of shared expert and attention modules (#3183) miketung 2025-09-25 19:06:16 +09:00
  • 3c0d96db45 Built site for gh-pages Quarto GHA Workflow Runner 2025-09-25 05:12:06 +00:00
  • e8b962d47f feat: support training with JSON string tool arguments (#3136) 陈华杰 2025-09-25 13:06:21 +08:00
  • 939023e661 chunked DPO loss 3181 Dan Saunders 2025-09-24 17:43:06 -04:00
  • 856ff12171 feat(doc): add optimizations table of content to our improvements (#3175) [skip ci] NanoCode012 2025-09-25 03:13:49 +07:00
  • f9bd6936c1 Merge branch 'main' into cp-fix Dan Saunders 2025-09-24 14:01:23 -04:00
  • b9a3bfee5a only patch in CP > 1 case Dan Saunders 2025-09-24 13:36:14 -04:00
  • 08124a7c92 nits Dan Saunders 2025-09-24 13:25:46 -04:00
  • dc4adee7b0 Built site for gh-pages Quarto GHA Workflow Runner 2025-09-24 17:24:29 +00:00
  • 6bc959342b remove unused dep (#3180) Dan Saunders 2025-09-24 13:18:44 -04:00
  • 56e0a77e0d patch transformers to allow CP + FA2 Dan Saunders 2025-09-24 13:08:38 -04:00
  • 8de3ae1e1d Built site for gh-pages Quarto GHA Workflow Runner 2025-09-24 06:54:18 +00:00
  • b3b92687c4 chore: rename gemma3 270m config (#3174) NanoCode012 2025-09-24 13:48:38 +07:00
  • a2ba36ec63 Built site for gh-pages Quarto GHA Workflow Runner 2025-09-23 14:28:06 +00:00
  • 55d1be2ae6 fix: unify default for conversations_field [skip-e2e] (#3070) NanoCode012 2025-09-23 21:22:15 +07:00
  • f7c612a032 Built site for gh-pages Quarto GHA Workflow Runner 2025-09-23 04:37:00 +00:00
  • 08d831c3d5 Feat: add qwen3-next (w packing+cce) (#3150) NanoCode012 2025-09-23 11:31:15 +07:00
  • 8564961423 fix compile moekernels Dan Saunders 2025-09-19 13:59:57 -04:00
  • ce21da9177 fix compile Dan Saunders 2025-09-19 13:55:54 -04:00
  • b5dc58373f fix compile Dan Saunders 2025-09-19 13:52:42 -04:00
  • 7327144344 compile Dan Saunders 2025-09-19 13:41:12 -04:00
  • fb11f696e9 bench sweep Dan Saunders 2025-09-19 13:24:40 -04:00
  • 105c817b0b default fix Dan Saunders 2025-09-19 16:59:20 +00:00
  • 64345e7707 recurse fix Dan Saunders 2025-09-19 12:58:58 -04:00
  • 0f8b921399 contig Dan Saunders 2025-09-19 12:47:53 -04:00
  • 336616d659 defaults Dan Saunders 2025-09-19 16:34:45 +00:00
  • d2f1e23bcd fix Dan Saunders 2025-09-19 12:45:18 -04:00
  • 42aadc5069 bench fix Dan Saunders 2025-09-19 12:34:08 -04:00