Commit Graph

  • 1e7302d30a bench fix Dan Saunders 2025-09-19 12:20:35 -04:00
  • 63544ce709 fix Dan Saunders 2025-09-19 11:34:27 -04:00
  • 3bfed0aac8 shared expert detection Dan Saunders 2025-09-19 11:24:14 -04:00
  • c9c127fa13 Built site for gh-pages Quarto GHA Workflow Runner 2025-09-19 10:40:15 +00:00
  • 7be8740c5c fix(rl): pass max_prompt_len to training args as max_prompt_length (#3113) AlexHT Hung 2025-09-19 18:34:28 +08:00
  • c51d6b06c3 feat: add apertus model and cce (#3144) [skip ci] NanoCode012 2025-09-19 17:34:04 +07:00
  • bfc848f81d bits and pieces Dan Saunders 2025-09-19 02:12:57 +00:00
  • abe1cad6bc another bench Dan Saunders 2025-09-18 13:45:19 -04:00
  • 354389caef torchtitan bench Dan Saunders 2025-09-18 13:29:20 -04:00
  • efcd032fce yet another refactor Dan Saunders 2025-09-18 13:03:28 -04:00
  • 7500641601 yet another refactor Dan Saunders 2025-09-18 12:47:15 -04:00
  • 0295df5bca precompute fuse Dan Saunders 2025-09-18 12:10:46 -04:00
  • b39ef54833 combine mult Dan Saunders 2025-09-18 12:08:03 -04:00
  • ad4cd39bcd remove contig Dan Saunders 2025-09-18 11:55:15 -04:00
  • 5c197275ad inplace Dan Saunders 2025-09-18 11:51:17 -04:00
  • 19c91e3675 refactor Dan Saunders 2025-09-18 11:44:21 -04:00
  • 2a176e4923 fix Dan Saunders 2025-09-18 11:29:33 -04:00
  • 7d867de9b2 refactor Dan Saunders 2025-09-18 11:23:15 -04:00
  • 01b6792c2e refactor Dan Saunders 2025-09-18 11:20:08 -04:00
  • 2cc5a4147d Built site for gh-pages Quarto GHA Workflow Runner 2025-09-18 08:47:56 +00:00
  • 09959fac70 Feat: add Magistral Small 2509 and native mistral3 tokenizer support (#3165) NanoCode012 2025-09-18 15:42:20 +07:00
  • bbf1f14ca4 dtype issues Dan Saunders 2025-09-17 23:52:18 +00:00
  • c6878beb7d simplify Dan Saunders 2025-09-17 19:15:34 -04:00
  • e62979d11d fix Dan Saunders 2025-09-17 18:53:07 -04:00
  • d57b9c67c2 log Dan Saunders 2025-09-17 18:52:27 -04:00
  • eaaf16aa00 cumulative offsets Dan Saunders 2025-09-17 18:45:15 -04:00
  • f3b953e222 fix? Dan Saunders 2025-09-17 18:42:10 -04:00
  • 7935dc0911 dtype fix Dan Saunders 2025-09-17 18:36:22 -04:00
  • d2b49b2670 error msg Dan Saunders 2025-09-17 18:29:30 -04:00
  • b5cb345ca4 fix test Dan Saunders 2025-09-17 18:24:00 -04:00
  • 03d4c2683e fix perf degradation Dan Saunders 2025-09-17 18:20:37 -04:00
  • fd87eed501 minify Dan Saunders 2025-09-17 16:42:35 -04:00
  • 129db67705 fix Dan Saunders 2025-09-17 16:24:29 -04:00
  • 38b890a36b fix Dan Saunders 2025-09-17 16:16:41 -04:00
  • 180920c7bf simplify Dan Saunders 2025-09-17 19:49:18 +00:00
  • d024048d74 logs + fix Dan Saunders 2025-09-17 14:50:49 -04:00
  • 98dc945838 fix Dan Saunders 2025-09-17 14:42:53 -04:00
  • 108600cd69 update config Dan Saunders 2025-09-17 14:36:24 -04:00
  • 0e9387c395 fix Dan Saunders 2025-09-17 14:35:36 -04:00
  • db61e0d4ff fix Dan Saunders 2025-09-17 14:26:25 -04:00
  • 51e565f60a logs Dan Saunders 2025-09-17 14:15:51 -04:00
  • c774dd0409 refactor + fix Dan Saunders 2025-09-17 14:01:39 -04:00
  • 7289e0cb55 more logs Dan Saunders 2025-09-16 00:26:39 -04:00
  • 8d483c11f7 more logs Dan Saunders 2025-09-16 00:22:00 -04:00
  • 9c1829cf57 more logs Dan Saunders 2025-09-16 00:15:08 -04:00
  • 135b09d1de logs, qwen2 support Dan Saunders 2025-09-16 00:02:24 -04:00
  • de4344a56e patch Dan Saunders 2025-09-15 23:15:20 -04:00
  • 7d572b58d1 just grouped_mm for now Dan Saunders 2025-09-15 23:03:18 -04:00
  • 773d7e4291 update Dan Saunders 2025-09-15 20:03:12 -04:00
  • fef47a5b7c hardening Dan Saunders 2025-09-15 19:41:10 -04:00
  • f6ed8ddc01 fix Dan Saunders 2025-09-15 19:39:19 -04:00
  • 556d6448fe fix Dan Saunders 2025-09-15 19:36:00 -04:00
  • 5c2229721d diag Dan Saunders 2025-09-15 19:34:08 -04:00
  • d7de6b0e96 grouped_mm Dan Saunders 2025-09-15 19:31:21 -04:00
  • 3c6648678f numerics Dan Saunders 2025-09-15 19:08:30 -04:00
  • 5b19a1ea9c improve Dan Saunders 2025-09-15 19:04:58 -04:00
  • cfefad1eea fix Dan Saunders 2025-09-15 19:00:58 -04:00
  • 125e7b5fe6 fast path Dan Saunders 2025-09-15 18:57:13 -04:00
  • 479b6144df tflops Dan Saunders 2025-09-15 18:52:55 -04:00
  • 68da65cba2 update Dan Saunders 2025-09-15 18:48:43 -04:00
  • 0d689bb421 cache, example Dan Saunders 2025-09-15 15:22:11 -04:00
  • 43ada1278a moe kernels init scaffold Dan Saunders 2025-09-15 12:20:41 -04:00
  • ba0dd2cdec Built site for gh-pages Quarto GHA Workflow Runner 2025-09-17 17:32:44 +00:00
  • 4065bc14c6 Debug log, logging improvements (#3159) Dan Saunders 2025-09-17 13:27:03 -04:00
  • b2034c645e Built site for gh-pages Quarto GHA Workflow Runner 2025-09-17 09:44:04 +00:00
  • e5c427f6de qat doc updates (#3162) [skip-ci] salman 2025-09-17 10:38:15 +01:00
  • 421eea620c Built site for gh-pages Quarto GHA Workflow Runner 2025-09-16 18:58:53 +00:00
  • 86d6ee7c05 upgrade trl and accelerate (#3161) Wing Lian 2025-09-16 14:53:01 -04:00
  • d4cff1b7bb improve setting of NCCL_P2P_DISABLE on runpod (#3132) [skip ci] Wing Lian 2025-09-16 14:52:45 -04:00
  • 1ef6c196f7 setup env vars for ray train for FSDP (#3130) [skip ci] Wing Lian 2025-09-16 14:52:29 -04:00
  • e1c7a61243 fix reentrant when using offloading (branch: reentrant-w-offloading) Wing Lian 2025-09-14 10:42:15 -04:00
  • a7676af44d hmmm (branch: lora_bf16) Salman Mohammadi 2025-09-12 18:51:10 +01:00
  • 52e37077fc Merge branch 'main' into lora_bf16 Salman Mohammadi 2025-09-12 18:35:03 +01:00
  • 850489405b working? Salman Mohammadi 2025-09-12 17:34:41 +00:00
  • 6874d32e0c more lora handling Salman Mohammadi 2025-09-12 15:26:12 +00:00
  • db626de56e Built site for gh-pages Quarto GHA Workflow Runner 2025-09-12 10:02:06 +00:00
  • 58d67bf98d Migrate QAT API; fix axolotl quantize for QAT-ed models; add NVFP4 (#3107) salman 2025-09-12 10:55:50 +01:00
  • 0401a15888 SEO go brrr (#3153) [skip-ci] salman 2025-09-12 10:55:11 +01:00
  • fcfc13d710 feat(doc): update thinking and chat_template notes (#3114) [skip ci] NanoCode012 2025-09-12 14:45:18 +07:00
  • 782d946b5a Built site for gh-pages Quarto GHA Workflow Runner 2025-09-11 10:25:08 +00:00
  • 9406c0c488 log before eval step (#3148) [skip-ci] salman 2025-09-11 11:19:30 +01:00
  • 1541d0d193 Built site for gh-pages Quarto GHA Workflow Runner 2025-09-11 00:33:02 +00:00
  • 1b53c49e1a text diffusion training plugin (#3067) Dan Saunders 2025-09-10 20:27:00 -04:00
  • ab0619cc51 Built site for gh-pages Quarto GHA Workflow Runner 2025-09-10 02:10:05 +00:00
  • b71482cec5 Feat: add hunyuan v1 (#3016) NanoCode012 2025-09-10 09:03:30 +07:00
  • 79103b01ca Feat: add seedoss (#3104) [skip ci] NanoCode012 2025-09-10 09:01:02 +07:00
  • 6daed7d060 dont keep adpater weights in fp32 Salman Mohammadi 2025-09-09 17:11:13 +01:00
  • 2a2df5045c Built site for gh-pages Quarto GHA Workflow Runner 2025-09-09 14:56:04 +00:00
  • 9640338d37 Default include_tkps to true (#3134) salman 2025-09-09 15:50:21 +01:00
  • b5d4c7ff54 allow 1% deviation for codecov (#3138) [skip ci] Wing Lian 2025-09-07 11:01:03 -04:00
  • a47f9638a9 Built site for gh-pages Quarto GHA Workflow Runner 2025-09-07 14:55:00 +00:00
  • 8fd9221f13 Add ipo as an rl type that shares DPODataset config (#3128) Seungduk Kim 2025-09-07 23:49:10 +09:00
  • bf00f29f3a chore: update pre-commit hooks (#3137) [skip ci] github-actions[bot] 2025-09-07 10:33:20 -04:00
  • 3104aded43 Built site for gh-pages Quarto GHA Workflow Runner 2025-09-05 15:06:26 +00:00
  • 1d32278755 feat: upgrade transformers to v4.56.1 (#3127) NanoCode012 2025-09-05 22:00:54 +07:00
  • 08e300f1cf Built site for gh-pages Quarto GHA Workflow Runner 2025-09-03 20:30:59 +00:00
  • 3d8507f9a5 Built site for gh-pages Quarto GHA Workflow Runner 2025-09-03 20:28:20 +00:00
  • c6ae5c43cb fix: chat template jinja file not being loaded during inference (#3112) NanoCode012 2025-09-04 03:25:09 +07:00
  • efa1da52d5 Center rewards coefficient (#3124) yardenhoch 2025-09-03 23:22:37 +03:00
  • 48db520d92 Create 270m-qlora.yml (#3075) [skip ci] mhenrichsen 2025-09-03 22:20:32 +02:00