Commit Graph

  • 038ffe3f26 fix: solved double sequence partition from SequenceParallelContextManager and Accelerate's native CP (#3498) Lorenzo Baraldi 2026-03-20 10:27:24 +01:00
  • c13cb7c853 feat: add nemotron config (#3506) VED 2026-03-20 14:53:42 +05:30
  • b3823cc6b0 fix: gemma3 configs (#3500) [skip ci] VED 2026-03-20 14:44:06 +05:30
  • 113d275bd9 qwen docs + new config (#3499) [skip ci] VED 2026-03-20 14:43:34 +05:30
  • 7920fe74ec fix num_labels= 1 test fail (#3493) [skip ci] VED 2026-03-20 14:42:23 +05:30
  • 4ef8bc6368 Built site for gh-pages Quarto GHA Workflow Runner 2026-03-20 03:15:14 +00:00
  • 1fc86d5295 Scattermoe LoRA optimizations (#3513) Wing Lian 2026-03-19 23:07:42 -04:00
  • 5f78d63020 Built site for gh-pages Quarto GHA Workflow Runner 2026-03-19 12:36:14 +00:00
  • bb483ad4c4 make the CI fail GitHub Actions on test failures (#3517) Wing Lian 2026-03-19 08:29:24 -04:00
  • 42922f8f8b register pressure estimation and pruning for h200/b200 scattermoe-lora-optim-dtypestest Wing Lian 2026-03-19 06:39:16 -04:00
  • 7041592ca7 fix casting for H200 and B200 Wing Lian 2026-03-19 05:57:54 -04:00
  • fec0c3a99e chore: lint Wing Lian 2026-03-19 07:27:23 +00:00
  • 31d8d068bb handle base+lora split kernel for older moe models Wing Lian 2026-03-19 07:11:30 +00:00
  • 66fea258c7 add correctness unit tests and benchmarks for scattermoe + lora Wing Lian 2026-03-19 06:40:01 +00:00
  • 07ff389be8 selective dequant Wing Lian 2026-03-19 03:24:30 +00:00
  • 2dcca15f65 more scattermoe optims Wing Lian 2026-03-18 23:36:28 +00:00
  • c5db90aa3f optimize moe + lora Wing Lian 2026-03-18 21:42:13 +00:00
  • 4673612cd5 Built site for gh-pages Quarto GHA Workflow Runner 2026-03-19 06:10:08 +00:00
  • 163bd4dd5a use custom triton kernels for entropy from logits and selective softmax (#3510) Wing Lian 2026-03-19 02:02:43 -04:00
  • f291ac029c fix for flaky tests in lora ops kernels w autotune (#3511) [skip ci] Wing Lian 2026-03-19 01:18:47 -04:00
  • 8d38a13bb4 Built site for gh-pages Quarto GHA Workflow Runner 2026-03-17 15:50:29 +00:00
  • 5ef3f28340 Support for Async GRPO (#3486) Wing Lian 2026-03-17 11:42:47 -04:00
  • 6eeb2c8370 Built site for gh-pages Quarto GHA Workflow Runner 2026-03-17 13:00:28 +00:00
  • 999b3fec2e fix: replace shell=True subprocess with argument list in modal CLI (#3487) Aarush 2026-03-17 18:23:13 +05:30
  • 8e92c65700 Built site for gh-pages Quarto GHA Workflow Runner 2026-03-17 03:55:08 +00:00
  • 8f3fb517b3 consolidate behaviour of routing in scattermoe kernels (#3475) Wing Lian 2026-03-16 23:47:40 -04:00
  • 830e9f7eaf automatically enable tf32 if supported (#3473) [skip ci] Wing Lian 2026-03-16 23:47:00 -04:00
  • 138e8ed7f5 Built site for gh-pages Quarto GHA Workflow Runner 2026-03-17 02:47:04 +00:00
  • d230cbbde3 chore(doc): update readme (#3503) [skip ci] NanoCode012 2026-03-17 09:43:24 +07:00
  • a098df527b feat: add Mistral Small 4 (#3502) NanoCode012 2026-03-17 09:39:05 +07:00
  • de3e742dbb Built site for gh-pages Quarto GHA Workflow Runner 2026-03-16 04:21:26 +00:00
  • 7da5f94379 feat: add FA4 (#3481) NanoCode012 2026-03-16 11:13:18 +07:00
  • 4a5876df7a fix: explicit set workflow permission and move secrets to necessary (#3484) [skip ci] NanoCode012 2026-03-16 11:13:05 +07:00
  • defee62d99 fix: fix CONTRIBUTING.md placeholders, bare except clauses, and add convert.py tests (#3485) [skip ci] Aarush 2026-03-16 09:42:40 +05:30
  • a049510950 Built site for gh-pages Quarto GHA Workflow Runner 2026-03-16 02:17:54 +00:00
  • f56efdb4ab fix: high eval loss w/ sample packing (#3478) [skip ci] VED 2026-03-16 07:41:23 +05:30
  • d8a646c80d chore: logging cleanup (#3482) [skip ci] NanoCode012 2026-03-16 09:10:57 +07:00
  • a806704e94 moe quant patch for merge mismatch (#3483) VED 2026-03-16 07:40:30 +05:30
  • dce5bed379 feat: merge adapter in fp32 [ref: fix/merge-lora-fp32] NanoCode012 2026-03-14 00:20:59 +07:00
  • 570b9d5215 Built site for gh-pages Quarto GHA Workflow Runner 2026-03-13 16:01:27 +00:00
  • d8a05744d7 Reverts commits 79908b3c6, 083c5a042, e1ff75624, ff77fa248. (#3496) Wing Lian 2026-03-13 11:54:09 -04:00
  • 3520d5a15b Built site for gh-pages Quarto GHA Workflow Runner 2026-03-13 14:26:49 +00:00
  • ff77fa2488 preserve env for root -> ubuntu user (#3495) Wing Lian 2026-03-13 10:19:34 -04:00
  • 073a93147f Built site for gh-pages Quarto GHA Workflow Runner 2026-03-13 13:13:38 +00:00
  • e1ff756245 become the ubuntu user when root logs in (#3494) Wing Lian 2026-03-13 09:06:54 -04:00
  • 5696ba4e4f Built site for gh-pages Quarto GHA Workflow Runner 2026-03-13 03:29:09 +00:00
  • 083c5a0421 check ubuntu user and set uv python dir (#3492) Wing Lian 2026-03-12 23:20:54 -04:00
  • ff26a28e54 Built site for gh-pages Quarto GHA Workflow Runner 2026-03-13 00:49:01 +00:00
  • 79908b3c6e use ubuntu user instead of root for uv docker images (#3491) Wing Lian 2026-03-12 20:41:13 -04:00
  • f463712a9a Built site for gh-pages Quarto GHA Workflow Runner 2026-03-12 01:52:42 +00:00
  • 819b157c7b swap around what we're building for docker (#3490) Wing Lian 2026-03-11 21:45:13 -04:00
  • e7733db9f9 Built site for gh-pages Quarto GHA Workflow Runner 2026-03-12 00:16:19 +00:00
  • fccc712dae builds for py312-cu128-torch2.9.1 (#3489) Wing Lian 2026-03-11 20:09:03 -04:00
  • 3aef8a6174 Built site for gh-pages Quarto GHA Workflow Runner 2026-03-11 06:25:57 +00:00
  • 23ad40bdd5 fix: disable async load when loading quantized bnb NanoCode012 2026-03-11 13:18:27 +07:00
  • c887057e5e make gpus go brrr [ref: async-grpo-patched-v2] Wing Lian 2026-03-10 03:29:10 +00:00
  • bba1330e9b fix replay buffer Wing Lian 2026-03-10 02:18:43 +00:00
  • 9394d17f28 fix liger kernel setup Wing Lian 2026-03-09 21:22:35 -04:00
  • e380f6944d handle call to create data producer Wing Lian 2026-03-09 19:59:09 -04:00
  • d69d52ba41 use fast async Wing Lian 2026-03-09 19:47:28 -04:00
  • 575425a36f implement data producer Wing Lian 2026-03-09 23:28:42 +00:00
  • f0c9e98699 async grpo support Wing Lian 2026-03-09 22:59:16 +00:00
  • cf4d550c88 fix: reduce permissions for preview docs CI (#3480) [skip ci] NanoCode012 2026-03-09 19:04:31 +07:00
  • da26270f1a Built site for gh-pages Quarto GHA Workflow Runner 2026-03-07 12:16:54 +00:00
  • 43b1c80aa6 load weights synchronously so they can be converted and not OOM: (#3477) Wing Lian 2026-03-07 07:09:24 -05:00
  • a36aaa70ce add gpu tests for scattermoe (#3474) [skip ci] Wing Lian 2026-03-07 00:00:48 -05:00
  • b975259c10 Built site for gh-pages Quarto GHA Workflow Runner 2026-03-06 20:06:08 +00:00
  • 80f7088ad1 update setuptools so trl can be installed from main for nightlies (#3471) Wing Lian 2026-03-06 14:59:25 -05:00
  • 46b9f40f2a bump dev version to 0.16.0.dev0 (#3472) [skip ci] Wing Lian 2026-03-06 14:59:00 -05:00
  • d7a9a5d14e Built site for gh-pages Quarto GHA Workflow Runner 2026-03-06 18:01:56 +00:00
  • 8f19169eb0 tag for v0.15.0 release (#3470) [ref: v0.15.0] Wing Lian 2026-03-06 12:55:11 -05:00
  • c9031b3bd2 Built site for gh-pages Quarto GHA Workflow Runner 2026-03-06 17:47:59 +00:00
  • 876941ffd0 install flash-linear-attention (#3466) Wing Lian 2026-03-06 12:40:57 -05:00
  • d65e1b960c fix: add guard for _initialize_missing_keys patch (#3469) [skip ci] NanoCode012 2026-03-06 23:45:03 +07:00
  • 0a23ae08f7 fix: position_ids casted to int64 for qwen35 patch (#3468) [skip ci] NanoCode012 2026-03-06 23:44:00 +07:00
  • fc2d63ee5f use new tf32 APIs for torch 2.9+ (#3467) [skip ci] Wing Lian 2026-03-06 11:40:32 -05:00
  • 2b9012773d Built site for gh-pages Quarto GHA Workflow Runner 2026-03-06 14:37:11 +00:00
  • c119382337 add: qwen 3.5 (#3442) VED 2026-03-06 20:01:00 +05:30
  • 17a84c24d3 Built site for gh-pages Quarto GHA Workflow Runner 2026-03-06 14:26:20 +00:00
  • 6c8c73e5a4 fix(validation): add validation for lora target linear with quantize experts (#3461) NanoCode012 2026-03-06 21:19:05 +07:00
  • a260d330ed add info about linting that was removed at some point (#3458) [skip ci] Wing Lian 2026-03-06 09:18:38 -05:00
  • 8f63599e42 Built site for gh-pages Quarto GHA Workflow Runner 2026-03-06 14:18:35 +00:00
  • da17c7c0d9 fix: use dp_world_size instead of world_size for batch_size with tensor parallelism (#3462) [skip ci] Gilles Turpin 2026-03-06 15:18:13 +01:00
  • cada93cee5 upgrade transformers==5.3.0 trl==0.29.0 kernels (#3459) Wing Lian 2026-03-06 09:11:20 -05:00
  • 56162f71db monkeypatch fix for fsdp with cpu ram efficient loading (#3464) [skip ci] Wing Lian 2026-03-06 09:10:58 -05:00
  • 6c44afaea1 chore: update pre-commit hooks (#3381) [skip ci] github-actions[bot] 2026-03-05 21:39:34 -05:00
  • 234931d512 extend pytest-sdist timeout to 30 min for slow/flaky tests (#3456) [skip ci] Wing Lian 2026-03-05 15:04:38 -05:00
  • edcde0fe5d Built site for gh-pages Quarto GHA Workflow Runner 2026-03-05 18:52:32 +00:00
  • 6a8baf8fa7 feat: add sonicmoe (#3411) NanoCode012 2026-03-06 01:43:31 +07:00
  • 1eaf4d7418 add: support mxfp4 axo (#3375) VED 2026-03-06 00:10:45 +05:30
  • 91d7139afc Built site for gh-pages Quarto GHA Workflow Runner 2026-03-05 18:04:42 +00:00
  • 5d6fff8520 Built site for gh-pages Quarto GHA Workflow Runner 2026-03-05 17:49:32 +00:00
  • 4b8bc52424 fix: correct total_num_steps and batch_size calculation with context parallelism (#3444) Gilles Turpin 2026-03-05 18:33:28 +01:00
  • 28cc085283 include number of params and rounded est of params so we can easily group in posthog (#3455) Wing Lian 2026-03-05 12:31:17 -05:00
  • 2047d72087 Built site for gh-pages Quarto GHA Workflow Runner 2026-03-05 15:06:49 +00:00
  • 8e2a102cca Fix FSDP2 sharding and validate AO version for LR groups (#3403) bekk02 2026-03-05 06:59:32 -08:00
  • 753906cfc7 feat: add doc for expert quantization, glm45 air example configs, and update readme for release (#3452) [skip ci] NanoCode012 2026-03-05 21:58:09 +07:00
  • 1d5116a77e Built site for gh-pages Quarto GHA Workflow Runner 2026-03-04 15:00:25 +00:00
  • a4a3b618e7 force torch to match when installing fa and deepspeed using uv [ref: uv-fixup] Wing Lian 2026-03-04 10:00:08 -05:00
  • b6b8db805a fix python version typo for building 3.11 (#3454) Wing Lian 2026-03-04 09:53:35 -05:00