-
4dfa0a59b2
Add uninstall command to cut_cross_entropy import message (#3583) [skip ci]
floaty3
2026-04-10 17:00:07 +00:00
-
8a926a64dc
Built site for gh-pages
Quarto GHA Workflow Runner
2026-04-10 03:09:51 +00:00
-
4ef608dda3
fix ddp/fsdp w gemma4 (#3584)
Wing Lian
2026-04-09 20:02:36 -07:00
-
f608d263a6
configurable weight scale normalization for MoE expert drift
weight-scale-norm
Wing Lian
2026-04-09 15:37:16 +00:00
-
7daf7d96f1
fix: regex for unfrozen language tower (#3586) [skip ci]
NanoCode012
2026-04-08 22:18:11 +07:00
-
85f1217ded
Built site for gh-pages
Quarto GHA Workflow Runner
2026-04-07 15:17:53 +00:00
-
7c56809c7f
use vllm 0.19.0 for torch 2.10.0 (#3582)
Wing Lian
2026-04-07 08:09:49 -07:00
-
79db7ce04d
Built site for gh-pages
Quarto GHA Workflow Runner
2026-04-06 17:17:59 +00:00
-
149178ddb7
chore: cleanup post release v0.16 (#3577)
NanoCode012
2026-04-07 00:10:52 +07:00
-
dc638e723f
fix(config): add cce and liger to nemotron-h example (#3573) [skip ci]
NanoCode012
2026-04-07 00:10:25 +07:00
-
6f15da4cac
make it easier for agents to discover docs (#3579) [skip ci]
Wing Lian
2026-04-06 10:00:55 -07:00
-
abbda66586
Built site for gh-pages
Quarto GHA Workflow Runner
2026-04-04 09:24:11 +00:00
-
900eec7988
Fix DO_NOT_TRACK not being correctly handled (#3580)
Maxime
2026-04-04 05:16:58 -04:00
-
4d19440412
Built site for gh-pages
Quarto GHA Workflow Runner
2026-04-02 21:53:22 +00:00
-
08fc7de87e
gemma4 support (#3574)
v0.16.1
fix/issue-7-hf-token-check
fix/issue-6-default-attention
fix/issue-5-8-docs
fix/issue-4-deepspeed-optional
fix/issue-3-telemetry-whitelist
fix/issue-2-flash-attn-install
fix/issue-1-build-deps
Wing Lian
2026-04-02 17:46:46 -04:00
-
f807756bde
Built site for gh-pages
Quarto GHA Workflow Runner
2026-04-02 14:25:09 +00:00
-
573726c839
upgrade torchao to 0.17.0 (#3569)
v0.16.0
Wing Lian
2026-04-02 10:18:00 -04:00
-
842fa039dd
feat: add sonicmoe fused lora support (#3519)
NanoCode012
2026-04-02 19:53:48 +07:00
-
5724ca4e57
Built site for gh-pages
Quarto GHA Workflow Runner
2026-04-02 12:08:47 +00:00
-
16e32232fb
feat(docs): comprehensive improvement (#3564)
NanoCode012
2026-04-02 19:01:26 +07:00
-
50e9573f24
Update lm-eval for transformers v5 support (#3571) [skip ci]
Andrew Wu
2026-04-02 04:25:18 +01:00
-
55a7950e3d
fix: DPO tool role KeyError (#3217), dataset hash output_dir (#3303), config validators (#3538) [skip ci]
Edward Zion Saji
2026-04-02 05:27:07 +05:30
-
abc1a01cd5
Built site for gh-pages
Quarto GHA Workflow Runner
2026-04-01 22:27:59 +00:00
-
c92b71bd0c
MX QAT patch (#3553)
VED
2026-04-02 03:51:02 +05:30
-
bf0338bfed
Built site for gh-pages
Quarto GHA Workflow Runner
2026-04-01 17:36:13 +00:00
-
6c92b5c31c
lazy load trainer classes to prevent unnecessary imports (#3568)
Wing Lian
2026-04-01 13:29:04 -04:00
-
1b1fc917bc
Add precompute_ref_log_probs to config schema (#3555) [skip ci]
Joaquin Hui
2026-04-01 18:28:40 +01:00
-
96ae8bdd1d
Add troubleshooting note for GLM4 GGUF MTP mismatch (#3559) [skip ci]
Mario Župan
2026-04-01 16:05:06 +02:00
-
438ea7b045
chore: update pre-commit hooks (#3567) [skip ci]
github-actions[bot]
2026-04-01 10:04:21 -04:00
-
f6c122b76d
allow bf16 flag but warn (#3563) [skip ci]
kallewoof
2026-04-01 22:54:01 +09:00
-
9e64c76326
qwen3.5 configs (#3554) [skip ci]
VED
2026-04-01 18:49:31 +05:30
-
4e081f9eaf
Built site for gh-pages
Quarto GHA Workflow Runner
2026-03-31 23:23:01 +00:00
-
5e5603c9aa
upgrade transformers to 5.4.0 (#3562)
Wing Lian
2026-03-31 19:15:59 -04:00
-
c0ed9e2667
Built site for gh-pages
Quarto GHA Workflow Runner
2026-03-31 23:12:15 +00:00
-
a4c94416eb
bug-fix: only apply patches when CUDA is available (#3561)
kallewoof
2026-04-01 08:05:15 +09:00
-
a81feabbd9
DPO transformers v0.29 fixes (#3560) [skip ci]
Andrew Wu
2026-04-01 00:04:53 +01:00
-
109cbb32c5
Built site for gh-pages
Quarto GHA Workflow Runner
2026-03-30 22:19:48 +00:00
-
bb622b83de
super nemo support (#3508)
VED
2026-03-31 03:42:50 +05:30
-
412afeee1b
Built site for gh-pages
Quarto GHA Workflow Runner
2026-03-28 17:22:32 +00:00
-
00dee05fc6
support flattening/packing for GRPO (#3552)
Wing Lian
2026-03-28 13:15:54 -04:00
-
e80d1ed962
Built site for gh-pages
Quarto GHA Workflow Runner
2026-03-25 22:30:00 +00:00
-
3615373c4b
Built site for gh-pages
Quarto GHA Workflow Runner
2026-03-25 22:24:05 +00:00
-
99bde0124c
deprecate torch 2.8.0 support (#3550)
Wing Lian
2026-03-25 18:22:47 -04:00
-
5191e4eb53
More minor RL fixes (#3551)
Wing Lian
2026-03-25 18:17:49 -04:00
-
857c949129
Built site for gh-pages
Quarto GHA Workflow Runner
2026-03-25 15:27:19 +00:00
-
1d736a2a49
Built site for gh-pages
Quarto GHA Workflow Runner
2026-03-25 15:23:28 +00:00
-
74b959e035
dispatch scored rollouts to plugins, extend path for external plugins, better handle errors with vllm /reset_prefix_cache (#3549)
Wing Lian
2026-03-25 11:19:15 -04:00
-
bfb20674cb
Built site for gh-pages
Quarto GHA Workflow Runner
2026-03-25 14:45:58 +00:00
-
b9fe1393d6
Built site for gh-pages
Quarto GHA Workflow Runner
2026-03-25 12:49:03 +00:00
-
b55706b9f6
feat: merge-lora iterate through bins without loading (#3095)
VED
2026-03-25 18:11:32 +05:30
-
ff0f67c730
feat: add custom routing support for ernie4_5_moe, and hunyuan_v1_moe (#3526)
Avaya Aggarwal
2026-03-25 18:10:31 +05:30
-
678ebb1bb2
Fix Ray train crashing after succeeding (#3542) [skip ci]
Matthew Hambrecht
2026-03-25 07:38:28 -04:00
-
c2bd75aff6
Nemo gym integration (#3516) [skip ci]
Wing Lian
2026-03-25 07:38:06 -04:00
-
4aa7721a0c
Built site for gh-pages
Quarto GHA Workflow Runner
2026-03-25 09:20:10 +00:00
-
a3b711f6f8
Built site for gh-pages
Quarto GHA Workflow Runner
2026-03-25 09:13:28 +00:00
-
2fb72798e0
Revert "feat: move to uv first" (#3544)
NanoCode012
2026-03-25 16:12:36 +07:00
-
1f1ebb8237
feat: move to uv first
NanoCode012
2026-03-25 16:06:37 +07:00
-
c50c4acbf4
EBFT: Matching Features, Not Tokens: Energy-Based Fine-Tuning of Language Models (#3527) [skip ci]
Wing Lian
2026-03-24 18:43:46 -04:00
-
e9883c91d4
fix: robust handling of race condition on patching check (#3543) [skip ci]
Wing Lian
2026-03-24 16:43:43 -04:00
-
da683f72d4
Built site for gh-pages
Quarto GHA Workflow Runner
2026-03-24 19:47:01 +00:00
-
e412370877
roundup_power2_divisions not needed with newer pytorch versions (#3540)
Wing Lian
2026-03-24 15:40:05 -04:00
-
936149380f
support nemotron for scattermoe-lora
scattermoe-nanotron
Wing Lian
2026-03-23 21:29:58 +00:00
-
86be9f329e
post merge lora fixes for CI (#3536) [skip ci]
Wing Lian
2026-03-23 02:26:10 -04:00
-
db6af43f3b
chore: lint
textui
Wing Lian
2026-03-19 00:00:11 -04:00
-
35d06c8087
add textui
Wing Lian
2026-03-17 06:44:28 +00:00
-
0e583efeaa
increase rtol, codecov informational only, don't silently fail errors w curl (#3534) [skip ci]
Wing Lian
2026-03-22 13:54:03 -04:00
-
b3289fd190
feat: LoRA kernel support for bias, dropout, dora, embeddings (#3528) [skip ci]
Wing Lian
2026-03-22 13:53:19 -04:00
-
6636e5de7e
address PR code review
lhl-moe-aux-loss-free
Wing Lian
2026-03-22 17:23:12 +00:00
-
a67392c427
liger support for qwen 3.5 and fused rmsnorm+gated (#3531) [skip ci]
Wing Lian
2026-03-22 13:19:21 -04:00
-
0a566d7a15
chore: lint
Wing Lian
2026-03-22 12:02:44 -04:00
-
598c965043
use train_loss for sp test
tensorboard-loss-check
Wing Lian
2026-03-22 12:00:55 -04:00
-
5acb1b0ade
update for transformers v5 for experts parameters and compose with moe kernels
Wing Lian
2026-03-22 11:52:34 -04:00
-
4009a2ba5f
reordered our tests to mirror llm_compressor for prepare_plugins/validate order
lhl
2025-11-14 14:18:55 +09:00
-
66b2ab8414
move configs from global config to plugin specific args
lhl
2025-11-13 04:06:27 +09:00
-
676d5e855d
improve: align aux-free telemetry with Trainer logging
lhl
2025-11-11 17:00:48 +00:00
-
966a4555db
fix: make aux-free mixtral adapter GPU-safe
lhl
2025-11-11 17:00:37 +00:00
-
ad0c825bcb
sample packing and telemetry docs
lhl
2025-10-28 10:18:17 +00:00
-
46d677876e
tests: add ring and llama4 aux-free smokes
lhl
2025-10-28 10:01:07 +00:00
-
6eac9ac372
aux_free_router: emit telemetry metrics
lhl
2025-10-28 08:27:48 +00:00
-
949cdf01eb
tests: extend aux-free coverage
lhl
2025-10-28 08:08:13 +00:00
-
a0019021dd
aux_free_router: sync shim state
lhl
2025-10-28 08:08:00 +00:00
-
2af7475fdf
Add ring/llama4 aux-free adapters and EP sync support
lhl
2025-10-27 14:42:36 +09:00
-
3e4688289c
feat(moe-aux-loss-free): aux-free MoE plugin (Mixtral/Qwen3), EMA bias updates, config keys; E2E smoke + parity tests
lhl
2025-10-27 00:14:38 +09:00
-
-
a96733930e
retry and more info on download failure
Wing Lian
2026-03-22 11:09:33 -04:00
-
6130e40c37
fix flaky tests; should be using train loss from final step rather than final avg train loss
Wing Lian
2026-03-22 10:38:46 -04:00
-
3c421e0170
Built site for gh-pages
Quarto GHA Workflow Runner
2026-03-22 13:17:42 +00:00
-
5b2e3f00ce
fix: handle connection errors when checking user whoami (#3529)
Wing Lian
2026-03-22 09:11:17 -04:00
-
fc3b3d1d4e
synthetic datasets for benchmarking and testing (#3518) [skip ci]
Wing Lian
2026-03-21 22:47:26 -04:00
-
c9df6efdc2
support offloading layers to CPU (#3512) [skip ci]
Wing Lian
2026-03-21 22:47:02 -04:00
-
0ee98a0309
fix token state json and mistral tokenizer issue (#3522) [skip ci]
Wing Lian
2026-03-21 22:46:10 -04:00
-
2c05847a5f
reduce autotune search space (#3525) [skip ci]
Wing Lian
2026-03-21 18:30:15 -04:00
-
b0294b3427
handle qwen3.5 moe loading (#3523) [skip ci]
Wing Lian
2026-03-20 09:25:16 -04:00
-
1bcfc08c90
feat: add support and end-to-end tests for multiple custom optimizers… (#3457) [skip ci]
Avaya Aggarwal
2026-03-20 17:54:44 +05:30
-
5a5cf30b26
fix: add dequant bf16 repo (#3507) [skip ci]
NanoCode012
2026-03-20 17:11:46 +07:00
-
7ddfb2d8a0
cleanup: remove dead SDPA patches (#3488) [skip ci]
Avaya Aggarwal
2026-03-20 15:40:41 +05:30
-
c57acef2c7
Qwen3.5-MoE example config with lora_target_modules regex (#3515) [skip ci]
Owen Arliawan
2026-03-20 02:52:46 -07:00
-
61e0653994
Built site for gh-pages
Quarto GHA Workflow Runner
2026-03-20 09:34:02 +00:00
-
255c5b90ca
fix: make prepare_context_parallel_inputs no-op
fix/cp-waste
NanoCode012
2026-03-20 16:30:58 +07:00
-
852691c82e
Built site for gh-pages
Quarto GHA Workflow Runner
2026-03-20 09:30:53 +00:00
-
038ffe3f26
fix: solved double sequence partition from SequenceParallelContextManager and Accelerate's native CP (#3498)
Lorenzo Baraldi
2026-03-20 10:27:24 +01:00