NanoCode012
945c8aeb10
Fix: quantize and target moe layers in transformers v5 for adapters and many misc fixes (#3439)
* fix: saving clones state dict
* fix: apply fix for only CP mode
* fix: add dropout check when using lora target param
* fix: re-add patch from transformers PR #39866
* feat: add moe quant to test by ved
* fix: try match target param properly end with
* fix: clear cache per param quant
* fix: attempt on-load quantize experts instead of post-load
* fix: attempt disable async load
* chore: add log
* chore: adjust log
* fix: remove cuda alloc for moe and enable async load
* chore: remove leftover logs
* chore: add extra empty cache
* fix(doc): clarify support
* fix: handle fsdp2 for paramwrapper dtensor
* feat: attempt to quant experts in 8bit mode too
* feat: attempt to release bf16 experts from vram
* feat: upgrade cce
* fix: fsdp2 init_sharded_param load int8/uint4 dtensor as require_grad=true on init
* fix: remove unnecessary gc and empty cache
* Revert "fix: remove unnecessary gc and empty cache"
This reverts commit 1d54518990.
* fix: do not call full_tensor on non-dtensors
* fix: attempt to address fsdp2 with quant exp high loss
* fix: attempt lora quant experts wrong dim
* fix: ensure require_grad patch applied for lora 8bit
* fix: attempt lora 8bit fsdp2
* fix: attribute access on save for lora 8bit fsdp2
* fix: wrong weight attrib access
* chore(refactor): add config, re-arrange position of patches, clean comments
* feat: add example docs
* chore: cherry pick trinity fixes from PR 3399
* chore: comments refactor; add guards
* fix: guard using wrong key
* fix: mamba save does not accept main process param
* fix: guard prevent double hook
* fix: move gc to upper scope
* chore: add comment on proxy forward patch
* fix: add comment to clarify
* feat: add test idempotency
* fix: AttributeError: `e_score_correction_bias` is not an nn.Parameter
* fix: AttributeError: 'NoneType' object has no attribute 'to'
* fix: update docs on cpu_ram_efficient_loading
2026-03-03 10:06:23 -05:00