NanoCode012
945c8aeb10
Fix: quantize and target MoE layers in transformers v5 for adapters, plus many misc fixes (#3439)
* fix: saving clones state dict
* fix: apply fix for only CP mode
* fix: add dropout check when using lora target param
* fix: re-add patch from transformers PR #39866
* feat: add moe quant to test by ved
* fix: try to match target param properly with endswith
* fix: clear cache per param quant
* fix: attempt to quantize experts on load instead of post-load
* fix: attempt to disable async load
* chore: add log
* chore: adjust log
* fix: remove cuda alloc for moe and enable async load
* chore: remove leftover logs
* chore: add extra empty cache
* fix(doc): clarify support
* fix: handle fsdp2 for paramwrapper dtensor
* feat: attempt to quant experts in 8bit mode too
* feat: attempt to release bf16 experts from vram
* feat: upgrade cce
* fix: fsdp2 init_sharded_param loads int8/uint4 dtensor as requires_grad=True on init
* fix: remove unnecessary gc and empty cache
* Revert "fix: remove unnecessary gc and empty cache"
This reverts commit 1d54518990.
* fix: do not call full_tensor on non-dtensors
* fix: attempt to address high loss with fsdp2 and quantized experts
* fix: attempt to fix wrong dim for lora quant experts
* fix: ensure requires_grad patch is applied for lora 8bit
* fix: attempt lora 8bit fsdp2
* fix: attribute access on save for lora 8bit fsdp2
* fix: wrong weight attrib access
* chore(refactor): add config, re-arrange position of patches, clean comments
* feat: add example docs
* chore: cherry-pick trinity fixes from PR 3399
* chore: comments refactor; add guards
* fix: guard using wrong key
* fix: mamba save does not accept main process param
* fix: guard to prevent double hook
* fix: move gc to upper scope
* chore: add comment on proxy forward patch
* fix: add comment to clarify
* feat: add idempotency test
* fix: AttributeError: `e_score_correction_bias` is not an nn.Parameter
* fix: AttributeError: 'NoneType' object has no attribute 'to'
* fix: update docs on cpu_ram_efficient_loading
2026-03-03 10:06:23 -05:00
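The suffix ("ends with") matching of LoRA target parameters listed in the fixes above can be illustrated with a minimal sketch. It assumes targets are given as dotted parameter-name suffixes; the helper names `matches_target_param` and `select_target_params` and the example parameter names are hypothetical and not taken from this PR. Requiring a `.` boundary before the suffix avoids the false positive a bare `str.endswith` would produce, where `down_proj` would also match `shared_down_proj`.

```python
def matches_target_param(param_name: str, target: str) -> bool:
    """Hypothetical helper: True if param_name equals target or ends with '.target'.

    The '.' boundary keeps 'down_proj' from matching 'shared_down_proj',
    which a plain str.endswith(target) would accept.
    """
    return param_name == target or param_name.endswith("." + target)


def select_target_params(param_names, targets):
    """Return the parameter names matched by any of the target suffixes."""
    return [
        name
        for name in param_names
        if any(matches_target_param(name, t) for t in targets)
    ]


# Illustrative (assumed) MoE parameter names.
names = [
    "model.layers.0.mlp.experts.gate_up_proj",
    "model.layers.0.mlp.experts.down_proj",
    "model.layers.0.mlp.shared_down_proj",
]
print(select_target_params(names, ["down_proj"]))
# → ['model.layers.0.mlp.experts.down_proj']
```

A plain `name.endswith("down_proj")` check would also have selected `model.layers.0.mlp.shared_down_proj`; the boundary check is what keeps the selection to the intended parameter.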