Commit Graph

  • 8693a1f61b fix Dockerfile-base-next: cuda 12.8.2, miniforge, sm_120 activeblue/main tocmo0nlord 2026-05-13 14:37:01 +00:00
  • 71c6a56e7a switch to HQQ quantization to bypass bitsandbytes sm_120 issue tocmo0nlord 2026-05-13 13:55:52 +00:00
  • 38adf5cd37 add trust_remote_code, explicit bfloat16 and bnb dtype settings tocmo0nlord 2026-05-13 13:32:46 +00:00
  • 3f29fa017b replace Capybara with SlimOrca (compatible ShareGPT format) tocmo0nlord 2026-05-13 12:58:29 +00:00
  • c02a76f132 fix field_messages mapping for Capybara/OpenHermes ShareGPT format tocmo0nlord 2026-05-13 12:56:03 +00:00
  • b9ceebfe7e fix deprecated type:sharegpt and flash_attention config keys tocmo0nlord 2026-05-13 12:52:25 +00:00
  • e9a3fd483f add human-like QLoRA training config for Llama 3.1 8B tocmo0nlord 2026-05-13 12:50:35 +00:00
  • eadd15c960 note MAX_JOBS for flash-attn compile speed tocmo0nlord 2026-05-13 04:45:21 +00:00
  • 396ce4a9dd add miaai environment setup guide tocmo0nlord 2026-05-13 04:16:03 +00:00
  • b7ec06b8a1 Add optional Axolotl MoRA/ReMoRA integration (#3647) [skip ci] Wing Lian 2026-05-12 07:19:55 -04:00
  • e2f01de0e8 Fix Axolotl ReLoRA optimizer reset scope (#3646) Wing Lian 2026-05-09 17:52:35 -04:00
  • 5352d41d32 feat: systemic multimodal assistant-only loss masking + `cfg.role_boundaries` (#3625) thad0ctor 2026-05-05 08:25:39 -07:00
  • c15f6cffe2 fix: FSDP FULL_STATE_DICT OOM from memory leak (#3635) VED 2026-05-05 20:52:35 +05:30
  • e4032fc90f Refactor separate attention flags with attn_implementation and capability/concerns feature flags (#3602) Wing Lian 2026-05-05 10:15:18 -04:00
  • 6136ae627b Fix: add bitnet config (#3636) Younes B 2026-04-30 20:30:56 +04:00
  • e662972a29 Feat: Add bitnet integration (#3634) Younes B 2026-04-30 19:25:02 +04:00
  • ebbd7fa847 feat: Add Mistral Medium 3.5 (#3633) NanoCode012 2026-04-29 22:46:51 +07:00
  • ac77da96da use smaller pretrained models for ci (#3620) [skip ci] Wing Lian 2026-04-27 13:22:56 -04:00
  • 993db05b3a fix losses smol-ci Wing Lian 2026-04-25 08:44:37 +00:00
  • 464de78f6d regroup attn_implementation tests by feature concern attn-implementation-refactor Wing Lian 2026-04-25 08:57:56 +00:00
  • 7a41b47d22 drop "Phase 2" naming from attn-implementation tests Wing Lian 2026-04-25 08:54:14 +00:00
  • 6886def92c fix duplicate attn_implementation in gpt-oss yamls and flaky caplog tests Wing Lian 2026-04-25 08:53:28 +00:00
  • 1b9520cc8b more train steps Wing Lian 2026-04-25 02:17:48 +00:00
  • 79255ffdd7 Built site for gh-pages gh-pages Quarto GHA Workflow Runner 2026-04-24 09:09:53 +00:00
  • 798c8fba89 chore: update docker docs (#3623) main NanoCode012 2026-04-24 16:03:21 +07:00
  • 17fc747f99 fix: docker build failing (#3622) NanoCode012 2026-04-24 14:23:09 +07:00
  • f77408a3d0 fix tests Wing Lian 2026-04-23 23:47:28 +00:00
  • aeca18a8b0 remove dead gemma4 branch in _set_attention_config Wing Lian 2026-04-23 22:22:56 +00:00
  • 434a484fe9 update doc snippets + reject gemma4-hybrid with non-FA2 backend Wing Lian 2026-04-23 22:18:02 +00:00
  • 39226623d2 migrate example configs to canonical attn_implementation Wing Lian 2026-04-23 22:15:07 +00:00
  • 2d64d009d8 expand attention tests + rewrite docs Wing Lian 2026-04-23 21:30:20 +00:00
  • a0d24bcc19 migrate remaining consumers to canonical attn_implementation Wing Lian 2026-04-23 21:26:18 +00:00
  • bce65e3332 move attention-dependent validators to mode=after Wing Lian 2026-04-23 21:23:11 +00:00
  • 2579c496d5 make attn_implementation the single source of truth Wing Lian 2026-04-23 21:17:10 +00:00
  • 35d43fe141 compute attn capability flags in normalizer instead of properties Wing Lian 2026-04-12 23:46:44 -04:00
  • ff5d6393c8 replace legacy attention boolean flags with capability properties Wing Lian 2026-04-12 22:01:09 -04:00
  • aee8c75d64 refactor attention handling Wing Lian 2026-04-01 16:57:34 +00:00
  • c4f986874d chore: lint Wing Lian 2026-04-01 16:08:43 +00:00
  • 28e89a5c16 upgrade to torchao 0.17.0 Wing Lian 2026-04-01 16:04:55 +00:00
  • 98e18d59d2 larger torch cuda arch list coverage for older cuda torch-211-base Wing Lian 2026-04-23 15:05:01 -04:00
  • 462135acfb add pytorch 2.11 to base images Wing Lian 2026-04-23 14:57:49 -04:00
  • 8495c79fb1 properly handles kernels repo type kernelize-scattermoe-lora Wing Lian 2026-04-23 14:56:16 -04:00
  • 901f2356bc dpo collation/padding (#3601) [skip ci] Wing Lian 2026-04-23 14:49:52 -04:00
  • 5db4272f69 more steps for loss check Wing Lian 2026-04-23 18:43:18 +00:00
  • 431888c1de use smaller pretrained models for ci Wing Lian 2026-04-23 13:51:01 +00:00
  • ac48bfadba Built site for gh-pages Quarto GHA Workflow Runner 2026-04-23 04:33:48 +00:00
  • 1bf65c500e feat: add processor_kwargs YAML field forwarded to from_pretrained (#3612) thad0ctor 2026-04-22 21:26:34 -07:00
  • bcbe049c21 Feat: add support for datasets with str saved messages field (#3607) brightwind26 2026-04-23 08:25:48 +04:00
  • 90090fa9e8 DPO support loss types (#3566) Andrew Wu 2026-04-23 05:25:28 +01:00
  • c7ad3c8e22 Built site for gh-pages Quarto GHA Workflow Runner 2026-04-22 13:12:17 +00:00
  • 7420fd4de6 fix async prefetch with nemogym (#3606) Wing Lian 2026-04-22 09:05:46 -04:00
  • 918d02d7a9 Built site for gh-pages Quarto GHA Workflow Runner 2026-04-22 05:21:50 +00:00
  • 05113bc91a train on remote compute using Tinker compatible APIs (#3614) Wing Lian 2026-04-22 01:14:41 -04:00
  • 9a0d3016df first pass at build and deploy scattermoe-lora kernel Wing Lian 2026-04-22 01:10:01 -04:00
  • 70b4b68acf Built site for gh-pages Quarto GHA Workflow Runner 2026-04-21 21:56:08 +00:00
  • e562e149ce fix: [gemma4] fix VRAM leak in hybrid FA2+SDPA (hybrid attention) path under activation check… (#3611) thad0ctor 2026-04-21 14:49:58 -07:00
  • f18c2bb1f8 Built site for gh-pages Quarto GHA Workflow Runner 2026-04-21 14:23:11 +00:00
  • 9de5b76336 feat: move to uv first (#3545) NanoCode012 2026-04-21 21:16:03 +07:00
  • d17ed89a3c add missing file swe-rebench-rl-rebase Wing Lian 2026-04-21 08:44:01 -04:00
  • 02e4f2350d fixes for scattermoe from latest peft upgrade Wing Lian 2026-04-21 08:00:16 -04:00
  • cec99c4133 fix test dims vllm-0191 Wing Lian 2026-04-21 00:44:26 +00:00
  • 4195605ab2 fix test dims Wing Lian 2026-04-21 00:44:26 +00:00
  • 37acb28d02 fix einsum dims Wing Lian 2026-04-20 23:09:47 +00:00
  • d248242490 support for vllm 0.19.1 Wing Lian 2026-04-19 18:09:46 -04:00
  • 4a5281e61a Fix shape Wing Lian 2026-04-19 01:53:05 +00:00
  • a892d8cce1 chore: lint Wing Lian 2026-04-17 17:48:26 +00:00
  • 78de2919a6 tiled mlp fix for gemma4 Wing Lian 2026-04-16 13:24:41 +00:00
  • 4696e9911f Built site for gh-pages Quarto GHA Workflow Runner 2026-04-15 13:33:49 +00:00
  • 28283ff373 revert shared_kv_states workaround with transformers 5.5.4 Wing Lian 2026-04-15 13:32:59 +00:00
  • dc16859983 [gemma4] fix fused RMSNorm+RoPE on hybrid attention models Wing Lian 2026-04-15 12:59:00 +00:00
  • d4e9cf2eec lint Wing Lian 2026-04-14 17:26:00 -04:00
  • 53391a10d7 vllm-serve-lora add /v1/completions route + worker pipe lock Wing Lian 2026-04-14 15:52:02 +00:00
  • 7617b951a8 make _maybe_sync_vllm_weights actually fire in sync mode Wing Lian 2026-04-13 18:30:16 +00:00
  • e993ed5208 retry head-server probe with longer timeout Wing Lian 2026-04-13 18:29:55 +00:00
  • 69f165b39b probe vLLM weight-sync routes and select transport per server Wing Lian 2026-04-13 18:29:45 +00:00
  • 80a97f192b validate batch shape against num_generations at config time Wing Lian 2026-04-13 18:29:22 +00:00
  • 323da791eb bump transformers to 5.5.4 and trl to latest 1.1.0 (#3603) Wing Lian 2026-04-15 09:27:03 -04:00
  • 6867872c76 Built site for gh-pages Quarto GHA Workflow Runner 2026-04-13 00:59:30 +00:00
  • 6990478163 fix: rename model to adapter_model for fsdp sharded final model (#3585) NanoCode012 2026-04-13 07:51:30 +07:00
  • 63a58cfec1 feat: support excess_length_strategy for RL trainers (#3578) [skip ci] ゆり 2026-04-13 08:51:10 +08:00
  • 3985ec2f67 feat: add FineGrainedFP8Config support for model quantization (#3587) [skip ci] madScientist10 2026-04-13 03:50:37 +03:00
  • a44edda6d7 Skip redundant evaluation when resuming from checkpoint (#3575) [skip ci] Joaquin Hui 2026-04-13 01:50:15 +01:00
  • afd0657e08 Built site for gh-pages Quarto GHA Workflow Runner 2026-04-12 15:04:55 +00:00
  • 66c3e5a3fd better handling of dora merge on Conv layers in Qwen 3.5 (#3599) Wing Lian 2026-04-12 10:57:45 -04:00
  • 6a6a6329a0 Built site for gh-pages Quarto GHA Workflow Runner 2026-04-12 14:37:16 +00:00
  • b8358aa5ab [gemma4] use mixed Flash Attention and SDPA and add fused RMSNorm+RoPE Triton kernels (#3598) Wing Lian 2026-04-12 10:29:55 -04:00
  • e079cf16a2 qwen3_5.jinja: handle list content on system messages (#3595) [skip ci] Joaquin Hui 2026-04-12 05:58:58 +01:00
  • e2f69828d2 [fix][fsdp2] clone sharded param so original full size shard can be gc'ed (#3597) [skip ci] Wing Lian 2026-04-11 20:22:35 -04:00
  • 122b50bad6 pre-cache the eot token ids rather than on each iteration (#3594) [skip ci] Wing Lian 2026-04-11 20:05:21 -04:00
  • 21b0c220c4 Built site for gh-pages Quarto GHA Workflow Runner 2026-04-10 21:15:42 +00:00
  • e77a185e86 upgrade transformers to use v5.5.3 (#3593) Wing Lian 2026-04-10 17:08:14 -04:00
  • d76f8c505c Built site for gh-pages Quarto GHA Workflow Runner 2026-04-10 20:53:46 +00:00
  • 29fa4dedbb Gemma4 fixes and profiler (#3591) Wing Lian 2026-04-10 16:46:17 -04:00
  • af1d4c8e78 Built site for gh-pages Quarto GHA Workflow Runner 2026-04-10 18:18:53 +00:00
  • 315cdeede9 handle trainable/masked spans in content and reasoning content (#3592) Wing Lian 2026-04-10 14:11:10 -04:00
  • e7a6a5b529 fix: move warning after we've set any overrides (#3589) [skip ci] NanoCode012 2026-04-11 00:00:47 +07:00
  • bfb4da1d25 fix: document jinja2 file path support (#3588) [skip ci] NanoCode012 2026-04-11 00:00:26 +07:00
  • 4dfa0a59b2 Add uninstall command to cut_cross_entropy import message (#3583) [skip ci] floaty3 2026-04-10 17:00:07 +00:00
  • 8a926a64dc Built site for gh-pages Quarto GHA Workflow Runner 2026-04-10 03:09:51 +00:00
  • 4ef608dda3 fix ddp/fsdp w gemma4 (#3584) Wing Lian 2026-04-09 20:02:36 -07:00