Commit Graph

  • e340a8add7 Built site for gh-pages Quarto GHA Workflow Runner 2025-12-04 15:03:48 +00:00
  • 5992e607a2 fix: improve ministral3 docs to be clearer (#3300) NanoCode012 2025-12-04 21:44:44 +07:00
  • 390ac3f224 Built site for gh-pages Quarto GHA Workflow Runner 2025-12-04 13:38:07 +00:00
  • 2b66ee189c Feat: add ministral3 (#3297) NanoCode012 2025-12-04 20:32:08 +07:00
  • 8e6a9b0644 Built site for gh-pages Quarto GHA Workflow Runner 2025-12-02 18:18:57 +00:00
  • 86d8cca149 Feat: add trinity by ArceeAI (#3292) NanoCode012 2025-12-03 01:12:55 +07:00
  • dcd916b29b bump transformers 4.57.3 transformers-4573 Wing Lian 2025-12-02 10:33:44 -05:00
  • 08c8f3f22f fix: total tokens and defaults in config fix/diffusion NanoCode012 2025-12-02 21:38:10 +07:00
  • 76f0fe2621 fix: steps not allowed fractional NanoCode012 2025-12-02 21:30:15 +07:00
  • fff60afdea Built site for gh-pages Quarto GHA Workflow Runner 2025-12-02 14:22:43 +00:00
  • 4a0f98e612 feat: upgrade liger to 0.6.4 (#3289) NanoCode012 2025-12-02 21:16:23 +07:00
  • 1a369a9783 Built site for gh-pages Quarto GHA Workflow Runner 2025-12-01 08:58:50 +00:00
  • c6ddcdd06a feat: add exaone4 chat template and update enums (#3279) Yohan Na 2025-12-01 17:52:45 +09:00
  • 811966ac5b Built site for gh-pages Quarto GHA Workflow Runner 2025-12-01 08:09:07 +00:00
  • 7fb6a947d9 chore: update pre-commit hooks (#3287) github-actions[bot] 2025-12-01 15:03:14 +07:00
  • 93600fa80d 📝 Add docstrings to feat/qwen3-vl-liger-integration coderabbitai/docstrings/b234532 coderabbitai[bot] 2025-11-30 18:29:28 +00:00
  • 9dd88dc31d Built site for gh-pages Quarto GHA Workflow Runner 2025-11-28 12:00:44 +00:00
  • b234532d9f Feat: add peft_ensure_weight_tying (#3278) NanoCode012 2025-11-28 18:54:48 +07:00
  • a526647b31 Merge branch 'main' into feat/glm45 NanoCode012 2025-11-28 13:41:25 +07:00
  • c0eeb9f3ab Built site for gh-pages Quarto GHA Workflow Runner 2025-11-24 06:54:45 +00:00
  • 8990ca3205 fix: removed unused "scikit-learn==1.4.2" (#3277) VED 2025-11-24 12:18:53 +05:30
  • cd7fdaeeb6 Built site for gh-pages Quarto GHA Workflow Runner 2025-11-24 03:27:39 +00:00
  • 006f226270 Feat: add Olmo3 (BC with Olmo and Olmo2) (#3275) NanoCode012 2025-11-24 10:21:31 +07:00
  • 24cd17113a Built site for gh-pages Quarto GHA Workflow Runner 2025-11-20 14:35:55 +00:00
  • 0b635e69c5 build docker images for 2.9.x (#3273) Wing Lian 2025-11-20 09:26:24 -05:00
  • 6a1c740e54 Built site for gh-pages Quarto GHA Workflow Runner 2025-11-20 14:10:37 +00:00
  • 0d27e14e45 Torch 2.9.1 base images (#3268) Wing Lian 2025-11-20 09:04:37 -05:00
  • e9f1cda1fc Built site for gh-pages Quarto GHA Workflow Runner 2025-11-18 07:51:37 +00:00
  • f5f21fb216 chore: update readme with latest updates (#3267) v0.13.0 NanoCode012 2025-11-18 14:45:21 +07:00
  • 27103e27b0 Built site for gh-pages Quarto GHA Workflow Runner 2025-11-18 04:43:15 +00:00
  • 4e55871112 feat: Add opt-out Telemetry (#3237) NanoCode012 2025-11-18 11:35:25 +07:00
  • c71810cb8b Built site for gh-pages Quarto GHA Workflow Runner 2025-11-14 17:57:48 +00:00
  • a6bafb55cb upgrade datasets to 4.4.1 (#3266) Wing Lian 2025-11-14 12:52:14 -05:00
  • c24a780384 Built site for gh-pages Quarto GHA Workflow Runner 2025-11-14 15:56:25 +00:00
  • 0fbde69e9c only push axolotl images, personal repo is deprecated (#3262) Wing Lian 2025-11-14 10:50:03 -05:00
  • fde5db6e73 Built site for gh-pages Quarto GHA Workflow Runner 2025-11-13 18:08:52 +00:00
  • 301e22849f upgrade to latest deepspeed and make sure latest tagged axolotl images are using torch 2.8.0 (#3261) Wing Lian 2025-11-13 13:03:01 -05:00
  • cdd494ba72 Built site for gh-pages Quarto GHA Workflow Runner 2025-11-13 15:27:01 +00:00
  • dcf24fd24e feat: save checkpoint after training started (#3233) VED 2025-11-13 20:51:05 +05:30
  • 49b8107989 feat: add granite4 examples (#3256) [skip ci] NanoCode012 2025-11-13 22:19:16 +07:00
  • 9901ee5602 fix: voxtralprocessor broken (#3255) [skip ci] NanoCode012 2025-11-13 22:18:42 +07:00
  • 9d89fdba0c Built site for gh-pages Quarto GHA Workflow Runner 2025-11-11 03:38:33 +00:00
  • dd78f2e0cc Fix: warmup_steps: 0 & warmup_ratio: 0 not disabling warmup (#3254) xzuyn 2025-11-10 22:32:06 -05:00
  • 8ddeb0f82a Built site for gh-pages Quarto GHA Workflow Runner 2025-11-11 02:10:53 +00:00
  • b54f9c942b _get_tools in ChatTemplateStrategy : function "parameters" can be dict or string (#3238) Eduard Zl 2025-11-11 04:04:28 +02:00
  • f9e101605f Built site for gh-pages Quarto GHA Workflow Runner 2025-11-10 14:43:48 +00:00
  • 8069177284 Merge branch 'main' into feat/glm45 NanoCode012 2025-11-10 21:41:05 +07:00
  • 11eb36585a feat: add arg to enable dft in liger (#3125) NanoCode012 2025-11-10 21:37:47 +07:00
  • d0c846fc5e feat: add granitemoeshared and granitemoehybrid (#3158) NanoCode012 2025-11-10 21:35:45 +07:00
  • 9305a55451 Built site for gh-pages Quarto GHA Workflow Runner 2025-11-07 21:11:21 +00:00
  • b5fcc2f14b log cumulative total trained tokens (#3252) Wing Lian 2025-11-07 16:04:00 -05:00
  • 5e8e6ede37 Built site for gh-pages Quarto GHA Workflow Runner 2025-11-07 17:23:49 +00:00
  • b62eed8809 add openenv-core to requirements (#3251) Wing Lian 2025-11-07 12:17:27 -05:00
  • ed2e8cacd6 feat:openenv rollout_func (#3239) [skip ci] VED 2025-11-07 19:21:40 +05:30
  • 80270a92fa Fix typos in some files (#3250) [skip ci] Lê Nam Khánh 2025-11-07 20:21:20 +07:00
  • a712a75b86 Built site for gh-pages Quarto GHA Workflow Runner 2025-11-06 21:12:19 +00:00
  • bfdc9a8249 upgrade trl and other hf deps (#3249) Wing Lian 2025-11-06 16:06:03 -05:00
  • 83ff8bfa1a fix: change docker miniconda install to workspace fix/hpc-root NanoCode012 2025-09-23 12:55:57 +07:00
  • 5011c40c5e Built site for gh-pages Quarto GHA Workflow Runner 2025-11-04 13:53:36 +00:00
  • c37decb073 update pre-commit cadence (#3245) salman 2025-11-04 13:43:40 +00:00
  • ffbacf2b2a Built site for gh-pages Quarto GHA Workflow Runner 2025-11-04 00:45:34 +00:00
  • e15a070f43 Built site for gh-pages Quarto GHA Workflow Runner 2025-11-04 00:41:13 +00:00
  • 01a346d86a feat(example): add gpt-oss-safeguard docs (#3243) NanoCode012 2025-11-04 07:39:21 +07:00
  • 26f05b6008 fix(example): set model_type to load for gemma3 text (#3242) NanoCode012 2025-11-04 07:35:07 +07:00
  • 39962b0cd4 Built site for gh-pages Quarto GHA Workflow Runner 2025-11-03 16:01:49 +00:00
  • ed58fa8a75 chore: update pre-commit hooks (#3244) github-actions[bot] 2025-11-03 15:55:40 +00:00
  • 33f4a300b3 Built site for gh-pages Quarto GHA Workflow Runner 2025-10-30 22:56:32 +00:00
  • 633afffacb add torch 2.9.0 to ci (#3223) Wing Lian 2025-10-30 18:50:26 -04:00
  • d6988814e4 Built site for gh-pages Quarto GHA Workflow Runner 2025-10-30 14:09:26 +00:00
  • 4b1b4fa6d8 upgrade numpy (#3236) Wing Lian 2025-10-30 10:03:24 -04:00
  • 9750a4442b Built site for gh-pages Quarto GHA Workflow Runner 2025-10-29 22:13:57 +00:00
  • 0f7c886b7b chore: update pre-commit hooks (#3222) [skip ci] github-actions[bot] 2025-10-29 18:09:46 -04:00
  • 5626d4d0f1 Built site for gh-pages Quarto GHA Workflow Runner 2025-10-29 22:08:08 +00:00
  • a4b921135b build cuda 13.0.0 base image with 2.9.0 (#3229) Wing Lian 2025-10-29 18:07:29 -04:00
  • 98333e639a upgrade trl to 0.24.0 and liger to 0.6.3 (#3230) Wing Lian 2025-10-29 18:02:16 -04:00
  • 9ee7ce5c85 set TORCH_CUDA_ARCH_LIST correctly liger-063 Wing Lian 2025-10-29 12:59:26 -04:00
  • a41ca4d06f upgrade liger dep to 0.6.3 Wing Lian 2025-10-27 14:49:09 -04:00
  • 1a3e195f79 Built site for gh-pages Quarto GHA Workflow Runner 2025-10-27 07:48:23 +00:00
  • 9d4d39e939 Diffusion trainer fix: shift logits to align with input tokens (#3191) Dan Saunders 2025-10-27 03:42:01 -04:00
  • 1a3d5cea46 Built site for gh-pages Quarto GHA Workflow Runner 2025-10-23 04:31:00 +00:00
  • bb33fda44d install flash attention in 2.9.0 base images (#3224) Wing Lian 2025-10-22 21:24:52 -07:00
  • 65ff82796d Built site for gh-pages Quarto GHA Workflow Runner 2025-10-23 02:23:20 +00:00
  • 4dc018992d Feat/opentelemetry (#3215) VED 2025-10-23 07:46:55 +05:30
  • 302e9406ed Built site for gh-pages Quarto GHA Workflow Runner 2025-10-22 22:29:16 +00:00
  • 243620394a fix: force train split for json,csv,txt for test_datasets and misc doc changes (#3226) NanoCode012 2025-10-23 05:23:20 +07:00
  • 27d2c41079 Built site for gh-pages Quarto GHA Workflow Runner 2025-10-22 14:28:26 +00:00
  • 3750fdcf79 Fix trainer dataloader slow loading issue (#3219) Qingyang Wu 2025-10-22 07:22:14 -07:00
  • 3a80f800c4 Built site for gh-pages Quarto GHA Workflow Runner 2025-10-22 14:01:05 +00:00
  • 613bcf90e5 fix: enable_sleep_mode -> vllm_enable_sleep_mode (#3225) Matthew Hambrecht 2025-10-22 09:55:26 -04:00
  • 2d535ec190 Built site for gh-pages Quarto GHA Workflow Runner 2025-10-20 12:59:35 +00:00
  • 383f220cfd build torch 2.9.0 base images (#3221) Wing Lian 2025-10-20 08:53:49 -04:00
  • d6d9d42193 Built site for gh-pages Quarto GHA Workflow Runner 2025-10-20 07:14:02 +00:00
  • 8bb871b5cf fix: deepspeed with context parallel (#3220) NanoCode012 2025-10-20 14:06:58 +07:00
  • 333ca134a2 Built site for gh-pages Quarto GHA Workflow Runner 2025-10-17 10:06:27 +00:00
  • 87565ecc05 Add chat_template.argilla_chat support for DPO datasets (#3202) Leonard 2025-10-17 19:00:26 +09:00
  • f64e3bbb40 Built site for gh-pages Quarto GHA Workflow Runner 2025-10-17 03:40:53 +00:00
  • 93ba57396f fix: qwen3_vl attention config (#3216) NanoCode012 2025-10-17 10:35:03 +07:00
  • b98a037bec Built site for gh-pages Quarto GHA Workflow Runner 2025-10-16 09:13:35 +00:00
  • aa1240acd8 fix: transformers deprecate load_in_Xbit in model_kwargs (#3205) NanoCode012 2025-10-16 16:07:27 +07:00
  • 2327fe0f9b Built site for gh-pages Quarto GHA Workflow Runner 2025-10-14 19:59:54 +00:00