Commit Graph

  • faed3905fd version tag 0.13.2 v0.13.2 release-v0.13.x Wing Lian 2026-01-22 10:58:38 -05:00
  • 4c48b9b508 Built site for gh-pages Quarto GHA Workflow Runner 2026-01-22 01:07:53 +00:00
  • a531e9d946 upgrade vllm to v0.14.0 (#3345) Wing Lian 2026-01-21 20:00:18 -05:00
  • 1a1ad97f01 Built site for gh-pages Quarto GHA Workflow Runner 2026-01-21 22:31:31 +00:00
  • 04328aeb97 cu129 targets for ci builds (#3369) Wing Lian 2026-01-21 17:24:44 -05:00
  • d0d26d5064 feat: Add GDPO Support (#3353) VED 2026-01-22 03:52:45 +05:30
  • 8623dd8a72 strip only starting 'v' char; e.g don't strip from '.dev' (#3368) [skip ci] Wing Lian 2026-01-21 14:19:03 -05:00
  • edb0092f8b Built site for gh-pages Quarto GHA Workflow Runner 2026-01-21 18:41:44 +00:00
  • 8cd75cff9f use cuda 12.9.1 and add python 3.12 to base images (#3367) Wing Lian 2026-01-21 13:34:14 -05:00
  • 68fc0eeab3 strip only starting 'v' char; e.g don't strip from '.dev' version-dev Wing Lian 2026-01-21 09:31:17 -05:00
  • 0a0115493d remove monkeypatch dft Salman Mohammadi 2026-01-21 10:06:58 +00:00
  • 1331f9c0e1 Built site for gh-pages Quarto GHA Workflow Runner 2026-01-21 04:06:53 +00:00
  • 8ab9d9ea88 Version dev (#3365) Wing Lian 2026-01-20 22:58:29 -05:00
  • 729221e9bb ensure version is available Wing Lian 2026-01-20 11:26:15 -05:00
  • 5d0d76e4f4 update pypi publish to update VERSION file Wing Lian 2026-01-20 10:26:51 -05:00
  • 3f9555822e use VERSION file and set dev version Wing Lian 2026-01-20 10:22:17 -05:00
  • 20237f29ad Built site for gh-pages Quarto GHA Workflow Runner 2026-01-20 14:06:07 +00:00
  • 6e42def14b set version to v0.13.1 (#3363) v0.13.1 Wing Lian 2026-01-20 08:58:32 -05:00
  • 7a4f33802d try run regular CE loss on eval Salman Mohammadi 2026-01-19 22:00:48 +00:00
  • 8386234afa Built site for gh-pages Quarto GHA Workflow Runner 2026-01-16 16:58:40 +00:00
  • c413480b35 upgrade transformers to 4.57.6 and peft to 0.17.1 and datasets to 4.5.0 (#3361) Wing Lian 2026-01-16 11:48:50 -05:00
  • fa2b336702 Built site for gh-pages Quarto GHA Workflow Runner 2026-01-16 16:25:21 +00:00
  • 8f25124269 upgrade transformers to 4.57.5 (#3358) Wing Lian 2026-01-16 11:17:43 -05:00
  • 6d33c0f537 Built site for gh-pages Quarto GHA Workflow Runner 2026-01-16 14:10:25 +00:00
  • 790df757cb don't install xformers for arm64 (#3359) Wing Lian 2026-01-16 09:02:37 -05:00
  • 170dca9bb9 WIP DFT Salman Mohammadi 2026-01-15 18:43:31 +00:00
  • a1c183a078 Built site for gh-pages Quarto GHA Workflow Runner 2026-01-14 17:10:35 +00:00
  • d282f32481 don't install deepspeed in arm64 images (#3357) Wing Lian 2026-01-14 12:03:55 -05:00
  • 36ed6719c7 Built site for gh-pages Quarto GHA Workflow Runner 2026-01-14 17:03:18 +00:00
  • 6331e4a130 fix amd64 and set 2.9.1 as latest cloud image (#3356) Wing Lian 2026-01-14 11:56:36 -05:00
  • 949c4a72de Built site for gh-pages Quarto GHA Workflow Runner 2026-01-14 14:46:30 +00:00
  • 1410e4474e update PR template (#3349) [skip ci] salman 2026-01-14 15:39:21 +01:00
  • dc77b5bf42 fix arm64 builds (#3355) Wing Lian 2026-01-14 09:38:48 -05:00
  • e3d036cd21 Built site for gh-pages Quarto GHA Workflow Runner 2026-01-13 14:56:26 +00:00
  • 359b7ad85e fix: gemma3_text model loading vision config (#3354) NanoCode012 2026-01-13 21:49:23 +07:00
  • 208f8b253f add validation for DFT dynamic-sft Wing Lian 2025-08-11 21:28:07 -04:00
  • 75ad1a9932 use dynamic finetuning with chunked cross entropy Wing Lian 2025-08-11 18:48:32 -04:00
  • 00f9e006be Built site for gh-pages Quarto GHA Workflow Runner 2026-01-13 07:40:45 +00:00
  • 258ce8d4fa feat : scaled softmax support (#3338) VED 2026-01-13 13:03:11 +05:30
  • c73ac3a615 Built site for gh-pages Quarto GHA Workflow Runner 2026-01-12 17:06:47 +00:00
  • 3e0bbd33ec feat: add ARM64/AArch64 build support to Dockerfile-base (#3346) @TT 2026-01-12 11:00:02 -06:00
  • a26071f5bf Built site for gh-pages Quarto GHA Workflow Runner 2026-01-12 14:50:29 +00:00
  • 4ae6f766ad bump bnb to v0.49.1 (#3351) salman 2026-01-12 15:42:04 +01:00
  • 58112fb9d1 Built site for gh-pages Quarto GHA Workflow Runner 2026-01-06 16:52:02 +00:00
  • e7f0d4ba5b Increased test coverage for lora/qlora (#3147) VED 2026-01-06 22:14:48 +05:30
  • 0923e0a73c Built site for gh-pages Quarto GHA Workflow Runner 2026-01-06 14:32:22 +00:00
  • 31851b7991 Built site for gh-pages Quarto GHA Workflow Runner 2026-01-06 14:26:27 +00:00
  • 7bf6f70e96 fix total/trainable tokens log (#3344) VED 2026-01-06 19:55:17 +05:30
  • 8aab807e67 feat: Add SwanLab integration for experiment tracking (#3334) PraMamba 2026-01-06 22:19:18 +08:00
  • 80fc86c505 Built site for gh-pages Quarto GHA Workflow Runner 2026-01-05 20:31:08 +00:00
  • ee59e4de97 add cu130 + torch 2.9.1 to test matrices (#3343) Wing Lian 2026-01-05 15:24:29 -05:00
  • 61ea2b06b5 Built site for gh-pages Quarto GHA Workflow Runner 2026-01-05 18:55:30 +00:00
  • 4e61b8aa23 use updated version of prebuilt wheels for flash attention for cu130 (#3342) Wing Lian 2026-01-05 13:48:12 -05:00
  • 7a08e4117a wip ao upgrade upgrade-torchao-0.15 Salman Mohammadi 2026-01-05 18:23:33 +00:00
  • 4e8bde19ae Built site for gh-pages Quarto GHA Workflow Runner 2026-01-03 23:15:54 +00:00
  • b26ba3a5cb don't build images w cuda 130 since we don't have flash attention wheels (#3341) Wing Lian 2026-01-03 18:08:28 -05:00
  • 5e41cfda19 Built site for gh-pages Quarto GHA Workflow Runner 2026-01-01 11:59:50 +00:00
  • afe18ace35 deprecate torch 2.7.1 (#3339) Wing Lian 2026-01-01 06:52:45 -05:00
  • 2b199f9915 chore: update pre-commit hooks (#3340) [skip ci] github-actions[bot] 2026-01-01 06:52:28 -05:00
  • 39b7f26749 Built site for gh-pages Quarto GHA Workflow Runner 2025-12-30 17:48:19 +00:00
  • e73dab6df9 support pydantic 2.12 (#3328) Wing Lian 2025-12-30 12:41:07 -05:00
  • f45a97a9ff docs for checkpoint saving (#3335) [skip ci] VED 2025-12-30 23:10:32 +05:30
  • 0c84185e0a Built site for gh-pages Quarto GHA Workflow Runner 2025-12-30 14:10:17 +00:00
  • 11c0b5b256 batch upgrade dependencies (#3299) Wing Lian 2025-12-30 09:02:49 -05:00
  • 119056a341 Built site for gh-pages Quarto GHA Workflow Runner 2025-12-25 12:24:45 +00:00
  • 66a3de3629 build examples readmes with quarto (#3046) Wing Lian 2025-12-25 07:17:25 -05:00
  • a6080df73c compute loss only if training and update token metric naming (#3293) [skip ci] VED 2025-12-25 17:08:17 +05:30
  • 60f227a9f0 Built site for gh-pages Quarto GHA Workflow Runner 2025-12-25 11:15:25 +00:00
  • 4f5e8a328a Feat: add MiMo and Plano (#3332) [skip-ci] NanoCode012 2025-12-25 18:09:03 +07:00
  • 418933f0d1 feat: add internvl3_5 (#3141) [skip-ci] NanoCode012 2025-12-25 18:07:59 +07:00
  • 5339a73a2c Built site for gh-pages Quarto GHA Workflow Runner 2025-12-25 11:03:05 +00:00
  • 372f664c63 feat: cleanup old flex mask patch, suppress Matmul bnb warn, and misc (#3330) [skip-ci] NanoCode012 2025-12-25 17:56:20 +07:00
  • 97f1b1758d Feat: add kimi linear support (#3257) NanoCode012 2025-12-25 17:53:52 +07:00
  • be1f8db913 Merge branch 'main' into feat/glm45 feat/glm45 NanoCode012 2025-12-25 17:50:09 +07:00
  • f2155eaf79 feat: add trackio as experiment tracking integration (#3253) Abubakar Abid 2025-12-23 05:49:07 -08:00
  • 3411187898 Built site for gh-pages Quarto GHA Workflow Runner 2025-12-22 19:07:05 +00:00
  • 92ee4256f7 feature: raise on long sequence drop (#3321) kallewoof 2025-12-23 03:59:49 +09:00
  • efeb5a4e41 fix check for fp8 capability (#3324) Wing Lian 2025-12-22 13:58:25 -05:00
  • 33140be573 Built site for gh-pages Quarto GHA Workflow Runner 2025-12-22 14:00:46 +00:00
  • faaff6c792 allow users to set ndigits for rounding of metrics when logging (#3325) VED 2025-12-22 19:24:43 +05:30
  • 43cef27458 Fix typo in densemixer RuntimeError (#3327) [skip ci] Alexander Kozhevnikov 2025-12-22 16:53:58 +03:00
  • 07c41a6c2a fix preview docs failing due to running out of disk (#3326) [skip ci] Wing Lian 2025-12-19 11:34:55 -05:00
  • e8230c9de8 Built site for gh-pages Quarto GHA Workflow Runner 2025-12-19 15:49:46 +00:00
  • bbd3486f57 Distributed Muon Optimizer (#3264) salman 2025-12-19 16:43:47 +01:00
  • 3750d7dd64 add liger kernel support for dpo (#3302) VED 2025-12-18 21:41:06 +05:30
  • dc46f3edd3 Built site for gh-pages Quarto GHA Workflow Runner 2025-12-18 14:08:32 +00:00
  • 2197b0bf89 feat: cheap ppl metric (#3317) xzuyn 2025-12-18 09:02:41 -05:00
  • 0fccbadb79 📝 Add docstrings to 202512-raise_on_drop coderabbitai/docstrings/3e51a68 coderabbitai[bot] 2025-12-18 05:49:01 +00:00
  • 89b3535663 Built site for gh-pages Quarto GHA Workflow Runner 2025-12-17 18:46:20 +00:00
  • 3e51a680c2 fix: Fix evaluation loss in KD trainer (#3271) Seung Hyun Cho 2025-12-18 03:40:36 +09:00
  • 2cf254b4af Add peft_autocast_adapter_dtype config option (#3311) [skip ci] xzuyn 2025-12-17 10:09:39 -05:00
  • 83d4d97dcc Add QAT NVFP4 configs for blogpost (#3280) [skip ci] salman 2025-12-17 15:35:22 +01:00
  • 9cf40eba23 Built site for gh-pages Quarto GHA Workflow Runner 2025-12-17 14:18:21 +00:00
  • a1d07f42e4 Fix(misc): address PYTORCH_CUDA_ALLOC_CONF deprecate (#3313) NanoCode012 2025-12-17 21:12:18 +07:00
  • 2a664dc8ad support for xformers wheels for torch 2.9 (#3308) Wing Lian 2025-12-11 11:56:40 -05:00
  • 3a1d8ba3d8 Built site for gh-pages Quarto GHA Workflow Runner 2025-12-09 07:36:59 +00:00
  • 4ac78aa562 fix: update qwen3 jinja tokenization off a few tokens (#3295) NanoCode012 2025-12-09 14:31:03 +07:00
  • 72b76bbb67 Built site for gh-pages Quarto GHA Workflow Runner 2025-12-08 14:22:26 +00:00
  • b3f4aa149f fix bin size (#3307) VED 2025-12-08 19:46:18 +05:30
  • 75b20fb66f Save processor in quantizer CLI (#3290) salman 2025-12-06 16:27:18 +00:00