Commit Graph

  • 4244e6ee6e Built site for gh-pages Quarto GHA Workflow Runner 2026-03-03 18:09:36 +00:00
  • 653f90be25 Add torch 2.10.0 to unit tests and use python 3.14 (#3450) Wing Lian 2026-03-03 13:01:52 -05:00
  • 28bc7f60e2 Built site for gh-pages Quarto GHA Workflow Runner 2026-03-03 15:13:12 +00:00
  • 945c8aeb10 Fix: quantize and target moe layers in transformers v5 for adapters and many misc fixes (#3439) NanoCode012 2026-03-03 22:06:23 +07:00
  • 35e6a7e228 Built site for gh-pages Quarto GHA Workflow Runner 2026-03-03 14:34:09 +00:00
  • e672d37f33 fix: qwen3-next to use fla causal-conv1d to support packing (#3437) NanoCode012 2026-03-03 21:26:46 +07:00
  • f94ec0434c include tool in default message_property_mappings tool-mpm Wing Lian 2025-10-23 10:25:18 -07:00
  • a5b2ace18a Built site for gh-pages Quarto GHA Workflow Runner 2026-03-02 21:47:10 +00:00
  • 77828d3559 uv cloud image should use uv w pip (#3449) Wing Lian 2026-03-02 16:39:26 -05:00
  • f3afbe22ff Built site for gh-pages Quarto GHA Workflow Runner 2026-03-02 19:31:31 +00:00
  • 4272817109 don't install torch ao on arm64 (#3448) Wing Lian 2026-03-02 14:24:54 -05:00
  • cdfe3d4844 Built site for gh-pages Quarto GHA Workflow Runner 2026-03-02 18:03:21 +00:00
  • 474208b794 fix: Save de-duplicated dataset during pre-processing (#3427) Manas Vardhan 2026-03-02 09:55:59 -08:00
  • 2e306353eb Built site for gh-pages Quarto GHA Workflow Runner 2026-03-02 17:44:09 +00:00
  • 444020b332 mark slow tests that are timing out in CI (#3428) [skip ci] Wing Lian 2026-03-02 12:26:30 -05:00
  • aa88c2e30b fix uv cache subcommand (#3447) Wing Lian 2026-03-02 12:26:08 -05:00
  • 2470d470e2 Built site for gh-pages Quarto GHA Workflow Runner 2026-03-02 08:37:46 +00:00
  • f447bce1db fix: do not push telemetry on non-master rank (#3438) NanoCode012 2026-03-02 15:31:20 +07:00
  • 7f23b302d1 bug-fix: use self.optimizer if optimizer not passed to SchedulerMixin.create_scheduler() (#3435) [skip ci] kallewoof 2026-03-02 17:30:07 +09:00
  • 28cbe2ce6d Built site for gh-pages Quarto GHA Workflow Runner 2026-02-25 19:53:16 +00:00
  • 18f26c19ef add uv axolotl builds (#3431) Wing Lian 2026-02-25 14:46:02 -05:00
  • dafe30369e Built site for gh-pages Quarto GHA Workflow Runner 2026-02-25 04:38:55 +00:00
  • 2b6f4a6c9b Fix: excess_length_strategy truncation method (#3401) Robert Ronan 2026-02-24 23:31:11 -05:00
  • 8f54b4eb25 fix: pass revision parameter to tokenizer and processor loaders (#3388) [skip ci] madScientist10 2026-02-25 06:11:20 +02:00
  • a131e4d0e5 sample gen support sft (#3240) [skip ci] VED 2026-02-25 09:40:57 +05:30
  • aaf47dc7ec Built site for gh-pages Quarto GHA Workflow Runner 2026-02-25 03:43:12 +00:00
  • 1791d87b6f build axolotl images with torch 2.10.0 (#3430) Wing Lian 2026-02-24 22:35:25 -05:00
  • 122ee5b865 Built site for gh-pages Quarto GHA Workflow Runner 2026-02-25 01:38:31 +00:00
  • b40803da51 build base images for torch 2.10.0 (#3429) Wing Lian 2026-02-24 20:32:34 -05:00
  • d8c37917f2 Built site for gh-pages Quarto GHA Workflow Runner 2026-02-24 20:06:37 +00:00
  • 68f1b7004c ScatterMoE LoRA support (#3410) Wing Lian 2026-02-24 14:59:55 -05:00
  • 04ef4073de Built site for gh-pages Quarto GHA Workflow Runner 2026-02-23 16:46:56 +00:00
  • 08441fed17 fix: set allowed values for adapter config (#3415) NanoCode012 2026-02-23 23:39:53 +07:00
  • 86ca1e27c0 fix: update MistralProcessor to be v5 compat (#3423) NanoCode012 2026-02-23 23:39:13 +07:00
  • 63b244d5fe Built site for gh-pages Quarto GHA Workflow Runner 2026-02-23 15:18:28 +00:00
  • 5ed455715e feat: support dot-notation CLI args for nested config options (#3419) Manas Vardhan 2026-02-23 07:10:06 -08:00
  • 0c72beb85d Built site for gh-pages Quarto GHA Workflow Runner 2026-02-23 07:25:27 +00:00
  • 3f30572d4a Fix typo in dataset_processes field (#3426) Lorenzo Baraldi 2026-02-23 08:18:37 +01:00
  • 8c5d2a4e5a Built site for gh-pages Quarto GHA Workflow Runner 2026-02-20 19:32:20 +00:00
  • 43d60c7439 bump cut-cross-entropy to 58d6572 (#3424) NanoCode012 2026-02-21 02:24:51 +07:00
  • 0ea252d392 update to trackio 0.16.1 (#3425) [skip ci] Wing Lian 2026-02-20 14:24:33 -05:00
  • 29722dec60 use bunnycdn for CI assets (#3422) [skip ci] Wing Lian 2026-02-20 00:09:25 -05:00
  • 3b5a9d1d88 update create_optimizer for updated api transformers-itl-refactor Wing Lian 2026-02-19 23:49:32 -05:00
  • eb59070040 fix labels Wing Lian 2026-02-19 23:44:46 -05:00
  • 9722aaf7d8 fix for tokenizers change Wing Lian 2026-02-19 21:52:44 -05:00
  • f325d0cc40 Built site for gh-pages Quarto GHA Workflow Runner 2026-02-19 23:39:50 +00:00
  • a3cdeab27e Built site for gh-pages Quarto GHA Workflow Runner 2026-02-19 23:34:25 +00:00
  • c5d20bbd79 integration branch for transformers#44041 Wing Lian 2026-02-19 18:34:13 -05:00
  • 7fbedbd300 fix(doc): add limitation for unfrozen_parameters (#3416) NanoCode012 2026-02-20 06:32:26 +07:00
  • 145ffc9be1 upgrade transformers to 5.2.0 and torchao to 0.16.0 (#3407) Wing Lian 2026-02-19 18:27:27 -05:00
  • 970b2a6f2f feat: test for config validation and BC for new peft weight dtype feat/torchao-qlora NanoCode012 2026-02-16 21:26:04 +07:00
  • 1f7f5e7c26 feat: handle lora kernels compat with torchao NanoCode012 2026-02-16 21:25:50 +07:00
  • 60c0a828cc feat: add torchao's int4, nf4, int8 NanoCode012 2026-02-16 21:25:24 +07:00
  • d260eeb57d match protected method accelerator-args-builder Wing Lian 2026-02-15 07:55:55 -05:00
  • 4f1b5ad29f fix: clarify how to use lm_eval plugin (#3404) [skip ci] NanoCode012 2026-02-15 19:52:30 +07:00
  • d6a2532dd7 feat(doc): clarify how to use scattermoe (#3408) [skip ci] NanoCode012 2026-02-15 19:51:28 +07:00
  • 5a7f007d20 cleanup ao fp8 patching Wing Lian 2026-02-13 17:02:23 -05:00
  • 53a12282bc fix: log merge command once done fix/gemma3-text-only NanoCode012 2026-02-14 00:45:01 +07:00
  • 7271754902 fix: handle plugin logging NanoCode012 2026-02-14 00:40:43 +07:00
  • 6d5257d92e fix: ignore ds_store NanoCode012 2026-02-14 00:33:53 +07:00
  • 0e357b5df6 fix: load gemma3 as text only model with dynamic weights NanoCode012 2026-02-14 00:32:48 +07:00
  • bcd14fb909 Built site for gh-pages Quarto GHA Workflow Runner 2026-02-12 14:05:20 +00:00
  • 5eb265513c fix generic patch for cce (#3405) Wing Lian 2026-02-12 08:58:04 -05:00
  • 2d13a06722 slow fsdp1 test liger-065 Wing Lian 2026-02-10 13:23:52 -05:00
  • ba27e830e8 triton versions for older pytorch Wing Lian 2026-02-10 11:09:03 -05:00
  • 63a41c6cfc Built site for gh-pages Quarto GHA Workflow Runner 2026-02-10 16:08:39 +00:00
  • 8f7219e139 upgrade liger to 0.6.5 and triton to 3.5.1 Wing Lian 2026-02-10 11:05:00 -05:00
  • 06ac407b92 feat: improve telemetry log (#3398) NanoCode012 2026-02-10 23:01:34 +07:00
  • 4e22cf0651 fix: remove telemetry warning (#3397) [skip ci] NanoCode012 2026-02-10 23:01:16 +07:00
  • 87e0fd6b52 feat: add glm 4.7 flash feat/glmflash-other NanoCode012 2026-02-10 18:57:20 +07:00
  • bf3eb6ec91 Built site for gh-pages Quarto GHA Workflow Runner 2026-02-10 11:13:02 +00:00
  • a4ee56c315 fix: set rollout in GRPO training_kwargs (#3392) VED 2026-02-10 16:36:15 +05:30
  • c67cbcb0f5 fix: ignore add_special_tokens and use test mode for generation for mistral tokenizer (#3396) [skip ci] NanoCode012 2026-02-10 18:03:26 +07:00
  • a2da852576 fix: improve lora kernels failure message and handle trust_remote_code (#3378) [skip ci] NanoCode012 2026-02-10 17:58:40 +07:00
  • 37e9da7a53 add hub_revision support for specifying branch when pushing checkpoints (#3387) [skip ci] madScientist10 2026-02-10 12:53:09 +02:00
  • ed7105dba7 fix: GRPO config not accept max_prompt_length (#3390) [skip ci] NanoCode012 2026-02-10 17:52:09 +07:00
  • b6d3653f74 feat: add step3p5 for cce (#3384) [skip ci] NanoCode012 2026-02-10 17:51:43 +07:00
  • fcc4cfdb63 feat: add sageattention (#2823) [skip ci] NanoCode012 2026-02-10 17:49:21 +07:00
  • 97a4f28511 fix: saving state dict and eval for Context Parallel (#3382) [skip ci] VED 2026-02-10 16:17:26 +05:30
  • 86a5803212 train_per_sec_per_gpu metric (#3364) [skip ci] VED 2026-02-10 16:14:55 +05:30
  • 530a0c0bf0 Changes from dataset_processes to dataset_num_proc (#3352) [skip ci] tgoab 2026-02-10 05:44:17 -05:00
  • 0343a72cc9 add glm support + patch (#3329) [skip ci] VED 2026-02-10 16:13:53 +05:30
  • b8d52a2193 use kwargs online-topk-kd Wing Lian 2026-02-04 12:04:53 -05:00
  • 002b1ac967 max new tokens for online generation Wing Lian 2026-02-04 11:55:19 -05:00
  • 17b01bfe36 handle input only for online Wing Lian 2026-02-04 10:53:10 -05:00
  • a0669335e2 online top-k kd Wing Lian 2026-02-04 09:49:35 -05:00
  • 2d44432e6c chore: update trinity docs NanoCode012 2026-02-04 18:10:33 +07:00
  • 57377814e9 feat: update cce for afmoe NanoCode012 2026-02-04 18:00:23 +07:00
  • 0d0edbe440 Built site for gh-pages Quarto GHA Workflow Runner 2026-01-31 02:35:21 +00:00
  • 236dad3bb7 set 0.15.0.dev0 version (#3380) Wing Lian 2026-01-30 21:28:01 -05:00
  • bd14d98fd4 Built site for gh-pages Quarto GHA Workflow Runner 2026-01-30 19:17:11 +00:00
  • be00978bc2 tag for v0.14.0 release (#3379) v0.14.0 Wing Lian 2026-01-30 14:10:27 -05:00
  • 8bd9ba2117 Built site for gh-pages Quarto GHA Workflow Runner 2026-01-29 19:34:20 +00:00
  • 3738978394 Add support for batched_mm, grouped_mm and scattermoe for MoE models (#3377) Wing Lian 2026-01-29 14:25:47 -05:00
  • 8e4394e14f Built site for gh-pages Quarto GHA Workflow Runner 2026-01-28 11:52:08 +00:00
  • 6132a30cda handle warnings from v5 upgrade (#3376) Wing Lian 2026-01-28 06:45:01 -05:00
  • 3dd86d35b8 feat: add new cce support for glm series and exaone4 (#3373) [skip ci] NanoCode012 2026-01-28 18:44:44 +07:00
  • dd9ebaeba1 EAFT (#3366) [skip ci] salman 2026-01-28 11:44:15 +00:00
  • 4934c2f06a Built site for gh-pages Quarto GHA Workflow Runner 2026-01-27 22:15:35 +00:00
  • fc4e37920b transformers v5 upgrade (#3272) Wing Lian 2026-01-27 17:08:24 -05:00