Commit Graph

  • 31a15a49b6 add additional packages via apt for better multi-node support (#2949) Wing Lian 2025-07-20 21:19:23 -04:00
  • 16db082bbe Built site for gh-pages Quarto GHA Workflow Runner 2025-07-19 18:00:00 +00:00
  • b986f7c7cb fix: return proper attention for llama4 lora kernel and fsdp2 llama4 example fix (#2943) NanoCode012 2025-07-20 00:54:43 +07:00
  • e5734e5cf0 adding torchtitan link (#2945) [skip ci] salman 2025-07-19 18:54:14 +01:00
  • 109d9c7442 make the initial call to tokenizer.pad not spam the console (#2946) [skip ci] Wing Lian 2025-07-19 13:53:35 -04:00
  • 5a51852af1 set torchao quant config on config.json of saved model (quantize-ptq-cli) Wing Lian 2025-07-17 16:46:25 -04:00
  • a1810990cc Built site for gh-pages Quarto GHA Workflow Runner 2025-07-17 19:38:09 +00:00
  • 170322a1f0 make sure log level is upper (#2934) Wing Lian 2025-07-17 15:32:55 -04:00
  • 5f5ae76213 add validation around cce + chunked_ce (#2932) [skip ci] Wing Lian 2025-07-17 15:32:38 -04:00
  • a798975b7c coderabbit manual settings (#2940) [skip ci] Wing Lian 2025-07-17 15:32:16 -04:00
  • d23f972602 use state for wandb in callbacks (#2930) [skip ci] Wing Lian 2025-07-17 15:31:56 -04:00
  • 8e41317250 don't use include_tokens_per_second for GRPO (#2931) [skip ci] Wing Lian 2025-07-17 15:31:21 -04:00
  • 380921ee56 Update ModelLoader to set default vocab_size if not defined in model config, enhancing compatibility with tokenizer defaults. (fix/granite-speech) mhenrhcsen 2025-07-17 19:53:41 +02:00
  • 6e71819560 Update ModelLoader to set vocab_size for GraniteSpeechConfig if not defined, ensuring compatibility with tokenizer defaults. mhenrhcsen 2025-07-17 19:49:10 +02:00
  • ea234afa8a Enhance model loading logic to include support for GraniteSpeechConfig, allowing for the use of the specific model class for Granite Speech. mhenrhcsen 2025-07-17 19:45:23 +02:00
  • 22d6247578 Built site for gh-pages Quarto GHA Workflow Runner 2025-07-17 13:53:10 +00:00
  • 9f2bb188a4 Improve Dataset Processing Multiprocessing, Sharding, and Qwen Tokenizer Bug Fix. (#2918) Varun Gumma 2025-07-17 19:17:58 +05:30
  • 9dde9e1b71 misc fixes 202507 (#2937) [skip ci] Wing Lian 2025-07-17 09:47:45 -04:00
  • f2474ef941 bump accelerate to 1.9.0 (#2936) [skip ci] Wing Lian 2025-07-17 09:46:43 -04:00
  • 1579b7f549 Built site for gh-pages Quarto GHA Workflow Runner 2025-07-17 04:07:09 +00:00
  • 8a4bcacdb2 cu126-torch271 for cloud docker image should be tagged with main-latest (#2935) Wing Lian 2025-07-17 00:01:23 -04:00
  • d2c3d5a954 run nightly-vs-upstream-main on 2.7.1 and multi-gpu also (#2929) [skip ci] Wing Lian 2025-07-16 21:45:42 -04:00
  • 4dc75cc713 Merge branch 'main' into kwargs-refactor (kwargs-refactor) Wing Lian 2025-07-16 17:10:26 -04:00
  • 738adb2258 fixes mhenrhcsen 2025-07-16 21:50:56 +02:00
  • f40e8caa28 checks mhenrhcsen 2025-07-16 21:30:01 +02:00
  • f9bdf1fb44 checks mhenrhcsen 2025-07-16 21:23:25 +02:00
  • 2f670a5988 Fix: Update model loading logic to conditionally upcast based on lm_head presence for btlm models mhenrhcsen 2025-07-16 21:16:47 +02:00
  • 84ad69afad Fix: Ensure tie_weights method is called only if it exists in the model class mhenrhcsen 2025-07-16 21:03:49 +02:00
  • 35b33b46d2 Built site for gh-pages Quarto GHA Workflow Runner 2025-07-16 16:04:41 +00:00
  • 36cbe13d18 activation offloading with cuda streams doesn't work with LoRA (#2927) Wing Lian 2025-07-16 11:59:20 -04:00
  • 4fd10ac2d1 Built site for gh-pages Quarto GHA Workflow Runner 2025-07-16 02:45:49 +00:00
  • 2c408b5c5e Apply generic fused liger ce, cce, and tiledmlp for arbitrary models (#2908) Wing Lian 2025-07-15 22:40:41 -04:00
  • 942005f526 use modal==1.0.2 for nightlies and for cli (#2925) [skip ci] Wing Lian 2025-07-15 20:31:23 -04:00
  • ad5a260ec8 Built site for gh-pages Quarto GHA Workflow Runner 2025-07-15 19:06:20 +00:00
  • 6f6d917a99 Revert "checkpoint model on first step callback (#2906)" (revert-2906-checkpoint-on-step-1) Dan Saunders 2025-07-15 15:01:12 -04:00
  • 10ba1622f7 checkpoint model on first step callback (#2906) Dan Saunders 2025-07-15 15:00:48 -04:00
  • d320ef6199 fix for upstream refactor of KwargsForCausalLM (#2911) Wing Lian 2025-07-15 11:28:41 -04:00
  • 88d1430c33 Built site for gh-pages Quarto GHA Workflow Runner 2025-07-15 02:38:46 +00:00
  • 354eaaf0d3 feat: add call method to mistral tokenizer wrapper (#2898) NanoCode012 2025-07-15 09:33:35 +07:00
  • a061446540 Fix: Prevents merging of tool arguments during preprocessing (#2909) greenhestu 2025-07-15 11:33:10 +09:00
  • 2e0d219cae Built site for gh-pages Quarto GHA Workflow Runner 2025-07-15 01:39:26 +00:00
  • cd079b5536 Tensor parallel w DeepSpeed AutoTP (#2574) Wing Lian 2025-07-14 21:33:48 -04:00
  • 9564d8f7c6 Built site for gh-pages Quarto GHA Workflow Runner 2025-07-15 00:16:43 +00:00
  • 5cc16040a8 move the plugin post trainer create to the setup trainer (#2907) Wing Lian 2025-07-14 20:11:33 -04:00
  • 38359a8997 allow profiling in mid-training rather from the start (#2899) [skip ci] Wing Lian 2025-07-14 20:11:11 -04:00
  • 7dc3ac6cb3 update nightlies builds (#2921) [skip ci] Wing Lian 2025-07-14 20:10:43 -04:00
  • 99187cd208 Activation Offloading w CUDA Streams (#2900) [skip ci] Wing Lian 2025-07-14 20:10:20 -04:00
  • aa684122f1 upgrade peft==0.16.0 and datasets==4.0.0 (#2917) [skip ci] Wing Lian 2025-07-14 20:09:26 -04:00
  • 1659bb9f82 Built site for gh-pages Quarto GHA Workflow Runner 2025-07-14 18:24:42 +00:00
  • ca4d4ef793 don't init distributed for deepspeed if preprocessing (#2920) Wing Lian 2025-07-14 14:19:19 -04:00
  • 419ae6f149 Built site for gh-pages Quarto GHA Workflow Runner 2025-07-14 16:37:44 +00:00
  • 37edbe4999 Remove extra torch.compile call (#2904) Dan Saunders 2025-07-14 12:32:45 -04:00
  • e581c15d40 refactor dupes from merge/rebase (#2919) [skip ci] Wing Lian 2025-07-14 10:05:26 -04:00
  • 30e8c026d4 Built site for gh-pages Quarto GHA Workflow Runner 2025-07-14 13:31:13 +00:00
  • af92151a7b FSDP2 fix validation and add tests (#2910) Wing Lian 2025-07-14 09:25:44 -04:00
  • 80dc4c261a fix xformers version for python 2.6 (#2916) [skip ci] Wing Lian 2025-07-14 09:24:29 -04:00
  • 7ccbbd8e77 upgrade liger to 0.6.0 (#2893) [skip ci] Wing Lian 2025-07-14 09:24:07 -04:00
  • 5081db7f8a upgrade trl==0.19.1 (#2892) [skip ci] Wing Lian 2025-07-14 09:23:42 -04:00
  • d5d5747ce1 Built site for gh-pages Quarto GHA Workflow Runner 2025-07-14 11:56:41 +00:00
  • 41664c7c4c fix ddp for incorrect steps (#2915) (torch_tensor_parallel) Wing Lian 2025-07-14 07:51:16 -04:00
  • 6978f09760 pre-patch the mlp (fused-mlp-ez) Wing Lian 2025-07-13 23:01:49 -04:00
  • d41b3814d0 use new patch Wing Lian 2025-07-13 22:40:37 -04:00
  • 1649f91cd4 wip patch Wing Lian 2025-07-07 09:56:22 -04:00
  • 5a063f5c75 wip state dict compatible fused mlp Wing Lian 2025-07-06 14:32:29 -04:00
  • 9394983633 fix for upstream refactor of KwargsForCausalLM Wing Lian 2025-07-12 13:57:36 -04:00
  • a93a5e6d30 Built site for gh-pages Quarto GHA Workflow Runner 2025-07-12 15:46:51 +00:00
  • 9a8073e73d Liquid Foundation Model 2 support (#2905) Wing Lian 2025-07-12 11:41:34 -04:00
  • 7fb8441e0e fix: customized dataset with simpo (#2894) [skip ci] Jiawei Liu 2025-07-12 10:40:30 -05:00
  • 4dc5910e1c feat(doc): re-add docker 2.7.0 tag back (#2902) [skip ci] NanoCode012 2025-07-12 22:40:01 +07:00
  • fb7bc9250d move unmaintained examples to archive (#2903) [skip ci] Wing Lian 2025-07-12 11:39:51 -04:00
  • 1dae0505ba Built site for gh-pages Quarto GHA Workflow Runner 2025-07-12 14:23:07 +00:00
  • d6e4a611e5 FSDP1 -> FSDP2 (#2760) salman 2025-07-12 15:18:01 +01:00
  • eb662557a7 Register Plugins in Ray Workers (#2901) [skip ci] Ed Sealing 2025-07-11 13:59:59 -07:00
  • 5efa2959d4 Built site for gh-pages Quarto GHA Workflow Runner 2025-07-11 13:13:46 +00:00
  • 03b2a113fe Update doc preview workflow to use sticky comments (#2873) salman 2025-07-11 14:08:35 +01:00
  • 5c71a1d5d8 Built site for gh-pages Quarto GHA Workflow Runner 2025-07-11 02:39:44 +00:00
  • 9b95a625ab feat: add devstral small 2507 (#2896) NanoCode012 2025-07-11 09:34:19 +07:00
  • a69c86074a Built site for gh-pages Quarto GHA Workflow Runner 2025-07-09 18:58:15 +00:00
  • c370d0795c [doc] Fix docs for text field mapping for completion datasets (#2890) Wing Lian 2025-07-09 14:52:44 -04:00
  • 3e13ea033e Built site for gh-pages Quarto GHA Workflow Runner 2025-07-09 16:53:45 +00:00
  • c620a218b8 tiled_mlp supports single gpu (#2891) (v0.11.0.post1, release-v0.11.x) Wing Lian 2025-07-09 12:48:22 -04:00
  • 76aeb16156 tiled_mlp supports single gpu (#2891) Wing Lian 2025-07-09 12:48:22 -04:00
  • 7c5ea0010f bump dev version (#2889) [skip ci] Wing Lian 2025-07-09 09:43:42 -04:00
  • f980cda286 Built site for gh-pages Quarto GHA Workflow Runner 2025-07-09 13:28:02 +00:00
  • c6d69d5c1b release v0.11.0 (#2875) (v0.11.0) Wing Lian 2025-07-09 09:22:35 -04:00
  • 2ad27bd14d Built site for gh-pages Quarto GHA Workflow Runner 2025-07-09 12:49:16 +00:00
  • 4ff96a2526 fix xformers version (#2888) Wing Lian 2025-07-09 08:43:40 -04:00
  • 89e99eaaa7 slowest durations (#2887) [skip ci] salman 2025-07-09 13:43:26 +01:00
  • 1eb59d754e Built site for gh-pages Quarto GHA Workflow Runner 2025-07-08 20:33:50 +00:00
  • 6ed501f6dc add 2.7.0 torch images back to support vlllm (#2885) Wing Lian 2025-07-08 16:28:14 -04:00
  • d47093fcdd fix: simplify fn same as sft and pass model to plugin (fix/rl-trainer-arg) NanoCode012 2025-07-08 22:29:56 +07:00
  • 8c6a6ea6eb Feat: add devstral model support (#2880) [skip ci] NanoCode012 2025-07-08 22:01:19 +07:00
  • 78bff4925e fix: set add_generation_prompt to False when apply chat template (#2859) [skip ci] NanoCode012 2025-07-08 22:00:44 +07:00
  • b237c8a3f3 chore: update cce commit to include gemma3n fixes (#2881) [skip ci] NanoCode012 2025-07-08 21:59:35 +07:00
  • fe81d52882 remove xformers changes (update-vllm) Dan Saunders 2025-07-08 14:43:23 +00:00
  • 1eaa4ed89d update other deps Dan Saunders 2025-07-08 14:29:36 +00:00
  • fe47392ed6 updating vllm to latest Dan Saunders 2025-07-08 14:24:22 +00:00
  • 1032e22650 Fix link in FSDP + QLoRA docs. (#2879) [skip ci] float-trip 2025-07-08 09:19:09 -04:00
  • d7f9f4e61f Built site for gh-pages Quarto GHA Workflow Runner 2025-07-07 21:10:36 +00:00
  • d68cc1e8ab densemixer plugin integration (#2868) Wing Lian 2025-07-07 17:05:19 -04:00