Commit Graph

  • 33d094721c fix: deepcopy lr in RexLR scheduler. (#3012) Carsten Kragelund Jørgensen 2025-08-04 16:23:49 +02:00
  • a54c1be972 Fix: shorten mem logs to 2 decimal places and renamed nd docs (#3011) [skip ci] NanoCode012 2025-08-04 21:23:36 +07:00
  • 5691992d34 chore: update pre-commit hooks (#3009) [skip ci] github-actions[bot] 2025-08-04 10:23:19 -04:00
  • 61172b9889 Built site for gh-pages Quarto GHA Workflow Runner 2025-08-04 00:11:07 +00:00
  • e758343cac FSDP2 + LoRA kernels (#2992) Dan Saunders 2025-08-03 20:05:17 -04:00
  • b5198d8734 granite chat multipack support and example chat-template-granite Wing Lian 2025-08-02 20:57:00 -04:00
  • 6164aaea95 Built site for gh-pages Quarto GHA Workflow Runner 2025-08-03 00:29:45 +00:00
  • deac7b18a1 upgrade peft v0.17.0 and support for lora target_parameters (#3006) Wing Lian 2025-08-02 20:24:04 -04:00
  • 4ab6a1bd7e add support for granite chat templates Wing Lian 2025-08-02 11:29:03 -04:00
  • e221eb555e Built site for gh-pages Quarto GHA Workflow Runner 2025-08-02 15:24:54 +00:00
  • 10946afae7 fixes for spinning up vllm service for grpo (#3001) Wing Lian 2025-08-02 11:19:24 -04:00
  • 1b7af36546 Built site for gh-pages Quarto GHA Workflow Runner 2025-08-01 21:59:26 +00:00
  • 5639552064 prevent usage of low bit ao optimizers with configurations that use parameter groups (#3003) Wing Lian 2025-08-01 17:54:04 -04:00
  • 85377da233 Built site for gh-pages Quarto GHA Workflow Runner 2025-08-01 20:16:20 +00:00
  • cda3c82351 move ib/rdma libs into base image (#3002) Wing Lian 2025-08-01 16:10:37 -04:00
  • fff4876011 Built site for gh-pages Quarto GHA Workflow Runner 2025-08-01 18:03:56 +00:00
  • 7c3b428f23 Add validation for TP with models with tied embeddings (#2999) Wing Lian 2025-08-01 13:58:16 -04:00
  • 3bc270eb97 Built site for gh-pages Quarto GHA Workflow Runner 2025-08-01 17:27:24 +00:00
  • 01a6bd1a0e use CCE fix for TP using vocab parallel for CEL (#3000) Wing Lian 2025-08-01 13:21:58 -04:00
  • 41709822a7 fix: move memory usage log to trainer.log (#2996) [skip ci] NanoCode012 2025-08-02 00:21:43 +07:00
  • 7b7f12e34d Built site for gh-pages Quarto GHA Workflow Runner 2025-08-01 14:05:02 +00:00
  • 02a37199ee prevent empty value for vllm_mode (#2998) Wing Lian 2025-08-01 09:59:45 -04:00
  • b2accbbb42 Built site for gh-pages Quarto GHA Workflow Runner 2025-08-01 06:24:01 +00:00
  • 7026cd5e9e Feat: Add N-D parallelism docs (#2989) NanoCode012 2025-08-01 13:18:31 +07:00
  • e92742cb0b Built site for gh-pages Quarto GHA Workflow Runner 2025-07-31 22:24:22 +00:00
  • eb0a8a7775 feat: upgrade cce commit to include smollm3, granite, granitemoe (#2993) NanoCode012 2025-08-01 05:18:44 +07:00
  • 39c92de913 Built site for gh-pages Quarto GHA Workflow Runner 2025-07-31 19:30:34 +00:00
  • 294c7fe7a6 Distributed/ND-Parallel (#2977) salman 2025-07-31 20:25:02 +01:00
  • 7b68dfafd7 jagged lr restart scheduler (#1680) [skip ci] Wing Lian 2025-07-31 13:50:03 -04:00
  • 32a7890231 Revert test update to index.qmd (#2995) [skip ci] salman 2025-07-31 16:46:31 +01:00
  • 85d9d0f152 Built site for gh-pages Quarto GHA Workflow Runner 2025-07-31 15:22:56 +00:00
  • 563f5eed7a update dependencies - liger + trl (#2987) Wing Lian 2025-07-31 11:17:17 -04:00
  • 6ec282094d actually call the register method on plugins (#2991) [skip ci] Wing Lian 2025-07-31 11:13:15 -04:00
  • 09dda462ab Fix don't preview docs for contributors (#2994) [skip ci] salman 2025-07-31 16:12:41 +01:00
  • ca0e437362 Built site for gh-pages Quarto GHA Workflow Runner 2025-07-30 19:52:35 +00:00
  • bb1cae1a20 CLI: add --launcher option, support launcher args, cleanup, refactor (#2924) Dan Saunders 2025-07-30 15:46:56 -04:00
  • 08aa74e418 fix llama modeling custom-modeling Wing Lian 2025-07-30 11:37:58 -04:00
  • dfa14f87ab fix residuals and add llama support Wing Lian 2025-07-30 10:22:38 -04:00
  • fbe1b504da add custom modeling for gemma3 using liger fused add rms Wing Lian 2025-07-30 08:21:03 -04:00
  • 5b8370969c actually call the register method on plugins Wing Lian 2025-07-30 08:05:25 -04:00
  • b1bf58e8e6 Built site for gh-pages Quarto GHA Workflow Runner 2025-07-30 10:55:07 +00:00
  • 22810c97b7 use warmup_ratio as a better default than warmup steps since it's data dependent (#2897) [skip ci] Wing Lian 2025-07-30 06:44:06 -04:00
  • 2eb7ff95af Use '<|finetune_right_pad|>' as padding token for LLama4 (#2988) [skip ci] Vincenzo di Cicco 2025-07-30 12:38:13 +02:00
  • 1e5fcd846d Built site for gh-pages Quarto GHA Workflow Runner 2025-07-30 10:34:25 +00:00
  • e45e19f21a Built site for gh-pages Quarto GHA Workflow Runner 2025-07-30 10:26:43 +00:00
  • 1a2b198299 Built site for gh-pages Quarto GHA Workflow Runner 2025-07-30 09:02:40 +00:00
  • 90e5598930 Feat: Add voxtral, magistral small 1.1, and misc gemma3n fixes (#2979) NanoCode012 2025-07-30 15:57:05 +07:00
  • ede973b76c nits lora_kernels_fsdp Dan Saunders 2025-07-28 01:47:40 +00:00
  • 91590f1560 Built site for gh-pages Quarto GHA Workflow Runner 2025-07-27 21:10:48 +00:00
  • 1d2aa1e467 upgrade to support latest transformers release (#2984) Wing Lian 2025-07-27 17:05:12 -04:00
  • 430be216d8 add shuffle_before_merging_datasets option to allow independent shuffling of datasets before merging (#2981) [skip ci] NICOLAS BZRD 2025-07-27 23:04:56 +02:00
  • 28804b82e4 don't create a reference model if grpo beta is 0.0 (#2983) [skip ci] Wing Lian 2025-07-27 17:04:42 -04:00
  • add3e5076b don't publish to netlify on contributor submissions since it requires auth tokens (#2985) [skip ci] Wing Lian 2025-07-27 17:04:27 -04:00
  • 41434f0c28 feat(doc): add all providers to readme (#2972) [skip ci] NanoCode012 2025-07-28 04:03:50 +07:00
  • d58a3505ba Built site for gh-pages Quarto GHA Workflow Runner 2025-07-25 11:21:16 +00:00
  • f7ea140838 TiledMLP support for FSDP2 (#2950) Wing Lian 2025-07-25 07:15:03 -04:00
  • 7994fb415c Built site for gh-pages Quarto GHA Workflow Runner 2025-07-24 20:16:24 +00:00
  • 460e0f9ed9 improve handling of file lock when content is empty (#2959) Wing Lian 2025-07-24 16:10:38 -04:00
  • e80faea0db garbage collect on the end of the step if we're going to save a checkpoint (#2971) [skip ci] Wing Lian 2025-07-24 16:10:23 -04:00
  • 0ff2f172ef Act offload lora fix (#2928) [skip ci] Wing Lian 2025-07-24 16:10:04 -04:00
  • e1e1523493 Built site for gh-pages Quarto GHA Workflow Runner 2025-07-24 08:17:20 +00:00
  • e36d3c9f30 Merge branch 'main' into testingci testingci salman 2025-07-24 09:13:15 +01:00
  • 53614391ed wip Salman Mohammadi 2025-07-24 09:12:55 +01:00
  • 1407aac779 Skip CI for draft PRs (#2970) salman 2025-07-24 09:11:46 +01:00
  • bc2bc688d8 update fsdp2 patch nd_parallel Salman Mohammadi 2025-07-23 16:53:03 +01:00
  • ef35bf38f3 Built site for gh-pages Quarto GHA Workflow Runner 2025-07-23 14:33:05 +00:00
  • b34c3371ed upgrade torchao (#2968) Dan Saunders 2025-07-23 10:27:28 -04:00
  • b3c04dd9fe workaround for fsdp2 optimizer save failures Wing Lian 2025-07-23 09:38:57 -04:00
  • 972c719d38 use latest transformers on main with fix Wing Lian 2025-07-23 09:22:36 -04:00
  • 2c1cb8b300 fix for accelerator state getting reset and missing schema Wing Lian 2025-07-23 08:43:34 -04:00
  • cca207eec4 handle none checks Wing Lian 2025-07-22 21:21:45 -04:00
  • 9a2da4d9f0 update tp validation Wing Lian 2025-07-22 21:20:57 -04:00
  • 8fe4758e94 make sure to return data for validation Wing Lian 2025-07-22 21:18:39 -04:00
  • 8c641fdcb4 handle tp load Wing Lian 2025-07-22 21:17:27 -04:00
  • 5c74bebfd0 use new upstream branches for nd-parallelism Wing Lian 2025-07-22 21:12:22 -04:00
  • 4c1a2b79df Built site for gh-pages Quarto GHA Workflow Runner 2025-07-23 00:45:36 +00:00
  • 5f1a4306b0 don't check dataset labels during preprocess for GRPO (#2952) [skip ci] Wing Lian 2025-07-22 20:40:44 -04:00
  • 93709eb5ce handle refactor upstream for flash attention (#2966) Wing Lian 2025-07-22 20:40:04 -04:00
  • 1a0ec998b7 Built site for gh-pages Quarto GHA Workflow Runner 2025-07-22 20:33:12 +00:00
  • 208fb7b8e7 basic torchao fp8 mixed precision training (#2926) Dan Saunders 2025-07-22 16:27:47 -04:00
  • f1a2cfc441 Built site for gh-pages Quarto GHA Workflow Runner 2025-07-22 14:06:04 +00:00
  • b86a1d47b0 we don't need to call check_dataset_labels when skip_prepare_dataset is set (#2962) Wing Lian 2025-07-22 10:00:53 -04:00
  • 01d8175d48 fix: revert changing default optimizer to muon (#2965) [skip ci] NanoCode012 2025-07-22 21:00:30 +07:00
  • 631268a0ca revert renaming of deepspeed stage3 args that use auto (#2964) [skip ci] NanoCode012 2025-07-22 20:59:47 +07:00
  • f6fa94309e Built site for gh-pages Quarto GHA Workflow Runner 2025-07-22 12:35:49 +00:00
  • 3a208cfd84 Autocomplete axolotl CLI (#2955) Wing Lian 2025-07-22 08:30:31 -04:00
  • 7267edc168 chore: update pre-commit hooks (#2954) [skip ci] github-actions[bot] 2025-07-22 08:30:00 -04:00
  • ab2b3240ba Built site for gh-pages Quarto GHA Workflow Runner 2025-07-22 09:57:20 +00:00
  • dfba881e99 Feat: add gemma3n support (#2852) NanoCode012 2025-07-22 16:52:15 +07:00
  • d32058e149 include torchvision in build for upstream changes requiring it now (#2953) [skip ci] Wing Lian 2025-07-22 04:19:16 -04:00
  • 28638e2aef Built site for gh-pages Quarto GHA Workflow Runner 2025-07-21 15:47:34 +00:00
  • bc1076d8a2 fix: suppress warning if we enabled skip prepare (#2958) NanoCode012 2025-07-21 22:42:04 +07:00
  • b7e8f66e5a upstream fixes in cce for dora and tensor parallel support (#2960) [skip ci] Wing Lian 2025-07-21 11:41:53 -04:00
  • e207762928 fix deprecate deepspeed stage3_gather_16bit_weights_on_model_save arg (#2956) [skip ci] Wing Lian 2025-07-21 11:41:31 -04:00
  • fefb0797ee better handling for reward function checks for GRPO (#2933) [skip ci] Wing Lian 2025-07-21 11:41:15 -04:00
  • af8d257aa2 make pad_to_sequence_len default to the same value as sample_packing (#2941) [skip ci] Wing Lian 2025-07-21 11:40:56 -04:00
  • db5f6f4693 limit num_proc when saving datasets to disk (#2948) [skip ci] Wing Lian 2025-07-21 11:39:38 -04:00
  • a54de13d48 Built site for gh-pages Quarto GHA Workflow Runner 2025-07-21 15:10:29 +00:00
  • 8e5f146701 Fix cloud docker image build and remove apt files for optim (#2961) Wing Lian 2025-07-21 11:05:00 -04:00
  • 6fed3f3b46 Built site for gh-pages Quarto GHA Workflow Runner 2025-07-21 01:25:32 +00:00