Commit Graph

  • f5f5a3ee9b feat(doc): add llama4 to liger support (fix/doc-key) NanoCode012 2025-04-09 15:41:05 +07:00
  • cc512a57a5 fix: wrong key used in example doc NanoCode012 2025-04-09 14:54:21 +07:00
  • 1180757295 Built site for gh-pages Quarto GHA Workflow Runner 2025-04-09 06:55:50 +00:00
  • f85861a0b2 fix: liger swiglu for llama4 (#2504) NanoCode012 2025-04-09 13:53:17 +07:00
  • 630e40dd13 upgrade transformers to 4.51.1 (#2508) Wing Lian 2025-04-09 02:53:00 -04:00
  • bf9efe2a09 [llama4] fix the mm yaml, add scout single gpu yaml (#2510) Wing Lian 2025-04-09 02:52:45 -04:00
  • 4581d6a8de fix: accidentally reassigning tensor to weight (fix/cce-linear) NanoCode012 2025-04-09 13:45:29 +07:00
  • 46afcf070f rename to specify fsdp (llama-4-examples) Wing Lian 2025-04-09 02:39:03 -04:00
  • 3036ca349f add README for llama4 Wing Lian 2025-04-09 02:15:09 -04:00
  • 37a66e6866 multigpu longer timeout (transformers-4511) Wing Lian 2025-04-09 01:54:35 -04:00
  • dc4809f7dd [llama4] fix the mm yaml, add scout single gpu yaml Wing Lian 2025-04-09 01:52:31 -04:00
  • 9f69597a5f upgrade transformers to 4.51.1 Wing Lian 2025-04-09 00:20:50 -04:00
  • c36ff6ab70 Create CNAME Wing Lian 2025-04-08 13:55:41 -04:00
  • 2f147cc6ff fixing tests Salman Mohammadi 2025-04-08 17:23:21 +01:00
  • 6f47b1e896 merging Salman Mohammadi 2025-04-08 17:20:53 +01:00
  • e1a8dfbe8c pinning transformers version Salman Mohammadi 2025-04-08 17:17:23 +01:00
  • 19f90ba9dc feat: add deepseekv3 liger ref code (feat/liger-deepseekv3) NanoCode012 2025-04-08 21:25:19 +07:00
  • cdb16069af fixing transformers version salman 2025-04-08 11:28:52 +01:00
  • 75c565d476 add back dynamic=False Sunny Liu 2025-04-07 17:06:51 -04:00
  • bdaaba2784 remove backend='inductor' in local patch Sunny Liu 2025-04-07 17:05:08 -04:00
  • 04624c5a8d bump flex patching transformers to v4.51, update torch compile kwargs to be in line with transformers v4.51 Sunny Liu 2025-04-07 15:12:45 -04:00
  • 1a85fab2ca fix: lm_head is a view or related view modified NanoCode012 2025-04-08 17:32:28 +07:00
  • b98dbafc31 fixing transformers version salman 2025-04-08 11:28:52 +01:00
  • ebe5abad53 0.8.1 version (v0.8.1, release-0.8.x) Wing Lian 2025-04-07 20:49:40 -04:00
  • 0dac2ddeac Llama4 linearized (#2502) Wing Lian 2025-04-07 20:47:00 -04:00
  • ed34ee51b4 Built site for gh-pages Quarto GHA Workflow Runner 2025-04-07 21:14:41 +00:00
  • a6c03217f5 feat: add llama4 CCE (#2498) NanoCode012 2025-04-08 04:12:28 +07:00
  • 4d320e2e4d add back dynamic=False Sunny Liu 2025-04-07 17:06:51 -04:00
  • 421e0ee499 remove backend='inductor' in local patch Sunny Liu 2025-04-07 17:05:08 -04:00
  • 4e8677027a bump flex patching transformers to v4.51, update torch compile kwargs to be in line with transformers v4.51 Sunny Liu 2025-04-07 15:12:45 -04:00
  • 00364ad07a Built site for gh-pages Quarto GHA Workflow Runner 2025-04-07 18:50:15 +00:00
  • 59cd472504 SP cu_seqlens fix, refactor (#2495) Dan Saunders 2025-04-07 14:47:57 -04:00
  • 954b989e88 log warning re: logged losses / gradient scaling per rank (sp-fix-masking) Dan Saunders 2025-04-07 18:46:58 +00:00
  • c64c881460 using existing packed seqlens util Dan Saunders 2025-04-06 18:35:31 +00:00
  • cefd57cecb adding smoke test Dan Saunders 2025-04-06 01:48:27 +00:00
  • 2f3c52ea2f pre-commit fix Dan Saunders 2025-04-06 00:36:27 +00:00
  • 741015b3cf refactor and fix multipack seqlens Dan Saunders 2025-04-06 00:31:19 +00:00
  • 4188700b7b working on masking fix Dan Saunders 2025-04-04 20:24:18 +00:00
  • 4be68e03ec Built site for gh-pages Quarto GHA Workflow Runner 2025-04-07 16:43:41 +00:00
  • 69e5f60891 Built site for gh-pages Quarto GHA Workflow Runner 2025-04-07 16:41:40 +00:00
  • 9b89591ead Feat: Add doc on loading datasets and support for Azure/OCI (#2482) NanoCode012 2025-04-07 23:41:13 +07:00
  • 31498d0230 fix(doc): clarify roles mapping in chat_template (#2490) [skip ci] NanoCode012 2025-04-07 23:40:32 +07:00
  • d25daebea9 fix: duplicate llama4 chattemplate enum (#2500) NanoCode012 2025-04-07 23:39:19 +07:00
  • fbc49a1763 Built site for gh-pages Quarto GHA Workflow Runner 2025-04-07 14:51:53 +00:00
  • e0e5d9b1d6 feat: add llama4 multimodal (#2499) NanoCode012 2025-04-07 21:49:29 +07:00
  • 8bbad21bfd llama4 support (#2493) Wing Lian 2025-04-07 10:49:15 -04:00
  • 1aec93cf9e add preliminary fp8 support (llama4-patches, llama4) Wing Lian 2025-04-06 23:54:50 -04:00
  • 37630fc6ef patches to make llama4 performant Wing Lian 2025-04-06 22:50:48 -04:00
  • 4b28b2a0b4 remove stray print, add llama4 chat template to schema, bump peft to 0.15.1 Wing Lian 2025-04-06 19:59:48 -04:00
  • 320553850a update peft to 0.15.1 (peft-update) Wing Lian 2025-04-06 19:55:07 -04:00
  • b38f70e068 use 4.51.0 for now Wing Lian 2025-04-06 18:14:14 -04:00
  • cf4c84e21d slightly smaller train set Wing Lian 2025-04-06 17:08:39 -04:00
  • 98d98ea1dd reordering to trigger torch 2.6.0 tests first Wing Lian 2025-04-06 16:05:26 -04:00
  • 0cf42ab8a3 don't use deepspeed for the fix_untrained_tokens test Wing Lian 2025-04-06 07:42:11 -04:00
  • 3d0ab75a0c be flexible on transformers version and skip test on version Wing Lian 2025-04-06 02:33:40 -04:00
  • bc4df742e4 Built site for gh-pages Quarto GHA Workflow Runner 2025-04-06 21:10:14 +00:00
  • d375be90ff add xet support [skip ci] Wing Lian 2025-04-05 22:37:52 -04:00
  • 98827e8f3b llama4 support Wing Lian 2025-04-05 17:47:16 -04:00
  • 5f4af3665d FSDP2 support (#2469) Wing Lian 2025-04-06 17:08:01 -04:00
  • c7f1c191a3 additional validation for fsdp2, bump dep versions (fsdp2) Wing Lian 2025-04-06 15:18:56 -04:00
  • 1a5d445413 make sure to patch all the loaded models Wing Lian 2025-04-06 14:45:30 -04:00
  • 7e410ab480 more fixes to flex for fsdp2 Wing Lian 2025-04-06 14:24:50 -04:00
  • b5a51c378b okay, actually use fdsp2... Wing Lian 2025-04-06 13:55:46 -04:00
  • 9509abccdd use yet-another-deepspeed branch from transformers#37324 (llama-4-z3) Wing Lian 2025-04-06 13:21:34 -04:00
  • 3acefba9ba point to branch for potential zero3 fix Wing Lian 2025-04-05 20:33:07 -04:00
  • 100e5ea6ea llama4 support Wing Lian 2025-04-05 17:47:16 -04:00
  • c902f4222d make sure both flex and flash attn work with fsdp2, skip fix untrained tokens Wing Lian 2025-04-06 12:30:14 -04:00
  • 9329db9c3a fix fsdp2 config for ci Wing Lian 2025-04-06 07:55:54 -04:00
  • ad7293f617 skip zero3 tests for this PR for now Wing Lian 2025-04-05 01:30:53 -04:00
  • 475125e4ca use transformers commit with fsdp2 support Wing Lian 2025-04-04 14:23:31 -04:00
  • 2b5e546da0 add fsdp2 e2e tests Wing Lian 2025-04-01 19:00:21 -04:00
  • 252dc5c91b liger + torch compile fix Wing Lian 2025-04-01 15:32:27 -04:00
  • af3f981f51 allow 8bit optims with fsdp2 Wing Lian 2025-04-01 14:55:59 -04:00
  • 52b96031b4 use accelerate release 1.6.0 Wing Lian 2025-04-01 14:13:49 -04:00
  • 03dcf1a5ea fsdp2 support Wing Lian 2025-03-22 21:47:54 -04:00
  • 5a11ba3ecc Built site for gh-pages Quarto GHA Workflow Runner 2025-04-05 22:05:16 +00:00
  • a8f38c367c Flex Attention + Packing with BlockMask support (#2363) Sung Ching Liu 2025-04-05 18:02:57 -04:00
  • f1a5879a51 Built site for gh-pages Quarto GHA Workflow Runner 2025-04-05 21:43:47 +00:00
  • e7e0cd97ce Update dependencies and show slow tests in CI (#2492) Wing Lian 2025-04-05 17:41:31 -04:00
  • bedd4a8554 Built site for gh-pages Quarto GHA Workflow Runner 2025-04-05 05:27:55 +00:00
  • 949471039f fix tokenizer overrides w gemma3 (#2488) Wing Lian 2025-04-05 01:25:44 -04:00
  • cc5cf77f6b Built site for gh-pages Quarto GHA Workflow Runner 2025-04-04 17:50:00 +00:00
  • de451f99a5 fix: cohere cce scaling wrong tensor (#2483) NanoCode012 2025-04-05 00:47:44 +07:00
  • 9f824ef76a simplify the example configs to be more minimal and less daunting (#2486) [skip ci] Wing Lian 2025-04-04 13:47:26 -04:00
  • dd66fb163c check if fixture exists in the cache already (#2485) Wing Lian 2025-04-04 13:47:01 -04:00
  • 9f30d3d33a reworking SP logic into composed handler (sp-rl) Dan Saunders 2025-04-04 02:25:00 +00:00
  • 595c2b4288 Built site for gh-pages Quarto GHA Workflow Runner 2025-04-03 18:53:11 +00:00
  • e0cc4f1a87 removing deepspeed guard for LoRA Triton kernels (#2480) Dan Saunders 2025-04-03 14:50:56 -04:00
  • 700409be6f removing deepspeed guard for LoRA Triton kernels (lora-kernels-deepspeed) Dan Saunders 2025-04-03 16:44:45 +00:00
  • 4a53765808 Built site for gh-pages Quarto GHA Workflow Runner 2025-04-03 12:50:35 +00:00
  • 64d8035f50 fix(example): align example to correct adapter (#2478) NanoCode012 2025-04-03 19:48:14 +07:00
  • 5249e98058 add additional tf32 opt for cudnn (#2477) [skip ci] Wing Lian 2025-04-03 08:47:52 -04:00
  • 2d50c2656f Built site for gh-pages Quarto GHA Workflow Runner 2025-04-02 13:53:12 +00:00
  • 3877c5c69d set release version 0.8.0 (#2476) (v0.8.0) Wing Lian 2025-04-02 09:50:56 -04:00
  • 41474090da Built site for gh-pages Quarto GHA Workflow Runner 2025-04-02 13:37:55 +00:00
  • adb593abac fix: document offload gradient_checkpointing option (#2475) NanoCode012 2025-04-02 20:35:42 +07:00
  • a0117c9bce fix: separate gemma3 text and vision example config (#2471) [skip ci] NanoCode012 2025-04-02 20:35:29 +07:00
  • e6cfb093d2 fix: disable SP during merge (#2470) [skip ci] NanoCode012 2025-04-02 20:35:00 +07:00
  • 7abc71dc0b fix: gemma3 loss in forward pass (#2473) [skip ci] NanoCode012 2025-04-02 20:34:41 +07:00
  • 45bf634d17 feat: add support for multimodal in lora kernels (#2472) [skip ci] NanoCode012 2025-04-02 20:33:46 +07:00