f5f5a3ee9b
feat(doc): add llama4 to liger support
fix/doc-key
NanoCode012
2025-04-09 15:41:05 +07:00
cc512a57a5
fix: wrong key used in example doc
NanoCode012
2025-04-09 14:54:21 +07:00
1180757295
Built site for gh-pages
Quarto GHA Workflow Runner
2025-04-09 06:55:50 +00:00
f85861a0b2
fix: liger swiglu for llama4 (#2504)
NanoCode012
2025-04-09 13:53:17 +07:00
630e40dd13
upgrade transformers to 4.51.1 (#2508)
Wing Lian
2025-04-09 02:53:00 -04:00
bf9efe2a09
[llama4] fix the mm yaml, add scout single gpu yaml (#2510)
Wing Lian
2025-04-09 02:52:45 -04:00
4581d6a8de
fix: accidentally reassigning tensor to weight
fix/cce-linear
NanoCode012
2025-04-09 13:45:29 +07:00
46afcf070f
rename to specify fsdp
llama-4-examples
Wing Lian
2025-04-09 02:39:03 -04:00
3036ca349f
add README for llama4
Wing Lian
2025-04-09 02:15:09 -04:00
37a66e6866
multigpu longer timeout
transformers-4511
Wing Lian
2025-04-09 01:54:35 -04:00
dc4809f7dd
[llama4] fix the mm yaml, add scout single gpu yaml
Wing Lian
2025-04-09 01:52:31 -04:00
9f69597a5f
upgrade transformers to 4.51.1
Wing Lian
2025-04-09 00:20:50 -04:00
c36ff6ab70
Create CNAME
Wing Lian
2025-04-08 13:55:41 -04:00
2f147cc6ff
fixing tests
Salman Mohammadi
2025-04-08 17:23:21 +01:00
6f47b1e896
merging
Salman Mohammadi
2025-04-08 17:20:53 +01:00
e1a8dfbe8c
pinning transformers version
Salman Mohammadi
2025-04-08 17:17:23 +01:00
19f90ba9dc
feat: add deepseekv3 liger ref code
feat/liger-deepseekv3
NanoCode012
2025-04-08 21:25:19 +07:00
cdb16069af
fixing transformers version
salman
2025-04-08 11:28:52 +01:00
75c565d476
add back dynamic=False
Sunny Liu
2025-04-07 17:06:51 -04:00
bdaaba2784
remove backend='inductor' in local patch
Sunny Liu
2025-04-07 17:05:08 -04:00
04624c5a8d
bump flex patching transformers to v4.51, update torch compile kwargs to be in line with transformers v4.51
Sunny Liu
2025-04-07 15:12:45 -04:00
1a85fab2ca
fix: lm_head is a view or related view modified
NanoCode012
2025-04-08 17:32:28 +07:00
b98dbafc31
fixing transformers version
salman
2025-04-08 11:28:52 +01:00
ebe5abad53
0.8.1 version
v0.8.1
release-0.8.x
Wing Lian
2025-04-07 20:49:40 -04:00
0dac2ddeac
Llama4 linearized (#2502)
Wing Lian
2025-04-07 20:47:00 -04:00
ed34ee51b4
Built site for gh-pages
Quarto GHA Workflow Runner
2025-04-07 21:14:41 +00:00
a6c03217f5
feat: add llama4 CCE (#2498)
NanoCode012
2025-04-08 04:12:28 +07:00
4d320e2e4d
add back dynamic=False
Sunny Liu
2025-04-07 17:06:51 -04:00
421e0ee499
remove backend='inductor' in local patch
Sunny Liu
2025-04-07 17:05:08 -04:00
4e8677027a
bump flex patching transformers to v4.51, update torch compile kwargs to be in line with transformers v4.51
Sunny Liu
2025-04-07 15:12:45 -04:00
00364ad07a
Built site for gh-pages
Quarto GHA Workflow Runner
2025-04-07 18:50:15 +00:00
59cd472504
SP cu_seqlens fix, refactor (#2495)
Dan Saunders
2025-04-07 14:47:57 -04:00
954b989e88
log warning re: logged losses / gradient scaling per rank
sp-fix-masking
Dan Saunders
2025-04-07 18:46:58 +00:00
c64c881460
using existing packed seqlens util
Dan Saunders
2025-04-06 18:35:31 +00:00
cefd57cecb
adding smoke test
Dan Saunders
2025-04-06 01:48:27 +00:00
2f3c52ea2f
pre-commit fix
Dan Saunders
2025-04-06 00:36:27 +00:00
741015b3cf
refactor and fix multipack seqlens
Dan Saunders
2025-04-06 00:31:19 +00:00
4188700b7b
working on masking fix
Dan Saunders
2025-04-04 20:24:18 +00:00
4be68e03ec
Built site for gh-pages
Quarto GHA Workflow Runner
2025-04-07 16:43:41 +00:00
69e5f60891
Built site for gh-pages
Quarto GHA Workflow Runner
2025-04-07 16:41:40 +00:00
9b89591ead
Feat: Add doc on loading datasets and support for Azure/OCI (#2482)
NanoCode012
2025-04-07 23:41:13 +07:00
31498d0230
fix(doc): clarify roles mapping in chat_template (#2490) [skip ci]
NanoCode012
2025-04-07 23:40:32 +07:00
d25daebea9
fix: duplicate llama4 chattemplate enum (#2500)
NanoCode012
2025-04-07 23:39:19 +07:00
fbc49a1763
Built site for gh-pages
Quarto GHA Workflow Runner
2025-04-07 14:51:53 +00:00
e0e5d9b1d6
feat: add llama4 multimodal (#2499)
NanoCode012
2025-04-07 21:49:29 +07:00
8bbad21bfd
llama4 support (#2493)
Wing Lian
2025-04-07 10:49:15 -04:00
1aec93cf9e
add preliminary fp8 support
llama4-patches
llama4
Wing Lian
2025-04-06 23:54:50 -04:00
37630fc6ef
patches to make llama4 performant
Wing Lian
2025-04-06 22:50:48 -04:00
4b28b2a0b4
remove stray print, add llama4 chat template to schema, bump peft to 0.15.1
Wing Lian
2025-04-06 19:59:48 -04:00
320553850a
update peft to 0.15.1
peft-update
Wing Lian
2025-04-06 19:55:07 -04:00
b38f70e068
use 4.51.0 for now
Wing Lian
2025-04-06 18:14:14 -04:00
cf4c84e21d
slightly smaller train set
Wing Lian
2025-04-06 17:08:39 -04:00
98d98ea1dd
reordering to trigger torch 2.6.0 tests first
Wing Lian
2025-04-06 16:05:26 -04:00
0cf42ab8a3
don't use deepspeed for the fix_untrained_tokens test
Wing Lian
2025-04-06 07:42:11 -04:00
3d0ab75a0c
be flexible on transformers version and skip test on version
Wing Lian
2025-04-06 02:33:40 -04:00
bc4df742e4
Built site for gh-pages
Quarto GHA Workflow Runner
2025-04-06 21:10:14 +00:00
d375be90ff
add xet support [skip ci]
Wing Lian
2025-04-05 22:37:52 -04:00
98827e8f3b
llama4 support
Wing Lian
2025-04-05 17:47:16 -04:00
5f4af3665d
FSDP2 support (#2469)
Wing Lian
2025-04-06 17:08:01 -04:00
c7f1c191a3
additional validation for fsdp2, bump dep versions
fsdp2
Wing Lian
2025-04-06 15:18:56 -04:00
1a5d445413
make sure to patch all the loaded models
Wing Lian
2025-04-06 14:45:30 -04:00
7e410ab480
more fixes to flex for fsdp2
Wing Lian
2025-04-06 14:24:50 -04:00
b5a51c378b
okay, actually use fdsp2...
Wing Lian
2025-04-06 13:55:46 -04:00
9509abccdd
use yet-another-deepspeed branch from transformers#37324
llama-4-z3
Wing Lian
2025-04-06 13:21:34 -04:00
3acefba9ba
point to branch for potential zero3 fix
Wing Lian
2025-04-05 20:33:07 -04:00
100e5ea6ea
llama4 support
Wing Lian
2025-04-05 17:47:16 -04:00
c902f4222d
make sure both flex and flash attn work with fsdp2, skip fix untrained tokens
Wing Lian
2025-04-06 12:30:14 -04:00
9329db9c3a
fix fsdp2 config for ci
Wing Lian
2025-04-06 07:55:54 -04:00
ad7293f617
skip zero3 tests for this PR for now
Wing Lian
2025-04-05 01:30:53 -04:00
475125e4ca
use transformers commit with fsdp2 support
Wing Lian
2025-04-04 14:23:31 -04:00
2b5e546da0
add fsdp2 e2e tests
Wing Lian
2025-04-01 19:00:21 -04:00
252dc5c91b
liger + torch compile fix
Wing Lian
2025-04-01 15:32:27 -04:00
af3f981f51
allow 8bit optims with fsdp2
Wing Lian
2025-04-01 14:55:59 -04:00
52b96031b4
use accelerate release 1.6.0
Wing Lian
2025-04-01 14:13:49 -04:00
03dcf1a5ea
fsdp2 support
Wing Lian
2025-03-22 21:47:54 -04:00
5a11ba3ecc
Built site for gh-pages
Quarto GHA Workflow Runner
2025-04-05 22:05:16 +00:00
a8f38c367c
Flex Attention + Packing with BlockMask support (#2363)
Sung Ching Liu
2025-04-05 18:02:57 -04:00
f1a5879a51
Built site for gh-pages
Quarto GHA Workflow Runner
2025-04-05 21:43:47 +00:00
e7e0cd97ce
Update dependencies and show slow tests in CI (#2492)
Wing Lian
2025-04-05 17:41:31 -04:00
bedd4a8554
Built site for gh-pages
Quarto GHA Workflow Runner
2025-04-05 05:27:55 +00:00
949471039f
fix tokenizer overrides w gemma3 (#2488)
Wing Lian
2025-04-05 01:25:44 -04:00
cc5cf77f6b
Built site for gh-pages
Quarto GHA Workflow Runner
2025-04-04 17:50:00 +00:00
de451f99a5
fix: cohere cce scaling wrong tensor (#2483)
NanoCode012
2025-04-05 00:47:44 +07:00
9f824ef76a
simplify the example configs to be more minimal and less daunting (#2486) [skip ci]
Wing Lian
2025-04-04 13:47:26 -04:00
dd66fb163c
check if fixture exists in the cache already (#2485)
Wing Lian
2025-04-04 13:47:01 -04:00
9f30d3d33a
reworking SP logic into composed handler
sp-rl
Dan Saunders
2025-04-04 02:25:00 +00:00
595c2b4288
Built site for gh-pages
Quarto GHA Workflow Runner
2025-04-03 18:53:11 +00:00
e0cc4f1a87
removing deepspeed guard for LoRA Triton kernels (#2480)
Dan Saunders
2025-04-03 14:50:56 -04:00
700409be6f
removing deepspeed guard for LoRA Triton kernels
lora-kernels-deepspeed
Dan Saunders
2025-04-03 16:44:45 +00:00
4a53765808
Built site for gh-pages
Quarto GHA Workflow Runner
2025-04-03 12:50:35 +00:00
64d8035f50
fix(example): align example to correct adapter (#2478)
NanoCode012
2025-04-03 19:48:14 +07:00
5249e98058
add additional tf32 opt for cudnn (#2477) [skip ci]
Wing Lian
2025-04-03 08:47:52 -04:00
2d50c2656f
Built site for gh-pages
Quarto GHA Workflow Runner
2025-04-02 13:53:12 +00:00
3877c5c69d
set release version 0.8.0 (#2476)
v0.8.0
Wing Lian
2025-04-02 09:50:56 -04:00
41474090da
Built site for gh-pages
Quarto GHA Workflow Runner
2025-04-02 13:37:55 +00:00
adb593abac
fix: document offload gradient_checkpointing option (#2475)
NanoCode012
2025-04-02 20:35:42 +07:00
a0117c9bce
fix: separate gemma3 text and vision example config (#2471) [skip ci]
NanoCode012
2025-04-02 20:35:29 +07:00
e6cfb093d2
fix: disable SP during merge (#2470) [skip ci]
NanoCode012
2025-04-02 20:35:00 +07:00
7abc71dc0b
fix: gemma3 loss in forward pass (#2473) [skip ci]
NanoCode012
2025-04-02 20:34:41 +07:00
45bf634d17
feat: add support for multimodal in lora kernels (#2472) [skip ci]
NanoCode012
2025-04-02 20:33:46 +07:00