464de78f6d | 2026-04-25 08:59:13 +00:00 | Wing Lian | regroup attn_implementation tests by feature concern
7a41b47d22 | 2026-04-25 08:59:05 +00:00 | Wing Lian | drop "Phase 2" naming from attn-implementation tests
6886def92c | 2026-04-25 08:58:53 +00:00 | Wing Lian | fix duplicate attn_implementation in gpt-oss yamls and flaky caplog tests
434a484fe9 | 2026-04-23 22:27:01 +00:00 | Wing Lian | update doc snippets + reject gemma4-hybrid with non-FA2 backend
2d64d009d8 | 2026-04-23 22:27:01 +00:00 | Wing Lian | expand attention tests + rewrite docs
2579c496d5 | 2026-04-23 22:27:01 +00:00 | Wing Lian | make attn_implementation the single source of truth
ff5d6393c8 | 2026-04-23 22:27:01 +00:00 | Wing Lian | replace legacy attention boolean flags with capability properties

    Replace legacy boolean checks with capability-based properties derived
    from attn_implementation. This separates three concerns that were
    conflated under flash_attention:
      1. Backend selection            -> attn_implementation enum
      2. Packing capability           -> attn_supports_packing property
      3. Flash-attn library dependency -> attn_uses_flash_lib property
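The separation of concerns described in that commit message can be sketched as capability properties computed from a single backend enum. This is a minimal illustration, not the project's actual code: the enum values (`eager`, `sdpa`, `flash_attention_2`) follow the common Hugging Face transformers naming, and the rule that only the flash-attention backend supports packing is an assumption for the example.

```python
from enum import Enum


class AttnImplementation(str, Enum):
    # Backend names follow the common HF transformers convention;
    # the exact value set in the real config is an assumption here.
    EAGER = "eager"
    SDPA = "sdpa"
    FLASH_ATTENTION_2 = "flash_attention_2"


class AttentionConfig:
    """Sketch: capability properties derived from attn_implementation,
    so no separate boolean flags need to be kept in sync."""

    def __init__(self, attn_implementation: AttnImplementation):
        self.attn_implementation = attn_implementation

    @property
    def attn_supports_packing(self) -> bool:
        # Assumption for illustration: only the flash-attention
        # backend supports sample packing.
        return self.attn_implementation is AttnImplementation.FLASH_ATTENTION_2

    @property
    def attn_uses_flash_lib(self) -> bool:
        # Whether the flash-attn library must be installed for this backend.
        return self.attn_implementation is AttnImplementation.FLASH_ATTENTION_2


cfg = AttentionConfig(AttnImplementation.SDPA)
print(cfg.attn_supports_packing)  # False
```

Deriving the capabilities as read-only properties means callers can no longer set, say, a packing flag that contradicts the chosen backend, which is the inconsistency the commit is eliminating.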
aee8c75d64 | 2026-04-23 22:27:01 +00:00 | Wing Lian | refactor attention handling