[Kernel] Correctly invoke prefill & decode kernels for cross-attention (towards eventual encoder/decoder model support) #4888

Merged
Changes from 250 commits
342 commits
2d7e081
Merge branch 'infra_enc_dec_block_manager' into infra_enc_dec_cross_a…
afeldman-nm May 24, 2024
845f040
Merge branch 'upstream-main' into infra_enc_dec_block_manager
afeldman-nm May 24, 2024
849e49c
Merge branch 'upstream-main' into infra_enc_dec_block_manager_reviews
afeldman-nm May 26, 2024
a80325d
return output of SequenceGroup constructor
afeldman-nm May 26, 2024
8b38776
capitalize constants
afeldman-nm May 26, 2024
f39c313
refactored swap-block-table functionality
afeldman-nm May 26, 2024
90b5a0e
Refactored block manager + enc dec + unsupported feature checks into …
afeldman-nm May 26, 2024
9ee2582
removed circular import
afeldman-nm May 26, 2024
5d0ac23
apparently isort has to run last?
afeldman-nm May 26, 2024
1bcc949
slight name change
afeldman-nm May 26, 2024
5ae5969
merge
afeldman-nm May 28, 2024
1bece71
wip merge
afeldman-nm May 28, 2024
1d882ca
fixed utils to correctly handle encoder/decoder unsupported scenarios
afeldman-nm May 28, 2024
dfd9469
formatting
afeldman-nm May 28, 2024
a2e5465
Merge branch 'infra_enc_dec_block_manager' into infra_enc_dec_cross_a…
afeldman-nm May 28, 2024
3c3687e
renamed xformers metadata is_cross_attn to is_encoder_decoder_attn
afeldman-nm May 28, 2024
6f07c77
wip getting tests to pass after merge
afeldman-nm May 28, 2024
64e71e1
Merge branch 'upstream-main' into infra_enc_dec_cross_attn_reviews
afeldman-nm May 28, 2024
481c646
passing tests; formatting
afeldman-nm May 28, 2024
9c8e19d
removed overprovisioning from make_block_tables_slot_mapping()
afeldman-nm May 28, 2024
ed17ee3
comments'
afeldman-nm May 28, 2024
d630aa8
clarified block table address formula
afeldman-nm May 28, 2024
b664806
wip changing cross attention flag
afeldman-nm May 29, 2024
611df43
yapf fix
afeldman-nm May 29, 2024
8ee49dd
yapf fix
afeldman-nm May 29, 2024
6f4b49e
Merge branch 'upstream-main' into infra_enc_dec_block_manager_reviews
afeldman-nm May 29, 2024
039c25e
upstream merge
afeldman-nm May 29, 2024
8e9ef5b
fix formatting issue
afeldman-nm May 29, 2024
2b59ddc
formatting
afeldman-nm May 29, 2024
471569f
Merge branch 'upstream-main' into infra_enc_dec_block_manager_reviews
afeldman-nm May 29, 2024
0ad9d6a
Merge branch 'upstream-main' into infra_enc_dec_block_manager_reviews
afeldman-nm May 29, 2024
6f9ab7d
Merge branch 'infra_enc_dec_block_manager' into infra_enc_dec_cross_a…
afeldman-nm May 29, 2024
19d1ca5
passing tests with new attention type enum
afeldman-nm May 29, 2024
700b6dc
formatting
afeldman-nm May 29, 2024
76c639a
wip encoder test
afeldman-nm May 29, 2024
882640e
first pass at encoder attention test
afeldman-nm May 29, 2024
584297e
wip encoder attention test
afeldman-nm May 29, 2024
1de7077
Merge branch 'upstream-main' into infra_enc_dec_cross_attn_reviews
afeldman-nm May 29, 2024
a89c7c6
encoder attention test passes!
afeldman-nm May 29, 2024
0bbd0db
formatting
afeldman-nm May 29, 2024
af998ca
encoder test arguments
afeldman-nm May 29, 2024
78c678a
type hints; formatting
afeldman-nm May 29, 2024
0b6b2e9
merge in review work
afeldman-nm May 29, 2024
641f431
typo
afeldman-nm May 29, 2024
c7f5490
changed helper function naming convention
afeldman-nm May 29, 2024
9c78f85
check we are not testing decode-phase/encoder attention
afeldman-nm May 29, 2024
bf93a9e
refactoring
afeldman-nm May 29, 2024
cd759f2
formatting
afeldman-nm May 29, 2024
1af3625
removing unnecessary check
afeldman-nm May 29, 2024
eb5cf0c
unit test for encoder/decoder+chunked prefill non-support; added atte…
afeldman-nm May 29, 2024
afcb42e
explanatory comment
afeldman-nm May 29, 2024
d13e08e
refactoring
afeldman-nm May 29, 2024
ab92fb0
spelling fix
afeldman-nm May 29, 2024
582a0f5
rename
afeldman-nm May 29, 2024
a20be6d
skip enc/dec tests if HIP
afeldman-nm May 29, 2024
a9a162d
formatting
afeldman-nm May 29, 2024
6e3cfe1
Refactored checks into utils file
afeldman-nm May 29, 2024
622ce09
format
afeldman-nm May 29, 2024
4d88a89
wip trying to combine attention metadata caches
afeldman-nm May 29, 2024
3dfcb55
wip trying to merge self/cross caches; trying to fix attn_bias issues…
afeldman-nm May 29, 2024
6960723
wip
afeldman-nm May 29, 2024
104a8aa
Merge branch 'upstream-main' into infra_enc_dec_cross_attn_reviews
afeldman-nm May 30, 2024
31275cc
wip merging attention metadata
afeldman-nm May 30, 2024
7b2374c
Merge branch 'upstream-main' into infra_enc_dec_cross_attn_valid_md
afeldman-nm May 30, 2024
a643436
simplified is_all_cross_attn_metadata_set()
afeldman-nm May 30, 2024
a0e1a2a
Merge branch 'upstream-main' into infra_enc_dec_cross_attn
afeldman-nm Jun 3, 2024
2a1d84a
test: envs.VLLM_ATTENTION_BACKEND
afeldman-nm Jun 3, 2024
f6e0310
formatting'
afeldman-nm Jun 3, 2024
60c01c3
attempted to fix issue whereby selector test doesn't cleanup environm…
afeldman-nm Jun 3, 2024
5c94166
(1) In top-level tests utils.py added env var context manager, (2) in…
afeldman-nm Jun 3, 2024
eaa627f
wip tests
afeldman-nm Jun 3, 2024
9c597c4
FIX: test_attention_selector.py was leaking VLLM_ATTENTION_BACKEND va…
afeldman-nm Jun 3, 2024
9831ce6
formatting
afeldman-nm Jun 3, 2024
180ba26
Merge branch 'upstream-main' into backend_context_manager
afeldman-nm Jun 3, 2024
b7c1e84
merge upstream main & well-formatted backend override fix
afeldman-nm Jun 3, 2024
faf9118
formatting
afeldman-nm Jun 3, 2024
61d63bd
removed comment about supported head_size's, which is not relevant un…
afeldman-nm Jun 3, 2024
b9b6048
small refactors
afeldman-nm Jun 3, 2024
e2e2082
refactoring
afeldman-nm Jun 3, 2024
b223873
make_qkv() tensors are 4D
afeldman-nm Jun 3, 2024
2ea335c
combined seq_start_loc init with cumsum
afeldman-nm Jun 3, 2024
02875ab
xformers metadata allows unspecified values for most Optional members…
afeldman-nm Jun 3, 2024
79f307d
refactored slot mapping logic
afeldman-nm Jun 3, 2024
e790a00
formatting
afeldman-nm Jun 3, 2024
aae601b
selective renaming of cross -> encoder
afeldman-nm Jun 3, 2024
d871c9f
added encoder, enc/dec cross-attention bias members
afeldman-nm Jun 3, 2024
90d5c0d
xformers metadata now uses a different attn_bias for self, encoder an…
afeldman-nm Jun 3, 2024
a973c2b
refactoring
afeldman-nm Jun 3, 2024
d8d284e
wip typing issues
afeldman-nm Jun 3, 2024
27dc095
added paged attention args collection, conditional on metadata attent…
afeldman-nm Jun 3, 2024
2445905
logic to support encoder-specific sequence length usage in xformers
afeldman-nm Jun 3, 2024
8dabdc2
formatting
afeldman-nm Jun 3, 2024
c6200e6
test name change; encoder functionality can tolerate being provided w…
afeldman-nm Jun 3, 2024
0dc197b
formatting
afeldman-nm Jun 3, 2024
3d3c04f
prefill supports shared metadata structure
afeldman-nm Jun 3, 2024
c132caa
formatting
afeldman-nm Jun 3, 2024
39ee51a
full generalization of prefill & decode metadata structures
afeldman-nm Jun 3, 2024
c41917b
renamed metadata caching structure
afeldman-nm Jun 3, 2024
e738fb4
reverted my custom env var patch impl
afeldman-nm Jun 3, 2024
dfe9c10
monkeypatch works
afeldman-nm Jun 3, 2024
8221758
formatting
afeldman-nm Jun 3, 2024
98fcb64
Merge branch 'backend_context_manager' into infra_enc_dec_cross_attn_…
afeldman-nm Jun 3, 2024
db2b2d2
wip monkeypatch
afeldman-nm Jun 3, 2024
cbb89b1
refactored constants into tests/kernels/utils.py
afeldman-nm Jun 3, 2024
33b598a
Merge branch 'backend_context_manager' into infra_enc_dec_cross_attn_…
afeldman-nm Jun 3, 2024
c9ce86b
wip enc/dec monkeypatch integration
afeldman-nm Jun 3, 2024
ca570e7
a refactoring backend override functionality into tests/kernels/utils.py
afeldman-nm Jun 3, 2024
bf88882
Merge branch 'backend_context_manager' into infra_enc_dec_cross_attn_…
afeldman-nm Jun 3, 2024
b2e131f
test rename
afeldman-nm Jun 3, 2024
ed8f8b3
Comments & type hints
afeldman-nm Jun 3, 2024
c944cc1
Merge branch 'backend_context_manager' into infra_enc_dec_cross_attn
afeldman-nm Jun 3, 2024
bea9e01
Merge branch 'upstream-main' into backend_context_manager
afeldman-nm Jun 3, 2024
8abe51c
small refactors per @sroy745 suggestions
afeldman-nm Jun 3, 2024
fec833e
Merge branch 'upstream-main' into backend_context_manager
afeldman-nm Jun 3, 2024
8bd5280
Merge branch 'backend_context_manager' into infra_enc_dec_cross_attn
afeldman-nm Jun 3, 2024
da1b648
merged backend env config
afeldman-nm Jun 3, 2024
eefd588
merge
afeldman-nm Jun 3, 2024
60a21e3
fixed _get_seq_len_block_table_args() to change behavior based on is_…
afeldman-nm Jun 3, 2024
306ea5b
removed inference of encoder metadata attributes; removed guessing of…
afeldman-nm Jun 3, 2024
eda2273
wip refactoring
afeldman-nm Jun 3, 2024
9425f0c
refactored helper functions into different utils files
afeldman-nm Jun 3, 2024
62fb8d1
_ for private functions in test_encoder_decoder_attn
afeldman-nm Jun 3, 2024
2730daa
_ refactor
afeldman-nm Jun 3, 2024
5face2a
formatting
afeldman-nm Jun 3, 2024
f39155a
constructing attn md with minimum number of arguments
afeldman-nm Jun 3, 2024
51144ad
Merge branch 'upstream-main' into infra_enc_dec_cross_attn_reviews
afeldman-nm Jun 3, 2024
3f87f37
merge
afeldman-nm Jun 4, 2024
c7edbc6
Formatting
afeldman-nm Jun 4, 2024
b023557
typing and formatting
afeldman-nm Jun 4, 2024
9a359b3
Merge branch 'upstream-main' into infra_enc_dec_cross_attn_reviews
afeldman-nm Jun 4, 2024
af0c0b9
refactored block table/slot mapping construction process for decoder …
afeldman-nm Jun 4, 2024
50bca08
finished breaking block table/slot mapping construction into steps; f…
afeldman-nm Jun 4, 2024
90610da
slight refactor
afeldman-nm Jun 4, 2024
a006cc8
refactored encoder test into the cross-attention test
afeldman-nm Jun 5, 2024
20b95b0
slight refactoring
afeldman-nm Jun 5, 2024
6d52d60
QKVInputs and PackedQKVInputs named tuple integration to simplify tes…
afeldman-nm Jun 5, 2024
9d7ebb3
Merge branch 'upstream-main' into infra_enc_dec_cross_attn_reviews
afeldman-nm Jun 5, 2024
a81712b
refactoring
afeldman-nm Jun 5, 2024
d35ea41
format
afeldman-nm Jun 5, 2024
27782df
yapf fix
afeldman-nm Jun 5, 2024
c3a2e7a
import reorg
afeldman-nm Jun 5, 2024
8babfda
switched to star import to avoid unsatisfiable formatting constraints
afeldman-nm Jun 5, 2024
ce2422b
progress on memory map structure integration
afeldman-nm Jun 5, 2024
0045051
completed integration of KVMemoryMap into tests
afeldman-nm Jun 5, 2024
91eb067
first step toward QKVO integration into tests
afeldman-nm Jun 5, 2024
a6aee80
wip test params structure integration
afeldman-nm Jun 5, 2024
083c205
Merge branch 'upstream-main' into infra_enc_dec_cross_attn_reviews
afeldman-nm Jun 5, 2024
ea09789
Merge branch 'upstream-main' into infra_enc_dec_cross_attn_reviews
afeldman-nm Jun 5, 2024
ee51260
prephase md struct using test params
afeldman-nm Jun 5, 2024
50a45cc
correctness check helper function
afeldman-nm Jun 5, 2024
cd0a1aa
wip
afeldman-nm Jun 5, 2024
ec5977d
debugging test params integration
afeldman-nm Jun 6, 2024
196d4b1
Merge branch 'upstream-main' into infra_enc_dec_cross_attn_reviews
afeldman-nm Jun 6, 2024
1f7b2eb
passing tests with test params integration
afeldman-nm Jun 6, 2024
76b0b9e
format
afeldman-nm Jun 6, 2024
aa5363a
test points and test resources structures integrated
afeldman-nm Jun 6, 2024
5351416
formatting
afeldman-nm Jun 6, 2024
8d390a0
first attempt at chunked prefill failure test
afeldman-nm Jun 6, 2024
68b6d4b
narrowed the space of test-cases for unsupported scenarios
afeldman-nm Jun 6, 2024
5923002
format
afeldman-nm Jun 6, 2024
c3e5d2a
skeleton of encdec prefix cache failure test; fixed bug where max enc…
afeldman-nm Jun 6, 2024
739ab3c
wip prefill test
afeldman-nm Jun 6, 2024
a2a451f
Merge branch 'upstream-main' into infra_enc_dec_cross_attn_reviews
afeldman-nm Jun 6, 2024
2265224
passing prefix cache failure test
afeldman-nm Jun 6, 2024
d72aaa9
format
afeldman-nm Jun 6, 2024
1c19d36
type annotations; formatting
afeldman-nm Jun 6, 2024
e10340d
completely replaced collections.namedtuple with typing.NamedTuple w/ …
afeldman-nm Jun 6, 2024
67ab576
removed HIP check; clarified assumptions about supported backends in …
afeldman-nm Jun 6, 2024
4ae1b8a
Merge branch 'upstream-main' into infra_enc_dec_cross_attn_reviews
afeldman-nm Jun 6, 2024
dc7d3c8
wip
afeldman-nm Jun 8, 2024
bab33c3
Merge branch 'upstream-main' into infra_enc_dec_cross_attn_reviews
afeldman-nm Jun 8, 2024
e9c2a85
wip comments
afeldman-nm Jun 10, 2024
5791028
small fix
afeldman-nm Jun 10, 2024
28a9e76
Merge branch 'upstream-main' into infra_enc_dec_cross_attn_reviews
afeldman-nm Jun 10, 2024
0a65267
Merge branch 'upstream-main' into infra_enc_dec_cross_attn_reviews
afeldman-nm Jun 10, 2024
f0cd5ea
formatting
afeldman-nm Jun 10, 2024
2a7fd86
enc/dec test comment updates; some function arg changes; formatting
afeldman-nm Jun 10, 2024
286489c
Merge branch 'upstream-main' into infra_enc_dec_cross_attn_reviews
afeldman-nm Jun 10, 2024
8e1daa1
formatting
afeldman-nm Jun 10, 2024
a2a7ac5
fixed attention selector test to use FLASH_ATTN string constant var i…
afeldman-nm Jun 10, 2024
dfc96c5
Merge branch 'infra_enc_dec_cross_attn' into infra_enc_dec_cross_attn…
afeldman-nm Jun 10, 2024
d357568
additional commenting & added string constants for other backends
afeldman-nm Jun 10, 2024
0285548
Merge branch 'upstream-main' into infra_enc_dec_cross_attn_reviews
afeldman-nm Jun 10, 2024
97cad0b
encoder-only unit test passes
afeldman-nm Jun 12, 2024
29fa1af
refactoring
afeldman-nm Jun 13, 2024
488c0fa
merge upstream main; attention type comment
afeldman-nm Jun 14, 2024
c9f11ff
comment fixes
afeldman-nm Jun 14, 2024
196e671
removed unnecessary tests
afeldman-nm Jun 14, 2024
03e5d81
assert value None-ness matches key None-ness
afeldman-nm Jun 14, 2024
1f3874d
comment fix
afeldman-nm Jun 14, 2024
528b4a7
Remove util fxns & error strings for unneeded tests
afeldman-nm Jun 14, 2024
708a4b3
merge
afeldman-nm Jun 14, 2024
f06c687
Merge branch 'main' into infra_enc_dec_cross_attn_encoder_only
afeldman-nm Jun 14, 2024
b3c3411
formatting
afeldman-nm Jun 14, 2024
4dccd51
Merge branch 'main' into infra_enc_dec_cross_attn_reviews
afeldman-nm Jun 17, 2024
e229e00
format
afeldman-nm Jun 17, 2024
90aec38
Merge branch 'main' into infra_enc_dec_cross_attn_reviews
afeldman-nm Jun 17, 2024
4758680
Merge branch 'main' into infra_enc_dec_cross_attn_reviews
afeldman-nm Jun 17, 2024
addde7d
Merge branch 'main' into infra_enc_dec_cross_attn_reviews
afeldman-nm Jun 18, 2024
d0fd9e1
Merge branch 'main' into infra_enc_dec_cross_attn_reviews
afeldman-nm Jun 18, 2024
7b9cb7f
Replace attn_metadata.attention_type and attn_metadata._attn_type wit…
afeldman-nm Jun 18, 2024
c3f7da7
Merge branch 'main' into infra_enc_dec_cross_attn_reviews
afeldman-nm Jun 18, 2024
5f8c7f6
Moved attention type for attn_metadata to attention forward(); added …
afeldman-nm Jun 18, 2024
525303c
num encoder tokens
afeldman-nm Jun 18, 2024
91cbaa6
merge; resolve conflicts
afeldman-nm Jun 18, 2024
ea37e17
merge conflict; typing; formatting
afeldman-nm Jun 18, 2024
67ed419
Merge branch 'main' into infra_enc_dec_cross_attn_reviews
afeldman-nm Jun 20, 2024
e9d7ede
Merge branch 'main' into infra_enc_dec_cross_attn_reviews
afeldman-nm Jun 21, 2024
ca68c63
Merge branch 'main' into infra_enc_dec_cross_attn_reviews
afeldman-nm Jun 21, 2024
ce88fa3
Merge branch 'main' into infra_enc_dec_cross_attn_reviews
afeldman-nm Jun 21, 2024
5ce2dd0
Merge branch 'main' into infra_enc_dec_cross_attn_reviews
afeldman-nm Jun 24, 2024
125e5dc
Merge branch 'main' into infra_enc_dec_cross_attn_reviews
afeldman-nm Jun 25, 2024
597526a
removed extra line
afeldman-nm Jun 25, 2024
a178b7a
changed nested if/else to elif/else in xformers mask computation code
afeldman-nm Jun 25, 2024
06c7f75
reorganized helper functions that were only being used for testing in…
afeldman-nm Jun 25, 2024
47c9f39
removed attention_type
afeldman-nm Jun 25, 2024
2f0b05b
typing and formatting
afeldman-nm Jun 25, 2024
d23c284
typing and formatting; fixed escape sequences in comments
afeldman-nm Jun 25, 2024
1a6e5a3
moved make_tensor_with_pad() helper function back to vllm.utils
afeldman-nm Jun 25, 2024
e2a46e3
formatting
afeldman-nm Jun 25, 2024
4dabe19
Merge branch 'main' into infra_enc_dec_cross_attn_reviews
afeldman-nm Jun 25, 2024
7ca0d7a
Merge branch 'main' into infra_enc_dec_cross_attn_reviews
afeldman-nm Jun 26, 2024
c24697f
Merge branch 'main' into infra_enc_dec_cross_attn_reviews
afeldman-nm Jun 27, 2024
75756b9
removed redundant elif
afeldman-nm Jun 27, 2024
bcccc34
Merge branch 'main' into infra_enc_dec_cross_attn_reviews
afeldman-nm Jun 27, 2024
c8f8d59
Merge branch 'main' into infra_enc_dec_cross_attn_reviews
afeldman-nm Jun 27, 2024
a501849
reverted unnecessary vllm/utils.py changes
afeldman-nm Jun 27, 2024
83d474e
Merge branch 'main' into infra_enc_dec_cross_attn_reviews
afeldman-nm Jun 28, 2024
64981b5
Merge branch 'main' into infra_enc_dec_cross_attn_reviews
afeldman-nm Jun 28, 2024
8d36458
Merge branch 'main' into infra_enc_dec_cross_attn_reviews
afeldman-nm Jun 29, 2024
5ff9c76
Merge branch 'main' into infra_enc_dec_cross_attn_reviews
afeldman-nm Jun 30, 2024
2828aa7
Merge branch 'main' into infra_enc_dec_cross_attn_reviews
afeldman-nm Jul 1, 2024
65e47db
Merge branch 'main' into infra_enc_dec_cross_attn_reviews
afeldman-nm Jul 3, 2024
2f0eb9b
Merge branch 'main' into infra_enc_dec_cross_attn_reviews
afeldman-nm Jul 3, 2024
d81662c
Merge branch 'main' into infra_enc_dec_cross_attn_reviews
afeldman-nm Jul 4, 2024
13f5b50
Merge branch 'main' into infra_enc_dec_cross_attn_reviews
afeldman-nm Jul 5, 2024
5dbebbc
Update vllm/attention/backends/torch_sdpa.py
afeldman-nm Jul 8, 2024
07df0e1
Update vllm/attention/layer.py
afeldman-nm Jul 8, 2024
7e0bc57
Merge branch 'main' into infra_enc_dec_cross_attn_reviews
afeldman-nm Jul 8, 2024
e837a73
Merge branch 'infra_enc_dec_cross_attn_reviews' into infra_enc_dec_cr…
afeldman-nm Jul 8, 2024
7ce9a51
merged in first pieces of woosuk feedback & latest main; formatting
afeldman-nm Jul 8, 2024
9ae6728
fixed specific point-changes requested by woosuk
afeldman-nm Jul 8, 2024
a1bf652
test_encoder_decoder_attn.py cleanup
afeldman-nm Jul 8, 2024
4f27946
tests/kernels/utils.py cleanup
afeldman-nm Jul 8, 2024
5ee30fe
vllm/attention/backends/abstract.py cleanup
afeldman-nm Jul 8, 2024
45fc9f7
vllm/attention/backends/blocksparse_attn.py cleanup
afeldman-nm Jul 8, 2024
097aff2
vllm/attention/backends/flash_attn.py cleanup
afeldman-nm Jul 8, 2024
d8a692b
cleaning up a number of backends & backends utils.py
afeldman-nm Jul 8, 2024
5df73fc
xformers backend cleanup
afeldman-nm Jul 8, 2024
6cd595c
formatting
afeldman-nm Jul 8, 2024
14 changes: 7 additions & 7 deletions tests/kernels/test_attention_selector.py
@@ -47,32 +47,32 @@ def test_flash_attn(monkeypatch):
     # Unsupported CUDA arch
     with patch("torch.cuda.get_device_capability", return_value=[7, 5]):
         backend = which_attn_to_use(8, 16, 8, None, torch.float16, None, 16)
-        assert backend.name != "FLASH_ATTN"
+        assert backend.name != STR_FLASH_ATTN_VAL

     # Unsupported data type
     backend = which_attn_to_use(8, 16, 8, None, torch.float8_e4m3fn, None, 16)
-    assert backend.name != "FLASH_ATTN"
+    assert backend.name != STR_FLASH_ATTN_VAL

     # Unsupported kv cache data type
     backend = which_attn_to_use(8, 16, 8, None, torch.float16, "fp8", 16)
-    assert backend.name != "FLASH_ATTN"
+    assert backend.name != STR_FLASH_ATTN_VAL

     # Unsupported block size
     backend = which_attn_to_use(8, 16, 8, None, torch.float16, None, 8)
-    assert backend.name != "FLASH_ATTN"
+    assert backend.name != STR_FLASH_ATTN_VAL

     # Unsupported sliding window
     backend = which_attn_to_use(8, 16, 8, 1, torch.float16, None, 16)
-    assert backend.name != "FLASH_ATTN"
+    assert backend.name != STR_FLASH_ATTN_VAL

     # flash-attn is not installed
     with patch.dict('sys.modules', {'vllm_flash_attn': None}):
         backend = which_attn_to_use(8, 16, 8, None, torch.float16, None, 16)
-        assert backend.name != "FLASH_ATTN"
+        assert backend.name != STR_FLASH_ATTN_VAL

     # Unsupported head size
     backend = which_attn_to_use(8, 17, 8, None, torch.float16, None, 16)
-    assert backend.name != "FLASH_ATTN"
+    assert backend.name != STR_FLASH_ATTN_VAL


 def test_invalid_env(monkeypatch):
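The change in this file swaps a repeated string literal for a shared constant in every assertion. The following is a minimal, self-contained sketch of that pattern, not the PR's actual code: `pick_backend`, the `Backend` enum, its selection rules, and `STR_XFORMERS_VAL` are hypothetical stand-ins, and the value assigned to `STR_FLASH_ATTN_VAL` is assumed from the literal it replaces in the diff.

```python
from enum import Enum, auto

# Assumed value: the diff replaces the literal "FLASH_ATTN" with this name,
# but the constant's real definition is not shown in this excerpt.
STR_FLASH_ATTN_VAL = "FLASH_ATTN"
# Hypothetical second constant, included only to show the pattern scaling.
STR_XFORMERS_VAL = "XFORMERS"


class Backend(Enum):
    """Toy backend enum; member names double as the backend name strings."""
    FLASH_ATTN = auto()
    XFORMERS = auto()


def pick_backend(head_size: int, block_size: int) -> Backend:
    """Hypothetical stand-in for which_attn_to_use().

    The rules below are illustrative only and are not vLLM's real
    selection logic.
    """
    if head_size % 8 != 0 or block_size % 16 != 0:
        return Backend.XFORMERS
    return Backend.FLASH_ATTN


def check_fallbacks() -> None:
    """Assert against the shared constant instead of a repeated literal."""
    # Unsupported head size (toy rule)
    assert pick_backend(17, 16).name != STR_FLASH_ATTN_VAL
    # Unsupported block size (toy rule)
    assert pick_backend(16, 8).name != STR_FLASH_ATTN_VAL
    # Supported configuration selects the flash-attention backend
    assert pick_backend(16, 16).name == STR_FLASH_ATTN_VAL


check_fallbacks()
```

With the name defined in one place, renaming a backend means touching one definition rather than every assertion, and a misspelled constant fails at import time instead of letting a `!=` check pass silently.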