Temp hack to work with prebuilt 0.4.0-post kernels
Signed-off-by: Nick Hill <nickhill@us.ibm.com>
njhill authored and joerunde committed Apr 11, 2024
1 parent a880d89 commit 3d71ef7
Showing 1 changed file with 3 additions and 3 deletions.
vllm/attention/ops/paged_attn.py: 3 additions & 3 deletions
@@ -76,7 +76,7 @@ def write_to_paged_cache(
             value_cache,
             slot_mapping.flatten(),
             kv_cache_dtype,
-            kv_scale,
+            # kv_scale,
         )
 
     @staticmethod
@@ -123,7 +123,7 @@ def forward_decode(
                 max_context_len,
                 alibi_slopes,
                 kv_cache_dtype,
-                kv_scale,
+                # kv_scale,
             )
         else:
             # Run PagedAttention V2.
@@ -155,7 +155,7 @@ def forward_decode(
                 max_context_len,
                 alibi_slopes,
                 kv_cache_dtype,
-                kv_scale,
+                # kv_scale,
             )
         return output
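
For context, all three hunks make the same change: the trailing kv_scale argument is commented out at each kernel call site so that the Python code matches the signatures of the prebuilt 0.4.0-post kernels, which evidently do not yet accept a kv_scale parameter. As a minimal sketch of a less brittle alternative (hypothetical; the cache_ops.reshape_and_cache name and argument order below are assumptions for illustration, not taken from this diff), the call site could fall back at runtime instead of being edited by hand:

    # Hypothetical sketch only: try the newer kernel signature first and fall back
    # to the prebuilt 0.4.0-post signature if the extension does not accept kv_scale.
    # The cache_ops.reshape_and_cache name and argument order are assumptions.
    def write_kv_compat(cache_ops, key, value, key_cache, value_cache,
                        slot_mapping, kv_cache_dtype, kv_scale):
        try:
            cache_ops.reshape_and_cache(
                key, value, key_cache, value_cache,
                slot_mapping.flatten(), kv_cache_dtype, kv_scale,
            )
        except TypeError:
            # Older prebuilt kernels take one argument fewer; retry without kv_scale.
            cache_ops.reshape_and_cache(
                key, value, key_cache, value_cache,
                slot_mapping.flatten(), kv_cache_dtype,
            )

Either way the net effect matches this commit: kv_scale is simply not forwarded to kernels that cannot accept it.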
