Skip to content

Use specialized intrinsics for dot4{I, U}8Packed on SPIR-V and HLSL #7574

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: trunk
Choose a base branch
from

Conversation

robamler
Copy link
Contributor

@robamler robamler commented Apr 18, 2025

Connections

Description

Replace the polyfills for dot4I8Packed and dot4U8Packed from #7494 with specialized instructions where they are available. More precisely:

  • For HLSL, we check if Shader Model >= 6.4, in which case we use dot4add_{i,u}8packed (the specification explicitly states: "no separate capability bit check is required, beyond assuring the use of Shader Model 6.4")
  • For SPIR-V, we check if the capabilities DotProduct and DotProductInput4x8BitPacked are available, in which case we use the "Packed Vector Format" argument for Op{S,U}Dot (and we emit the corresponding OpCapability and OpExtension statements at the beginning of the output).

If either of these tests fail, or if we are on Metal or GLSL, we fall back to the existing polyfills.

Testing

I added two tests with .toml configuration files that explicitly enable/prevent these specializations.

Squash or Rebase?

I think each commit should pass all CI tests.

Open Questions / Notes

  • On SPIR-V, the capabilities required for this specialization are only defined if lang_version >= 1.6, but I didn't check for high enough lang_version because I already check the capabilities_available field. This follows precedent in other places in the code base, but I'm not sure if it's sufficient to only check capabilities_available. If capabilities_available == None, then comments in the code (here and here) indicate that this should be interpreted as "all capabilities are permitted", and so the specialized code will be generated. But I guess "all capabilities" is still restricted to "all capabilities that are defined for lang_version". Should I check the lang_version in this case somehow, or is capabilities_available == None only used for debugging anyway? Unfortunately, lang_version isn't exposed anymore at the stage where we do the check, so I'd have to add it as an extra field to the Writer.
    • [Update: added a check in be9debd, but it's a bit hacky]
  • Unclear whether OpSDot with "Packed Vector Format" expects signed or unsigned arguments in spv (see comment below) [Update: resolved in this comment]
  • On Metal, we still fall back to the polyfill because, as far as I can tell, msl doesn't have builtin support for integer dot products (but it does support packed integer vectors, see packed_char4 in Table 2.4 in the specification).

Checklist

  • Run cargo fmt.
  • Run taplo format.
  • Run cargo clippy --tests. If applicable, add:
    • --target wasm32-unknown-unknown
  • Run cargo xtask test to run tests.
  • If this contains user-facing changes, add a CHANGELOG.md entry.

@robamler robamler force-pushed the packed-vector-format branch from e06d774 to 705dc6d Compare April 18, 2025 19:57
@cwfitzgerald cwfitzgerald changed the title Use specialized intrinsics for dot4{I, U}8Packed on SPIR-V and HSLS Use specialized intrinsics for dot4{I, U}8Packed on SPIR-V and HLSL Apr 18, 2025
When checking for capabilities in SPIR-V,
`capabilities_available == None` indicates that all capabilities are
available. However, some capabilities are not even defined for all
language versions, so we still need to check if the requested
capabilities even exist in the language version we're using.
@robamler robamler force-pushed the packed-vector-format branch from c93b2dd to be9debd Compare April 18, 2025 22:43
@robamler
Copy link
Contributor Author

robamler commented Apr 18, 2025

On SPIR-V, the capabilities required for this specialization are only defined if lang_version >= 1.6, but I didn't check for high enough lang_version because I already check the capabilities_available field. This follows precedent in other places in the code base, but I'm not sure if it's sufficient to only check capabilities_available. If capabilities_available == None, then comments in the code (here and here) indicate that this should be interpreted as "all capabilities are permitted", and so the specialized code will be generated. But I guess "all capabilities" is still restricted to "all capabilities that are defined for lang_version". Should I check the lang_version in this case somehow, or is capabilities_available == None only used for debugging anyway? Unfortunately, lang_version isn't exposed anymore at the stage where we do the check, so I'd have to add it as an extra field to the Writer.

In case the above text is confusing: I pushed be9debd as an example of how an additional check for lang_version could be implemented, but it's a bit hacky and I'm also not completely sure if this check is necessary (I don't know if capabilities_available == None is supposed to occur outside of tests).

@robamler

This comment was marked as resolved.

@robamler
Copy link
Contributor Author

The sign issue has been resolved. The implementation should be correct as is. Turns out SPIR-V doesn't care about signedness in this case: "all signed and unsigned operations always work on unsigned types, and the semantics of operation come from the opcode" (credits to Nicol Bolas, see this StackOverflow answer).

@robamler

This comment was marked as resolved.

robamler added a commit to robamler/wgpu that referenced this pull request Apr 22, 2025
See gfx-rs#7574, in particular
<gfx-rs#7574 (comment)>.

Adds `FeaturesWGPU::NATIVE_PACKED_INTEGER_DOT_PRODUCT`, which is
available on `Adapter`s that support the specialized implementations for
`dot4I8Packed` and `dot4U8Packed` implemented in gfx-rs#7574 (currently, this
includes DX12 with Shader Model >= 6.4 and Vulkan with device extension
"VK_KHR_shader_integer_dot_product".

If this feature is requested during `Device` creation, the device is set
up such that `dot4I8Packed` and `dot4U8Packed` will be compiled to their
respective specialized instructions. This means that, on a vulkan
`Device`, the SPIR-V language version is set to 1.6, and the required
SPIR-V capabilities are marked as available (on DX12, requesting the
feature doesn't change anything since availability of the feature
already guarantees that Shader Model >= 6.4, which is all we need to
generate specialized code).
robamler added a commit to bamler-lab/compressed-nn-ops-demo that referenced this pull request Apr 22, 2025
Using the specialized instructions from
<gfx-rs/wgpu#7574>.
@robamler
Copy link
Contributor Author

robamler commented Apr 22, 2025

I separated out the above issue of how these naga optimizations can be used in wgpu to #7595. I think the present PR can be reviewed independently from #7595.

robamler added a commit to robamler/wgpu that referenced this pull request Apr 22, 2025
See gfx-rs#7574, in particular
<gfx-rs#7574 (comment)>.

Adds `FeaturesWGPU::NATIVE_PACKED_INTEGER_DOT_PRODUCT`, which is
available on `Adapter`s that support the specialized implementations for
`dot4I8Packed` and `dot4U8Packed` implemented in gfx-rs#7574 (currently, this
includes DX12 with Shader Model >= 6.4 and Vulkan with device extension
"VK_KHR_shader_integer_dot_product").

If this feature is requested during `Device` creation, the device is set
up such that `dot4I8Packed` and `dot4U8Packed` will be compiled to their
respective specialized instructions. This means that, on a vulkan
`Device`, the SPIR-V language version is set to 1.6, and the required
SPIR-V capabilities are marked as available (on DX12, requesting the
feature doesn't change anything since availability of the feature
already guarantees that Shader Model >= 6.4, which is all we need to
generate specialized code).
@robamler robamler requested a review from cwfitzgerald April 22, 2025 17:13
@robamler
Copy link
Contributor Author

robamler commented Apr 24, 2025

Sorry for the back-and-forth on this one. I realized that I think I misunderstood the interplay between SPIR-V versions, SPIR-V extensions, and vulkan device extensions. My current understanding is:

  • The vulkan device extension VK_KHR_shader_integer_dot_product has a "dependency" on the SPIR-V extension SPV_KHR_integer_dot_product.
  • The latter requires only SPIR-V 1.0, and it defines ops and capabilities with the "KHR" suffix (e.g., OpSDotKHR and DotProductInput4x8BitPackedKHR).
  • These ops and capabilities seem to be integrated into SPIR-V proper (i.e., without requiring an extension) as of SPIR-V version 1.6 (see, e.g., right column here); here, their "KHR" suffix may be dropped (but may also still be used for backward compatibility).

Thus, as far as I understand now, the specialized instructions required for this optimization to work on SPIR-V should be available if we're either

  • on SPIR-V 1.6; or
  • on an older version of SPIR-V, and we requested the SPIR-V extension SPV_KHR_integer_dot_product, which we can do on older versions of SPIR-V as long as the vulkan adapter has device extension VK_KHR_shader_integer_dot_product.

In either case, we still have to check if the capabilities DotProductKHR and DotProductInput4x8BitPackedKHR are available (and request them).

Is my understanding correct? I'd appreciate any corrections because I want to make sure I don't introduce any regressions on exotic platforms.

@teoxoy
Copy link
Member

teoxoy commented Apr 25, 2025

Is my understanding correct? I'd appreciate any corrections because I want to make sure I don't introduce any regressions on exotic platforms.

That is correct. A few pointers in this direction: Writer::capabilities_available contains the capabilities passed by wgpu-hal so, it's enough to consult if the DotProduct & DotProductInput4x8BitPacked capabilities are present in capabilities_available, then based on that emit the fast path. Additionally, if the capabilities are present but we are emitting a version of SPIR-V lower than 1.6, only then call use_extension("SPV_KHR_integer_dot_product").

@teoxoy
Copy link
Member

teoxoy commented Apr 25, 2025

I don't know if capabilities_available == None is supposed to occur outside of tests

If capabilities_available is empty, that should really mean that no extra capabilities can be used.
Snapshot tests can also specify capabilities if needed (via the toml).

@teoxoy
Copy link
Member

teoxoy commented Apr 25, 2025

  • On Metal, we still fall back to the polyfill because, as far as I can tell, msl doesn't have builtin support for integer dot products (but it does support packed integer vectors, see packed_char4 in Table 2.4 in the specification).

It could be that if we use packed vectors as in gpuweb/gpuweb#2677 (comment) (or `char`` directly), the Metal compiler will compile it down to more specialized instructions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants