Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[hlsl-out] Flesh out matrix handling documentation. #1850

Merged
merged 1 commit into from
Apr 18, 2022
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
99 changes: 85 additions & 14 deletions src/back/hlsl/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -6,29 +6,100 @@ Backend for [HLSL][hlsl] (High-Level Shading Language).
- 5.1
- 6.0

# General Matrix Note
# Layout of values in `uniform` buffers

All matrix construction/deconstruction is row based in HLSL. This means that when
we construct a matrix from column vectors, our matrix will be implicitly transposed.
The inverse transposition happens when we call `[0]` to get the zeroth column vector.
WGSL's ["Internal Layout of Values"][ilov] rules specify how each WGSL
type should be stored in `uniform` and `storage` buffers. The HLSL we
generate must access values in that form, even when it is not what
HLSL would use normally.

Because all of our matrices are implicitly transposed, we flip arguments to `mul`. `mat * vec`
becomes `vec * mat`, etc. This acts as the inverse transpose making the results identical.
The rules described here only apply to WGSL `uniform` variables. WGSL
`storage` buffers are translated as HLSL `ByteAddressBuffers`, for
which we generate `Load` and `Store` method calls with explicit byte
offsets. WGSL pipeline inputs must be scalars or vectors; they cannot
be matrices, which is where the interesting problems arise.

The only time we don't get this implicit transposition is when reading matrices from Uniforms/Push Constants.
To deal with this, we add `row_major` to all declarations of matrices in Uniforms/Push Constants.
## Row- and column-major ordering for matrices

Finally because all of our matrices are transposed, if you use `mat3x4`, it'll become `float3x4` in HLSL
(HLSL has inverted col/row notation).
WGSL specifies that matrices in uniform buffers are stored in
column-major order. This matches HLSL's default, so one might expect
things to be straightforward. Unfortunately, WGSL and HLSL disagree on
what indexing a matrix means: in WGSL, `m[i]` retrieves the `i`'th
*column* of `m`, whereas in HLSL it retrieves the `i`'th *row*. We
want to avoid translating `m[i]` into some complicated reassembly of a
vector from individually fetched components, so this is a problem.

# Matrix struct member of the form `matCx2` Note
However, with a bit of trickery, it is possible to use HLSL's `m[i]`
as the translation of WGSL's `m[i]`:

Struct member matrices of the form `matCx2` are translated to a sequence of C `vec2`s due to
differences in alignment between WGSL and HLSL for uniform buffers.
- We declare all matrices in uniform buffers in HLSL with the
`row_major` qualifier, and transpose the row and column counts: a
WGSL `mat3x4<f32>`, say, becomes an HLSL `row_major float3x4`. (Note
that WGSL and HLSL type names put the row and column in reverse
order.) Since the HLSL type is the transpose of how WebGPU directs
the user to store the data, HLSL will load all matrices transposed.

Accesses to these matrices are handled by injected functions.
- Since matrices are transposed, an HLSL indexing expression retrieves
the "columns" of the intended WGSL value, as desired.

- For vector-matrix multiplication, since `mul(transpose(m), v)` is
equivalent to `mul(v, m)` (note the reversal of the arguments), and
`mul(v, transpose(m))` is equivalent to `mul(m, v)`, we can
translate WGSL `m * v` and `v * m` to HLSL by simply reversing the
arguments to `mul`.

## Padding in two-row matrices

An HLSL `row_major floatKx2` matrix has padding between its rows that
the WGSL `matKx2<f32>` matrix it represents does not. HLSL stores all
matrix rows [aligned on 16-byte boundaries][16bb], whereas WGSL says
that the columns of a `matKx2<f32>` need only be [aligned as required
for `vec2<f32>`][ilov], which is [eight-byte alignment][8bb].

To compensate for this, any time a `matKx2<f32>` appears in a WGSL
`uniform` variable, whether directly as the variable's type or as a
struct member, we actually emit `K` separate `float2` members, and
assemble/disassemble the matrix from its columns (in WGSL; rows in
HLSL) upon load and store.

For example, the following WGSL struct type:

```ignore
struct Baz {
m: mat3x2<f32>,
}
```

is rendered as the HLSL struct type:

```ignore
struct Baz {
float2 m_0; float2 m_1; float2 m_2;
};
```

The `wrapped_struct_matrix` functions in `help.rs` generate HLSL
helper functions to access such members, converting between the stored
form and the HLSL matrix types appropriately. For example, for reading
the member `m` of the `Baz` struct above, we emit:

```ignore
float3x2 GetMatmOnBaz(Baz obj) {
return float3x2(obj.m_0, obj.m_1, obj.m_2);
}
```

We also emit an analogous `Set` function, as well as functions for
accessing individual columns by dynamic index.

At present, we do not generate correct HLSL when `matCx2<f32>` us used
directly as the type of a WGSL `uniform` global ([#1837]).

[hlsl]: https://docs.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl
[ilov]: https://gpuweb.github.io/gpuweb/wgsl/#internal-value-layout
[16bb]: https://github.com/microsoft/DirectXShaderCompiler/wiki/Buffer-Packing#constant-buffer-packing
[8bb]: https://gpuweb.github.io/gpuweb/wgsl/#alignment-and-size
[#1837]: https://github.com/gfx-rs/naga/issues/1837
*/

mod conv;
Expand Down