From 14737d3e53753d44d0fda7c04cd2bd5271056507 Mon Sep 17 00:00:00 2001
From: Jim Blandy <jimb@red-bean.com>
Date: Sun, 17 Apr 2022 14:56:37 -0700
Subject: [PATCH] [hlsl-out] Flesh out matrix handling documentation.

---
 src/back/hlsl/mod.rs | 99 +++++++++++++++++++++++++++++++++++++-------
 1 file changed, 85 insertions(+), 14 deletions(-)
diff --git a/src/back/hlsl/mod.rs b/src/back/hlsl/mod.rs
index f51d49db39..7ecfc0fc92 100644
--- a/src/back/hlsl/mod.rs
+++ b/src/back/hlsl/mod.rs
@@ -6,29 +6,100 @@ Backend for [HLSL][hlsl] (High-Level Shading Language).
 - 5.1
 - 6.0
 
-# General Matrix Note
+# Layout of values in `uniform` buffers
 
-All matrix construction/deconstruction is row based in HLSL. This means that when
-we construct a matrix from column vectors, our matrix will be implicitly transposed.
-The inverse transposition happens when we call `[0]` to get the zeroth column vector.
+WGSL's ["Internal Layout of Values"][ilov] rules specify how each WGSL
+type should be stored in `uniform` and `storage` buffers. The HLSL we
+generate must access values in that form, even when it is not what
+HLSL would use normally.
 
-Because all of our matrices are implicitly transposed, we flip arguments to `mul`. `mat * vec`
-becomes `vec * mat`, etc. This acts as the inverse transpose making the results identical.
+The rules described here only apply to WGSL `uniform` variables. WGSL
+`storage` buffers are translated as HLSL `ByteAddressBuffers`, for
+which we generate `Load` and `Store` method calls with explicit byte
+offsets. WGSL pipeline inputs must be scalars or vectors; they cannot
+be matrices, which is where the interesting problems arise.
 
-The only time we don't get this implicit transposition is when reading matrices from Uniforms/Push Constants.
-To deal with this, we add `row_major` to all declarations of matrices in Uniforms/Push Constants.
+## Row- and column-major ordering for matrices
 
-Finally because all of our matrices are transposed, if you use `mat3x4`, it'll become `float3x4` in HLSL
-(HLSL has inverted col/row notation).
+WGSL specifies that matrices in uniform buffers are stored in
+column-major order. This matches HLSL's default, so one might expect
+things to be straightforward. Unfortunately, WGSL and HLSL disagree on
+what indexing a matrix means: in WGSL, `m[i]` retrieves the `i`'th
+*column* of `m`, whereas in HLSL it retrieves the `i`'th *row*. We
+want to avoid translating `m[i]` into some complicated reassembly of a
+vector from individually fetched components, so this is a problem.
 
-# Matrix struct member of the form `matCx2` Note
+However, with a bit of trickery, it is possible to use HLSL's `m[i]`
+as the translation of WGSL's `m[i]`:
 
-Struct member matrices of the form `matCx2` are translated to a sequence of C `vec2`s due to
-differences in alignment between WGSL and HLSL for uniform buffers.
+- We declare all matrices in uniform buffers in HLSL with the
+  `row_major` qualifier, and transpose the row and column counts: a
+  WGSL `mat3x4<f32>`, say, becomes an HLSL `row_major float3x4`. (Note
+  that WGSL and HLSL type names put the row and column in reverse
+  order.) Since the HLSL type is the transpose of how WebGPU directs
+  the user to store the data, HLSL will load all matrices transposed.
 
-Accesses to these matrices are handled by injected functions.
+- Since matrices are transposed, an HLSL indexing expression retrieves
+  the "columns" of the intended WGSL value, as desired.
+
+- For vector-matrix multiplication, since `mul(transpose(m), v)` is
+  equivalent to `mul(v, m)` (note the reversal of the arguments), and
+  `mul(v, transpose(m))` is equivalent to `mul(m, v)`, we can
+  translate WGSL `m * v` and `v * m` to HLSL by simply reversing the
+  arguments to `mul`.
+
+## Padding in two-row matrices
+
+An HLSL `row_major floatKx2` matrix has padding between its rows that
+the WGSL `matKx2<f32>` matrix it represents does not. HLSL stores all
+matrix rows [aligned on 16-byte boundaries][16bb], whereas WGSL says
+that the columns of a `matKx2<f32>` need only be [aligned as required
+for `vec2<f32>`][ilov], which is [eight-byte alignment][8bb].
+
+To compensate for this, any time a `matKx2<f32>` appears in a WGSL
+`uniform` variable, whether directly as the variable's type or as a
+struct member, we actually emit `K` separate `float2` members, and
+assemble/disassemble the matrix from its columns (in WGSL; rows in
+HLSL) upon load and store.
+
+For example, the following WGSL struct type:
+
+```ignore
+struct Baz {
+        m: mat3x2<f32>,
+}
+```
+
+is rendered as the HLSL struct type:
+
+```ignore
+struct Baz {
+    float2 m_0; float2 m_1; float2 m_2;
+};
+```
+
+The `wrapped_struct_matrix` functions in `help.rs` generate HLSL
+helper functions to access such members, converting between the stored
+form and the HLSL matrix types appropriately. For example, for reading
+the member `m` of the `Baz` struct above, we emit:
+
+```ignore
+float3x2 GetMatmOnBaz(Baz obj) {
+    return float3x2(obj.m_0, obj.m_1, obj.m_2);
+}
+```
+
+We also emit an analogous `Set` function, as well as functions for
+accessing individual columns by dynamic index.
+
+At present, we do not generate correct HLSL when `matCx2<f32>` us used
+directly as the type of a WGSL `uniform` global ([#1837]).
 
 [hlsl]: https://docs.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl
+[ilov]: https://gpuweb.github.io/gpuweb/wgsl/#internal-value-layout
+[16bb]: https://github.com/microsoft/DirectXShaderCompiler/wiki/Buffer-Packing#constant-buffer-packing
+[8bb]: https://gpuweb.github.io/gpuweb/wgsl/#alignment-and-size
+[#1837]: https://github.com/gfx-rs/naga/issues/1837
 */
 
 mod conv;