
[dx11] Constant buffer binding and AtomicIncrement in RAND_STATE #4650

Merged
merged 6 commits into taichi-dev:master from dx-spirv-for-pr5
Apr 13, 2022

Conversation

quadpixels
Contributor

@quadpixels quadpixels commented Mar 29, 2022

Related issue = #992

This makes a bit more progress on the dx11 backend; it can now run the mpm88.py program. The change has to deal with a few updates in the Taichi infrastructure:

  • SPIRV-Cross now treats read-only buffers (ResourceBinder::buffer) and read-write buffers (ResourceBinder::rw_buffer) differently, so the backend has to handle Constant Buffers in addition to Unordered Access Views.

  • SPIRV-Cross now uses a different set of default options, so the backend needs to call remap_num_workgroups_builtin.

  • The random number generator state increment now uses spv::Op::OpAtomicIIncrement, so that the generated HLSL no longer makes the HLSL compiler report an error.

Tested: runs mpm88.py

Change the parameter from SNodeTreeManager to SNodeTreeManager*.
1. Change Dx11Pipeline::device_ from a shared_ptr to a raw pointer
Previously, Dx11Pipeline::device_ was a shared_ptr, which could cause
device_ to be destructed twice when the program finishes.
Because the pipeline is created by Dx11Device::create_pipeline,
logically speaking the device owns the pipeline rather than the
pipeline owning the device. So it makes sense for the pipeline not to
track the lifetime of its Dx11Device object, hence the switch to a raw
pointer (see the sketch below).
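
A minimal sketch of the intended ownership direction. The class and member names mirror the ones mentioned above, but everything else (constructor, container, return type) is a simplified assumption rather than the actual Taichi code:

#include <memory>
#include <vector>

class Dx11Device;

// The pipeline only observes the device; it does not own it.
class Dx11Pipeline {
 public:
  explicit Dx11Pipeline(Dx11Device *device) : device_(device) {}

 private:
  Dx11Device *device_{nullptr};  // non-owning back-reference (raw pointer)
};

class Dx11Device {
 public:
  Dx11Pipeline *create_pipeline() {
    // The device keeps the pipeline alive; the pipeline never extends the
    // device's lifetime, so the device cannot be destructed twice.
    pipelines_.push_back(std::make_unique<Dx11Pipeline>(this));
    return pipelines_.back().get();
  }

 private:
  std::vector<std::unique_ptr<Dx11Pipeline>> pipelines_;
};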

2. Call hlsl.remap_num_workgroups_builtin()
This fixes the "RuntimeError: NumWorkgroups builtin is used, but
remap_num_workgroups_builtin() was not called. Cannot emit code
for this builtin." error.
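
A minimal sketch of where this call sits in a SPIRV-Cross HLSL compile, assuming a standalone use of spirv_cross::CompilerHLSL; the binding chosen for the generated cbuffer (b0 here) is a placeholder, and the PR later computes it from the constant buffers the kernel already uses:

#include <spirv_cross/spirv_hlsl.hpp>  // header path may differ when vendored
#include <cstdint>
#include <string>
#include <vector>

std::string spirv_to_hlsl(const std::vector<uint32_t> &spirv) {
  spirv_cross::CompilerHLSL hlsl(spirv);

  spirv_cross::CompilerHLSL::Options options;
  options.shader_model = 50;  // D3D11 compute shaders target SM 5.0
  hlsl.set_hlsl_options(options);

  // HLSL has no equivalent of gl_NumWorkGroups, so SPIRV-Cross materializes
  // it as a cbuffer (SPIRV_Cross_NumWorkgroups). Returns 0 if the builtin is
  // unused; otherwise assign a constant buffer slot for it.
  uint32_t num_workgroups_id = hlsl.remap_num_workgroups_builtin();
  if (num_workgroups_id != 0) {
    hlsl.set_decoration(num_workgroups_id, spv::DecorationDescriptorSet, 0);
    hlsl.set_decoration(num_workgroups_id, spv::DecorationBinding, 0);
  }

  return hlsl.compile();
}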

3. This change also adds a few other Dx11CommandList commands.
It turns out the SPIRV code generator has started to use constant
buffers, so we must differentiate buffer() from rw_buffer(). With
SPIRV-HLSL, buffer() populates constant buffers, while rw_buffer()
populates unordered access views (UAVs). This change adds the
corresponding handling (see the sketch below).

Tested: can run "laplace.py"
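
A minimal sketch of how the two binding paths could map to D3D11 calls at dispatch time. The function names are made up for illustration; only CSSetConstantBuffers and CSSetUnorderedAccessViews are the real D3D11 entry points:

#include <d3d11.h>

// ResourceBinder::buffer -> constant buffer view (register(bN) in HLSL).
void bind_cbv(ID3D11DeviceContext *context, UINT slot, ID3D11Buffer *cbv) {
  context->CSSetConstantBuffers(slot, 1, &cbv);
}

// ResourceBinder::rw_buffer -> unordered access view (register(uN) in HLSL).
void bind_uav(ID3D11DeviceContext *context, UINT slot,
              ID3D11UnorderedAccessView *uav) {
  context->CSSetUnorderedAccessViews(slot, 1, &uav, nullptr);
}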
@netlify

netlify bot commented Mar 29, 2022

Deploy Preview for docsite-preview ready!

🔨 Latest commit: 5c4857d
🔍 Latest deploy log: https://app.netlify.com/sites/docsite-preview/deploys/62551c44eb1f5c0008b7f75f
😎 Deploy Preview: https://deploy-preview-4650--docsite-preview.netlify.app

@k-ye
Member

k-ye commented Mar 29, 2022

/format

@quadpixels
Contributor Author

quadpixels commented Mar 31, 2022

I just found that SPIRV_Cross_NumWorkgroups takes up constant buffer slot 1 (while args_t takes slot 0).

The slot is not explicitly specified with the "register" keyword in the HLSL source (if it were, it would be register(b1))

struct SPIRV_Cross_Input
{
    uint3 gl_GlobalInvocationID : SV_DispatchThreadID;
};

For comparison, this is what slot 0 looks like:

cbuffer args_t : register(b0)
{
    int args_arg0 : packoffset(c0);
    int args_extra_args0 : packoffset(c0.y);
    ...
    int args_extra_args127 : packoffset(c32);
};

Adding this binding allows the to_numpy kernel to work; otherwise the kernel freezes and causes a TDR timeout 😀

I guess I may do some cleanup and add that to this pull request as well since it's not a large change.

==================================

Edit: the CB slot for SPIRV_Cross_NumWorkgroups should be determined by the "watermark" of the CBs generated by Taichi; it depends on how many CBs have been bound since the previous dispatch().

Example:

>> 6E368600 run_commands, 8 commands
  bind pipeline paint_c56_0_k0001_vk_0_t00
  bind UAV 1819636583 to slot u2
  dispatch spirv_cross_numworkgroups = b0        <---- (1)
  bind pipeline vector_to_image_c16_0_k0002_vk_0_t00
  bind UAV 1819636583 to slot u2
  bind UAV 1819636595 to slot u3
  bind CBV 1819636594 to slot b0
  dispatch spirv_cross_numworkgroups = b1        <---- (2)
<< 6E368600 run_commands

(1) should be b0 because the "paint" kernel has no CB. (2) should be b1 because the "vector_to_image" kernel uses cb0 (see the sketch below).
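
A minimal sketch of the watermark idea; the struct and function names are hypothetical, the point is only that SPIRV_Cross_NumWorkgroups gets the slot one past the CBs already bound for the kernel being dispatched:

// Hypothetical bookkeeping, not the actual Dx11CommandList code.
struct DispatchState {
  int num_cbs_bound = 0;  // reset when a new pipeline is bound
};

// Called for each "bind CBV ... to slot bN" command.
void on_bind_cbv(DispatchState &state, int slot) {
  if (slot + 1 > state.num_cbs_bound)
    state.num_cbs_bound = slot + 1;
}

// Slot for SPIRV_Cross_NumWorkgroups at dispatch time:
//   "paint" binds no CB          -> returns 0 (b0)
//   "vector_to_image" binds b0   -> returns 1 (b1)
int num_workgroups_cb_slot(const DispatchState &state) {
  return state.num_cbs_bound;
}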

@quadpixels quadpixels force-pushed the dx-spirv-for-pr5 branch 2 times, most recently from 65eefbf to fe6d454 Compare April 3, 2022 18:56
@quadpixels quadpixels changed the title [dx11] constant buffer binding [dx11] Constant buffer binding Apr 3, 2022
@quadpixels quadpixels changed the title [dx11] Constant buffer binding [dx11] Constant buffer binding and AtomicIncrement in RAND_STATE Apr 4, 2022
@quadpixels
Contributor Author

/format

Collaborator

@bobcao3 bobcao3 left a comment


LGTM!

Previously, SPIRV_Cross_NumWorkgroups was fixed at constant buffer 1
(b1). This is not correct; it should be set to "1 past the CBs used by
the kernel", which can be b0 or b1.

Tested: Can run "simple_uv.py" and simplified "mpm88.py"
Previously, init_random_function used an add operation to increment the
pointer to the RNG state. This generated the following HLSL:

global_tmps_buffer_u32.Store(4096, global_tmps_buffer_u32.Load(4096) + 1u);

The HLSL compiler will complain with the following error:
error X3694: race condition writing to shared resource detected, consider making this write conditional.

Because of this, we use the atomic addition operation to achieve the
same result. The generated HLSL now becomes:
uint _67;
global_tmps_buffer_u32.InterlockedAdd(4096, 1, _67);
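
A sketch of the instruction-level difference, assuming the spv::Op enum from the standard SPIR-V headers; the encode() helper is a made-up stand-in, not Taichi's actual SPIR-V emitter:

#include <spirv/unified1/spirv.hpp>  // header path may differ when vendored
#include <cstdint>
#include <vector>

using Id = uint32_t;

// SPIR-V encodes each instruction as (word_count << 16 | opcode) followed
// by the operand words.
std::vector<uint32_t> encode(spv::Op op, const std::vector<uint32_t> &operands) {
  std::vector<uint32_t> words;
  words.push_back((uint32_t(operands.size() + 1) << 16) | uint32_t(op));
  words.insert(words.end(), operands.begin(), operands.end());
  return words;
}

// Old: a plain load / add / store sequence, which SPIRV-Cross turned into the
// Load(...) + 1u / Store(...) pair rejected with error X3694.
// New: OpAtomicIIncrement, which SPIRV-Cross turns into InterlockedAdd.
// Operand order per the SPIR-V spec: result type, result id, pointer,
// memory scope, memory semantics.
std::vector<uint32_t> rand_state_increment(Id uint_type, Id result, Id ptr,
                                           Id scope, Id semantics) {
  return encode(spv::OpAtomicIIncrement,
                {uint_type, result, ptr, scope, semantics});
}
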
@bobcao3 bobcao3 merged commit ff82525 into taichi-dev:master Apr 13, 2022