
[dx11] Constant buffer binding and AtomicIncrement in RAND_STATE #4650

Merged
merged 6 commits into taichi-dev:master from dx-spirv-for-pr5
Apr 13, 2022

Conversation

quadpixels
Contributor

@quadpixels quadpixels commented Mar 29, 2022

Related issue = #992

This makes a bit more progress on the dx11 backend; it can now run the mpm88.py program. The change has to deal with a few updates in the Taichi infrastructure:

  • SPIRV-Cross now treats read-only buffers (ResourceBinder::buffer) and read-write buffers (ResourceBinder::rw_buffer) differently, so the backend has to handle Constant Buffers in addition to Unordered Access Views.

  • SPIRV-Cross now uses a different set of default options, so the backend needs to call remap_num_workgroups_builtin.

  • The random number generator state increment now uses spv::Op::OpAtomicIIncrement, so that the generated HLSL no longer makes the HLSL compiler report an error.

Tested: runs mpm88.py

Change the parameter from SNodeTreeManager to SNodeTreeManager*.
1. Change Dx11Pipeline::device_ from a shared_ptr to a raw pointer
Previously, Dx11Pipeline::device_ was a shared_ptr, which could cause
device_ to be destructed twice when the program finishes.
Because the pipeline is created by Dx11Device::create_pipeline,
logically speaking the device owns the pipeline rather than the
pipeline owning the device. So it makes sense for the pipeline not to
track the lifetime of its Dx11Device object, hence the switch to a raw
pointer (see the sketch below).
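
A minimal sketch of the intended ownership direction. The class and member names mirror the ones mentioned above, but everything else (constructor, container, return type) is a simplified assumption rather than the actual Taichi code:

#include <memory>
#include <vector>

class Dx11Device;

// The pipeline only observes the device; it does not own it.
class Dx11Pipeline {
 public:
  explicit Dx11Pipeline(Dx11Device *device) : device_(device) {}

 private:
  Dx11Device *device_{nullptr};  // non-owning back-reference (raw pointer)
};

class Dx11Device {
 public:
  Dx11Pipeline *create_pipeline() {
    // The device keeps the pipeline alive; the pipeline never extends the
    // device's lifetime, so the device cannot be destructed twice.
    pipelines_.push_back(std::make_unique<Dx11Pipeline>(this));
    return pipelines_.back().get();
  }

 private:
  std::vector<std::unique_ptr<Dx11Pipeline>> pipelines_;
};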

2. Call hlsl.remap_num_workgroups_builtin()
This fixes the "RuntimeError: NumWorkgroups builtin is used, but
remap_num_workgroups_builtin() was not called. Cannot emit code
for this builtin." error.
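
A minimal sketch of where this call sits in a SPIRV-Cross HLSL compile, assuming a standalone use of spirv_cross::CompilerHLSL; the binding chosen for the generated cbuffer (b0 here) is a placeholder, and the PR later computes it from the constant buffers the kernel already uses:

#include <spirv_cross/spirv_hlsl.hpp>  // header path may differ when vendored
#include <cstdint>
#include <string>
#include <vector>

std::string spirv_to_hlsl(const std::vector<uint32_t> &spirv) {
  spirv_cross::CompilerHLSL hlsl(spirv);

  spirv_cross::CompilerHLSL::Options options;
  options.shader_model = 50;  // D3D11 compute shaders target SM 5.0
  hlsl.set_hlsl_options(options);

  // HLSL has no equivalent of gl_NumWorkGroups, so SPIRV-Cross materializes
  // it as a cbuffer (SPIRV_Cross_NumWorkgroups). Returns 0 if the builtin is
  // unused; otherwise assign a constant buffer slot for it.
  uint32_t num_workgroups_id = hlsl.remap_num_workgroups_builtin();
  if (num_workgroups_id != 0) {
    hlsl.set_decoration(num_workgroups_id, spv::DecorationDescriptorSet, 0);
    hlsl.set_decoration(num_workgroups_id, spv::DecorationBinding, 0);
  }

  return hlsl.compile();
}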

3. This change also adds a few other Dx11CommandList commands.
It turns out the SPIRV code generator has started to use constant
buffers, so we must differentiate buffer() from rw_buffer(). With
SPIRV-HLSL, buffer() populates constant buffers, while rw_buffer()
populates unordered access views (UAVs). This change adds the
corresponding handling (see the sketch below).

Tested: can run "laplace.py"
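
A minimal sketch of how the two binding paths could map to D3D11 calls at dispatch time. The function names are made up for illustration; only CSSetConstantBuffers and CSSetUnorderedAccessViews are the real D3D11 entry points:

#include <d3d11.h>

// ResourceBinder::buffer -> constant buffer view (register(bN) in HLSL).
void bind_cbv(ID3D11DeviceContext *context, UINT slot, ID3D11Buffer *cbv) {
  context->CSSetConstantBuffers(slot, 1, &cbv);
}

// ResourceBinder::rw_buffer -> unordered access view (register(uN) in HLSL).
void bind_uav(ID3D11DeviceContext *context, UINT slot,
              ID3D11UnorderedAccessView *uav) {
  context->CSSetUnorderedAccessViews(slot, 1, &uav, nullptr);
}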
@netlify

netlify bot commented Mar 29, 2022

Deploy Preview for docsite-preview ready!

🔨 Latest commit: 5c4857d
🔍 Latest deploy log: https://app.netlify.com/sites/docsite-preview/deploys/62551c44eb1f5c0008b7f75f
😎 Deploy Preview: https://deploy-preview-4650--docsite-preview.netlify.app

@k-ye
Member

k-ye commented Mar 29, 2022

/format

@quadpixels
Contributor Author

quadpixels commented Mar 31, 2022

I just found that SPIRV_Cross_NumWorkgroups takes up constant buffer slot 1 (while args_t takes slot 0).

The slot is not explicitly specified with the "register" keyword in the HLSL source (if it were, it would be register(b1))

struct SPIRV_Cross_Input
{
    uint3 gl_GlobalInvocationID : SV_DispatchThreadID;
};

For comparison, this is what slot 0 looks like:

cbuffer args_t : register(b0)
{
    int args_arg0 : packoffset(c0);
    int args_extra_args0 : packoffset(c0.y);
    ...
    int args_extra_args127 : packoffset(c32);
};

Adding this binding allows the to_numpy kernel to work; otherwise the kernel freezes and causes a TDR timeout 😀

I guess I may do some cleanup and add that to this pull request as well since it's not a large change.

==================================

Edit: the CB slot for SPIRV_Cross_NumWorkgroups should be determined by the "watermark" of the CBs generated by Taichi; it depends on how many CBs have been bound since the previous dispatch().

Example:

>> 6E368600 run_commands, 8 commands
  bind pipeline paint_c56_0_k0001_vk_0_t00
  bind UAV 1819636583 to slot u2
  dispatch spirv_cross_numworkgroups = b0        <---- (1)
  bind pipeline vector_to_image_c16_0_k0002_vk_0_t00
  bind UAV 1819636583 to slot u2
  bind UAV 1819636595 to slot u3
  bind CBV 1819636594 to slot b0
  dispatch spirv_cross_numworkgroups = b1        <---- (2)
<< 6E368600 run_commands

(1) should be b0 because the "paint" kernel has no CB. (2) should be b1 because the "vector_to_image" kernel uses cb0 (see the sketch below).
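
A minimal sketch of the watermark idea; the struct and function names are hypothetical, the point is only that SPIRV_Cross_NumWorkgroups gets the slot one past the CBs already bound for the kernel being dispatched:

// Hypothetical bookkeeping, not the actual Dx11CommandList code.
struct DispatchState {
  int num_cbs_bound = 0;  // reset when a new pipeline is bound
};

// Called for each "bind CBV ... to slot bN" command.
void on_bind_cbv(DispatchState &state, int slot) {
  if (slot + 1 > state.num_cbs_bound)
    state.num_cbs_bound = slot + 1;
}

// Slot for SPIRV_Cross_NumWorkgroups at dispatch time:
//   "paint" binds no CB          -> returns 0 (b0)
//   "vector_to_image" binds b0   -> returns 1 (b1)
int num_workgroups_cb_slot(const DispatchState &state) {
  return state.num_cbs_bound;
}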

@quadpixels quadpixels force-pushed the dx-spirv-for-pr5 branch 2 times, most recently from 65eefbf to fe6d454 Compare April 3, 2022 18:56
@quadpixels quadpixels changed the title [dx11] constant buffer binding [dx11] Constant buffer binding Apr 3, 2022
@quadpixels quadpixels changed the title [dx11] Constant buffer binding [dx11] Constant buffer binding and AtomicIncrement in RAND_STATE Apr 4, 2022
@quadpixels
Contributor Author

/format

Collaborator

@bobcao3 bobcao3 left a comment


LGTM!

Previously, SPIRV_Cross_NumWorkgroups was fixed at constant buffer 1
(b1). This is not correct; it should be set to "1 past the CBs used by
the kernel", which can be b0 or b1.

Tested: Can run "simple_uv.py" and simplified "mpm88.py"
Previously, init_random_function used an add operation to increment the
pointer to the RNG state. This generated the following HLSL:

global_tmps_buffer_u32.Store(4096, global_tmps_buffer_u32.Load(4096) + 1u);

The HLSL compiler will complain with the following error:
error X3694: race condition writing to shared resource detected, consider making this write conditional.

Because of this, we use the atomic addition operation to achieve the
same result. The generated HLSL now becomes:
uint _67;
global_tmps_buffer_u32.InterlockedAdd(4096, 1, _67);
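
A sketch of the instruction-level difference, assuming the spv::Op enum from the standard SPIR-V headers; the encode() helper is a made-up stand-in, not Taichi's actual SPIR-V emitter:

#include <spirv/unified1/spirv.hpp>  // header path may differ when vendored
#include <cstdint>
#include <vector>

using Id = uint32_t;

// SPIR-V encodes each instruction as (word_count << 16 | opcode) followed
// by the operand words.
std::vector<uint32_t> encode(spv::Op op, const std::vector<uint32_t> &operands) {
  std::vector<uint32_t> words;
  words.push_back((uint32_t(operands.size() + 1) << 16) | uint32_t(op));
  words.insert(words.end(), operands.begin(), operands.end());
  return words;
}

// Old: a plain load / add / store sequence, which SPIRV-Cross turned into the
// Load(...) + 1u / Store(...) pair rejected with error X3694.
// New: OpAtomicIIncrement, which SPIRV-Cross turns into InterlockedAdd.
// Operand order per the SPIR-V spec: result type, result id, pointer,
// memory scope, memory semantics.
std::vector<uint32_t> rand_state_increment(Id uint_type, Id result, Id ptr,
                                           Id scope, Id semantics) {
  return encode(spv::OpAtomicIIncrement,
                {uint_type, result, ptr, scope, semantics});
}
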
@bobcao3 bobcao3 merged commit ff82525 into taichi-dev:master Apr 13, 2022