Interlocked on floats (at least Max/Min) #29

Nielsbishere · 2023-01-03T10:54:32Z

Is your feature request related to a problem? Please describe.
With compute shaders, atomics have become a big part of shader code, and the lack of float support can be quite annoying. I understand that atomic add on a float might be hard because it depends on hardware support, but min/max could be supported in software too. Atomic min/max is essentially possible by applying the following conversion before the min/max: u = asuint(f); u = (u >> 31) ? ~u : (u | (1u << 31)). You then run the integer atomics on u and reverse the conversion when reading back. Even though this can be done manually, it'd be great if the language supported it without floating-point trickery.
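For illustration, here is a minimal CPU-side C++ sketch of the conversion described above and its inverse. It is not HLSL: `std::memcpy` stands in for `asuint`/`asfloat`, and the names `floatBits`, `floatToOrderedUint`, `orderedUintToFloat` and `checkOrdering` are hypothetical helpers made up for this example. In a shader, the encoded value would presumably be what gets passed to the integer `InterlockedMin`/`InterlockedMax`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// CPU stand-in for HLSL asuint(): reinterpret the float's bits.
inline std::uint32_t floatBits(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

// Encode: negative floats get all bits flipped, non-negative floats get the
// sign bit set, so unsigned integer order matches float order.
inline std::uint32_t floatToOrderedUint(float f) {
    const std::uint32_t u = floatBits(f);
    return (u >> 31) ? ~u : (u | (1u << 31));
}

// Decode: undo the encoding when reading the result back.
inline float orderedUintToFloat(std::uint32_t u) {
    // A set top bit means the original was non-negative: clear it again.
    // A clear top bit means the original was negative: flip everything back.
    u = (u >> 31) ? (u & ~(1u << 31)) : ~u;
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}

// Small self-check: encoded order follows float order, and decode inverts encode.
inline void checkOrdering() {
    assert(floatToOrderedUint(-2.0f) < floatToOrderedUint(-1.0f));
    assert(floatToOrderedUint(-1.0f) < floatToOrderedUint(0.0f));
    assert(floatToOrderedUint(0.0f) < floatToOrderedUint(0.5f));
    assert(orderedUintToFloat(floatToOrderedUint(-1.5f)) == -1.5f);
    assert(orderedUintToFloat(floatToOrderedUint(3.5f)) == 3.5f);
}
```

One consequence of this encoding is that -0.0 sorts just below +0.0, since their bit patterns differ only in the sign bit.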

Describe the solution you'd like
Support for interlocked operations on floats, most importantly min/max. Add would be nice too, but it's understandable if it can't be supported. Float atomics already exist in OpenGL GLSL through an NV-specific extension.

Describe alternatives you've considered
Doing it manually, but that might be an obstacle for new programmers.

Additional context
N.A.

devshgraphicsprogramming · 2023-09-06T15:50:18Z

btw you can emit the SPIR-V opcode you need directly via the inline SPIR-V support already present in DXC

Nielsbishere · 2023-09-07T22:40:19Z

@devshgraphicsprogramming Yes, if you're using Vulkan you can enable the VK_EXT_shader_atomic_float extension if it's supported. DirectX 12, to my knowledge, doesn't support it (because DXIL doesn't). (OpenGL also supports it with an NV extension.)

As a side note, the solution I provided doesn't work with NaNs, because comparisons involving a NaN always return false, while my hack treats NaNs as ordinary numbers, so they get turned into something bigger than inf (or smaller than -inf when the NaN's sign bit is set). So just don't throw in a NaN and it should be fine :). As for correct IEEE 754 behavior for NaN: min is a = a < b ? a : b; with a NaN as 'a' that always returns b, and the same goes for max. Even if b is the NaN, b is still what gets returned. So imo ignoring NaNs in the min/max is the IEEE 754 conformant-ish way to do it (except when the min/max is only performed on NaNs).
I saw @jeremyong liked this issue and also wrote a good blog post explaining this in further detail at https://www.jeremyong.com/graphics/2023/09/05/f32-interlocked-min-max-hlsl/. Just to note, this solution still works on halfs and doubles as well, but the sign check (shift) and mask should be changed to 15 or 63 respectively, of course. In fact it works with any IEEE 754 compliant format that stores the sign bit, then the exponent, then the mantissa, in that order. N = mantissa bits + exponent bits (15 = half, 31 = float, 63 = double). (Though uint64_t atomics would be needed for doubles, and the 1 in 1 << N should be cast to uint64_t before shifting.)
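As a rough sketch of the generalization to other widths, the encoding can be written once over the unsigned integer type that holds the bits, with N derived from the type width. This is CPU-side C++ for illustration only; `toOrderedBits`, `doubleToOrdered` and `checkDoubleOrdering` are hypothetical names, and the NaN checks use explicit bit patterns (0x7FC00000 as a positive quiet NaN, 0xFFC00000 as a negative one) so they do not depend on how a platform produces NaNs.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Encode raw IEEE 754 bits (sign, exponent, mantissa layout) of any width.
// N is the sign bit index: 15 for half, 31 for float, 63 for double.
template <typename UInt>
constexpr UInt toOrderedBits(UInt bits) {
    constexpr int N = std::numeric_limits<UInt>::digits - 1;
    // Build the 1 in the target type before shifting, so 1 << 63 is valid.
    const UInt signMask = static_cast<UInt>(UInt{1} << N);
    return (bits >> N) ? static_cast<UInt>(~bits)
                       : static_cast<UInt>(bits | signMask);
}

// NaNs are not ignored by the encoding: they land outside the infinities.
static_assert(toOrderedBits<std::uint32_t>(0x7FC00000u) >
              toOrderedBits<std::uint32_t>(0x7F800000u),
              "positive NaN sorts above +inf");
static_assert(toOrderedBits<std::uint32_t>(0xFFC00000u) <
              toOrderedBits<std::uint32_t>(0xFF800000u),
              "negative NaN sorts below -inf");
// Half precision: 0xBC00 is -1.0 and 0x3C00 is 1.0.
static_assert(toOrderedBits<std::uint16_t>(0xBC00) <
              toOrderedBits<std::uint16_t>(0x3C00),
              "half: -1.0 sorts below 1.0");

// Double precision goes through a 64-bit integer.
inline std::uint64_t doubleToOrdered(double d) {
    std::uint64_t u;
    std::memcpy(&u, &d, sizeof u);
    return toOrderedBits(u);
}

inline void checkDoubleOrdering() {
    assert(doubleToOrdered(-1.0) < doubleToOrdered(0.0));
    assert(doubleToOrdered(0.0) < doubleToOrdered(1e300));
}
```

The static assertions make the NaN caveat concrete: a min/max over encoded values that include NaNs would return a NaN rather than skip it.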

devshgraphicsprogramming · 2023-09-08T09:40:36Z

Just to clarify, I'm not trivializing this or against it being implemented in HLSL/DXC and added to DXIL; I'm just giving pointers if you want to achieve this "today" in a SPIR-V environment.

devshgraphicsprogramming · 2023-09-08T09:48:11Z

> As a sidenote; the solution I provided doesn't work with NaNs because NaNs always return false when compared, while my hack would assume NaNs are real numbers so they'd get turned into something bigger than inf. So just don't throw in a NaN and it should be fine :). As for correct IEEE754 behavior for NaN: min is: a = a < b ? a : b; Then for a NaN as 'a' that'd return b always and for max too. Even if it's a NaN passed as b, it will always be returned. So imo ignoring the NaN in the min/max is the IEEE754 comformant-ish way to do it (except if it's the min/max are only performed on NaNs). I saw @jeremyong liked this issue and also made a good blogpost explaining this in further detail at https://www.jeremyong.com/graphics/2023/09/05/f32-interlocked-min-max-hlsl/. Just to note, this solution does still work on halfs and doubles as well; but the sign check (shift) and mask should be corrected to 15 or 63 respectively of course. In fact it works with any IEEE754 compliant format that puts the sign bit, then exponent and then mantissa in that order. N = mantissa + exponent (15 = half, 31 = float, 63 = double). (Though uint64_t atomics would be needed for doubles and 1 << N should be a uint64_t shifted first too)

This is obviously a really cool technique.

As a side note to a side note, we can do all sorts of "atomic" operations with CAS loops, but for this and for the trick above it gets really nasty to maintain the code without any sort of reference type. Right now you can abuse the bug that #5377 will fix soon and have

template<typename T>
T myEsotericAtomic(inout T rval, in T operand);

and do whatever you like inside (a CAS loop, a call to an inline SPIR-V intrinsic), plus have this work with both groupshared and RWStructuredBuffer, and, if the new BDA ships before microsoft/DirectXShaderCompiler#5377, even with the result of vk::BufferPointer::Get().

If #5377 ships before we're given true T& references or some sort of crutch, the only way to implement this will be with a macro. Even an accessor pseudo-lambda/functor struct doesn't help, simply because you'd have to copy and paste the code into the method definition a million times.
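To show the shape being asked for, here is a minimal sketch in CPU-side C++, where real references exist. It is not HLSL and not DXC's API: `std::atomic<std::uint32_t>&` plays the role of the missing `T&`, and `bitsOf`, `floatOf`, `atomicFloatOp` and `exampleAtomicMax` are hypothetical names. The generic helper runs a CAS loop on the raw bits, so any float operation can be plugged in.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>

inline std::uint32_t bitsOf(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

inline float floatOf(std::uint32_t u) {
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}

// Generic atomic read-modify-write on a float stored as raw bits.
// Takes the slot by reference, applies op(current, operand) in a CAS loop,
// and returns the value that was stored before the update.
template <typename Op>
float atomicFloatOp(std::atomic<std::uint32_t>& slot, float operand, Op op) {
    std::uint32_t seenBits = slot.load(std::memory_order_relaxed);
    for (;;) {
        const float seen = floatOf(seenBits);
        const std::uint32_t resultBits = bitsOf(op(seen, operand));
        // The exchange compares bit patterns, so a stored NaN cannot make the
        // loop spin forever. On failure, seenBits is refreshed with the
        // current contents of the slot and the operation is retried.
        if (slot.compare_exchange_weak(seenBits, resultBits,
                                       std::memory_order_relaxed)) {
            return seen;
        }
    }
}

// Usage: a float max built on the generic helper; returns the stored maximum.
inline float exampleAtomicMax() {
    std::atomic<std::uint32_t> slot{bitsOf(1.0f)};
    auto maxOp = [](float a, float b) { return a < b ? b : a; };
    const float previous = atomicFloatOp(slot, 4.0f, maxOp);
    assert(previous == 1.0f);
    atomicFloatOp(slot, -2.0f, maxOp);
    return floatOf(slot.load());
}
```

Because the slot is passed by reference, the same function body serves every storage location, which is the property that copy-in/copy-out `inout` parameters cannot provide for atomics.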

llvm-beanz added the needs-triage label Jun 29, 2023

github-project-automation bot added this to HLSL Triage Jun 29, 2023

damyanp removed the needs-triage label May 1, 2024

damyanp moved this to Triaged in HLSL Triage May 1, 2024

damyanp added this to the Shader Model Backlog milestone May 1, 2024

AsherJingkongChen mentioned this issue Oct 13, 2024 in gfx-rs/wgpu#6234 (merged): feat: Add 32-bit floating-point atomics (SHADER_FLOAT32_ATOMIC)
