Interlocked on floats (at least Max/Min) #29

Open
Nielsbishere opened this issue Jan 3, 2023 · 4 comments

@Nielsbishere

Is your feature request related to a problem? Please describe.
With compute shaders, atomics have become a big part of shader code, and the lack of floating-point atomics can be quite annoying. I understand that atomic add on floats might be hard because it depends on hardware support, but min/max could be supported in software too. Atomic min/max is essentially possible by applying the following conversion before the min/max: u = asuint(f); u = u >> 31 ? ~u : (u | (1 << 31)). This flips every bit of negative values and sets the sign bit of non-negative ones, so unsigned order on u matches float order on f. The atomics are then done on u, and the conversion is reversed when reading back. Even though this is possible manually, it'd be great if the language supported this without floating-point trickery.
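A minimal sketch of the conversion described above, written in C++ rather than HLSL so it can be run on the CPU. Assumptions: std::memcpy stands in for HLSL's asuint(), the function names are hypothetical, and the compare-and-swap loop stands in for what InterlockedMin on a uint would do in a shader.

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>

// Map a float's bits to a uint32_t whose unsigned order matches float order
// (NaNs excluded). Mirrors: u = asuint(f); u = u >> 31 ? ~u : (u | (1 << 31)).
std::uint32_t floatToOrderedUint(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);  // C++ stand-in for HLSL asuint()
    return (u >> 31) ? ~u : (u | (1u << 31));
}

// Inverse mapping, applied when reading the result back.
float orderedUintToFloat(std::uint32_t u) {
    // Top bit set: value was non-negative, so just clear the added sign bit.
    // Top bit clear: value was negative, so undo the full bit flip.
    u = (u >> 31) ? (u & ~(1u << 31)) : ~u;
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}

// Atomic min on the encoded value (hypothetical helper). In HLSL this loop
// would simply be InterlockedMin on the uint storage.
void atomicMinFloat(std::atomic<std::uint32_t>& target, float value) {
    const std::uint32_t desired = floatToOrderedUint(value);
    std::uint32_t current = target.load();
    // Only try to store while our value is still smaller; on failure,
    // compare_exchange_weak reloads `current` and we re-check.
    while (desired < current && !target.compare_exchange_weak(current, desired)) {
    }
}
```

For example, starting from an encoded large value and feeding in 3.5f, -2.0f and 1.0f, decoding the slot yields -2.0f. Max works the same way with the comparison reversed.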

Describe the solution you'd like
Support for interlocked operations on floats, most importantly min/max. Add would be nice too, but it's understandable if it can't be supported. Float atomics already exist in OpenGL GLSL via an NV-specific extension.

Describe alternatives you've considered
Doing it manually, but that might be an obstacle for new programmers.

Additional context
N.A.

@devshgraphicsprogramming

btw, you can already emit the SPIR-V opcode you need directly via the inline SPIR-V support in DXC.

@Nielsbishere
Author

Nielsbishere commented Sep 7, 2023

@devshgraphicsprogramming Yes, if you're using Vulkan you can enable VK_EXT_shader_atomic_float if it's supported. DirectX 12, to my knowledge, doesn't support it (because DXIL doesn't). (OpenGL also supports it via an NV extension.)

As a sidenote, the solution I provided doesn't work with NaNs. Comparisons involving a NaN always return false, while my hack treats NaNs as ordinary numbers: positive NaNs end up ordered above +inf (and negative NaNs below -inf). So just don't throw in a NaN and it should be fine :). As for correct IEEE 754 behavior with NaN, min is written as a = a < b ? a : b. With a NaN as 'a', that always returns b, and the same goes for max. If the NaN is passed as b instead, b (the NaN) is still returned. So imo ignoring the NaN in min/max is the IEEE 754-conformant-ish way to do it (except when min/max is performed only on NaNs).
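To make the NaN behavior concrete, here is a small C++ sketch (not HLSL; names are hypothetical). It shows the plain ternary min, which returns b whenever either operand is NaN, and how the ordered-uint encoding places NaNs outside the infinities.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Comparison-based min as written above: a < b ? a : b.
// Any comparison involving NaN is false, so whenever either operand is NaN
// the result is b (which may itself be the NaN).
float ternaryMin(float a, float b) { return a < b ? a : b; }

// Same ordered-uint encoding as in the issue, repeated so this sketch
// compiles on its own.
std::uint32_t floatToOrderedUint(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return (u >> 31) ? ~u : (u | (1u << 31));
}

// With this encoding a positive NaN sorts above +inf and a negative NaN
// (sign bit set) sorts below -inf, so an atomic min/max over encoded values
// will happily pick a NaN. Filtering NaNs out before the atomic avoids that.
```

For comparison, std::fmin(1.0f, NAN) and std::fmin(NAN, 1.0f) both return 1.0f, because C's fmin returns the other argument when exactly one is NaN. That matches the "ignore the NaN" behavior argued above, unlike the ternary.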
I saw @jeremyong liked this issue and also wrote a good blog post explaining this in further detail at https://www.jeremyong.com/graphics/2023/09/05/f32-interlocked-min-max-hlsl/. Note that this solution also works on halves and doubles; the sign check (shift) and mask just need to use 15 or 63 respectively. In fact it works with any IEEE 754 format that stores the sign bit, then the exponent, then the mantissa, in that order, with N = mantissa bits + exponent bits (15 = half, 31 = float, 63 = double). (Doubles would need uint64_t atomics, though, and the 1 in 1 << N must be a uint64_t before it is shifted.)
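A sketch of the generalization described above, in C++ under the assumption that the raw bits of a half, float or double are held in a same-width unsigned integer (uint16_t, uint32_t, uint64_t). The helper names are hypothetical; N is derived from the integer width, and the sign-bit mask is built by shifting a UInt-typed 1, per the note about 1 << N.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Generic encoding for any IEEE 754 binary format laid out as
// sign | exponent | mantissa. N (the sign-bit index) is 15, 31 or 63.
template <typename UInt>
UInt toOrderedBits(UInt bits) {
    static_assert(std::is_unsigned<UInt>::value, "UInt must be unsigned");
    constexpr unsigned N = sizeof(UInt) * 8 - 1;
    // Shift a UInt-typed 1, not an int: 1 << 63 on a 32-bit int is undefined.
    constexpr UInt signBit = static_cast<UInt>(UInt(1) << N);
    return (bits >> N) ? static_cast<UInt>(~bits)
                       : static_cast<UInt>(bits | signBit);
}

// Inverse mapping, applied when reading back.
template <typename UInt>
UInt fromOrderedBits(UInt bits) {
    constexpr unsigned N = sizeof(UInt) * 8 - 1;
    constexpr UInt signBit = static_cast<UInt>(UInt(1) << N);
    return (bits >> N) ? static_cast<UInt>(bits & ~signBit)
                       : static_cast<UInt>(~bits);
}

// Double wrapper: the encoded value needs 64-bit storage, and on the GPU
// that means 64-bit atomics.
std::uint64_t doubleToOrderedBits(double d) {
    std::uint64_t u;
    std::memcpy(&u, &d, sizeof u);
    return toOrderedBits(u);
}
```

Halves are passed as raw bit patterns here (e.g. 0x3C00 is 1.0 in binary16), since standard C++ before C++23 has no portable half type.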

@devshgraphicsprogramming

Just to clarify, I'm not trivializing this or opposed to it being implemented in HLSL/DXC and added to DXIL; I'm just giving pointers if you want to achieve this "today" in a SPIR-V environment.

@devshgraphicsprogramming

devshgraphicsprogramming commented Sep 8, 2023


This is obviously a really cool technique.

As a sidenote to a sidenote, we can do all sorts of "atomic" operations with CAS loops, but for those and for the trick above, the code gets really nasty to maintain without any sort of reference type. Right now you can abuse the bug that #5377 will fix soon and write

template<typename T>
T myEsotericAtomic(inout T rval, in T operand);

and do whatever you like inside (a CAS loop, a call to an inline SPIR-V intrinsic), plus have it work with groupshared memory, RWStructuredBuffer, and, if the new BDA (buffer device address) support ships before microsoft/DirectXShaderCompiler#5377, even with the result of vk::BufferPointer::Get().

If #5377 ships before we're given true T& references or some other crutch, the only way to implement this will be with a macro, even if you use an accessor pseudo-lambda/functor struct, simply because you'd otherwise have to copy and paste the code into the method definition a million times.
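For contrast, here is what the reference-taking version looks like in C++, where std::atomic<T>& supplies exactly the reference type HLSL is missing. This is a hypothetical analogue of the myEsotericAtomic signature above, not a DXC or HLSL API: one generic CAS loop that applies any binary operation atomically and returns the previous value.

```cpp
#include <atomic>

// Hypothetical C++ analogue of myEsotericAtomic: applies `op` atomically via
// a compare-and-swap loop and returns the value seen before the update.
template <typename T, typename Op>
T atomicApply(std::atomic<T>& target, T operand, Op op) {
    T expected = target.load();
    // On failure, compare_exchange_weak writes the latest value into
    // `expected`, so op() is recomputed from fresh data on each retry.
    while (!target.compare_exchange_weak(expected, op(expected, operand))) {
    }
    return expected;  // unchanged on success, i.e. the previous value
}
```

Because the storage arrives through a reference, the same function body serves every kind of storage. That reuse is what HLSL can't express today without the bug, a macro, or a copy per storage type.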

Projects
Status: Triaged

4 participants