`DeviceMacroProperty::operator+=` does not compile for SM < 60 #847

ptheywood · 2022-05-09T10:10:25Z

DeviceMacroProperty<T, I, J, K, W>::operator+=(const T& val) uses atomicAdd(this->ptr, val); internally.

For double-precision floating-point numbers, atomicAdd is only implemented in CUDA for devices of compute capability 6.0 (SM 60, Pascal) and newer, where the underlying hardware instruction was first introduced.
This also requires CUDA >= ~8 IIRC, but that is not an issue for us.

The CUDA documentation includes a reference implementation of atomicAdd(double*, double) using atomicCAS. This is much, much slower than the hardware instruction (especially when there is high atomic contention) but it is the only way to implement it for SM < 60.

E.g. from the CUDA 11.6 Documentation B.14:

#if __CUDA_ARCH__ < 600
__device__ double atomicAdd(double* address, double val)
{
    unsigned long long int* address_as_ull =
                              (unsigned long long int*)address;
    unsigned long long int old = *address_as_ull, assumed;

    do {
        assumed = old;
        old = atomicCAS(address_as_ull, assumed,
                        __double_as_longlong(val +
                               __longlong_as_double(assumed)));

    // Note: uses integer comparison to avoid hang in case of NaN (since NaN != NaN)
    } while (assumed != old);

    return __longlong_as_double(old);
}
#endif

If the test suite had included use of doubles here, this would have been caught by CI.

I've lazily added this to the DeviceMacroPropertyTest.add test in the macroprop-addfp64 branch to demonstrate this (logs, valid for 90 days?), which when targeting SM < 60 produces an error such as:

FLAMEGPU2/include/flamegpu/runtime/utility/DeviceMacroProperty.cuh(274): error: no instance of overloaded function "atomicAdd" matches the argument list
            argument types are: (double *, const double)
          detected during instantiation of "flamegpu::DeviceMacroProperty<T, I, J, K, W> &flamegpu::DeviceMacroProperty<T, I, J, K, W>::operator+=(const T &) [with T=double, I=1U, J=1U, K=1U, W=1U]" 
FLAMEGPU2/tests/test_cases/runtime/test_device_macro_property.cu(95): here

1 error detected in the compilation of "FLAMEGPU2/tests/test_cases/runtime/test_device_macro_property.cu"

We could just drop the reference implementation into the DeviceMacroProperty header outside of the flamegpu namespace, but if this is done anywhere else it would be multiply defined.
Using an anonymous namespace instead would allow this to coexist with other implementations, but would require a little bit of macro use. I'm not sure which would be the cleaner solution.

Robadob · 2022-05-09T13:01:32Z

If it's marked inline (or forceinline), multiply defined isn't a problem.
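
For illustration, the options discussed above could look roughly like the following. This is a minimal sketch and not the change merged in #848: the `compat` namespace, the `atomicAddDouble` wrapper, the `accumulate` kernel and the `run_sum` host helper are all hypothetical names. It assumes the fallback is given its own name inside a namespace and marked `__forceinline__`, so it neither clashes with a global `atomicAdd(double*, double)` defined elsewhere nor causes multiple-definition errors when the header is included in several translation units.

```cuda
#include <cuda_runtime.h>

namespace compat {  // hypothetical namespace, for illustration only

// Adds val to *address atomically and returns the previous value.
// Uses the hardware instruction where available (SM >= 60) and the
// atomicCAS loop from the CUDA programming guide otherwise.
__device__ __forceinline__ double atomicAddDouble(double* address, double val) {
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ < 600
    unsigned long long int* address_as_ull =
        reinterpret_cast<unsigned long long int*>(address);
    unsigned long long int old = *address_as_ull, assumed;
    do {
        assumed = old;
        old = atomicCAS(address_as_ull, assumed,
                        __double_as_longlong(val + __longlong_as_double(assumed)));
        // Integer comparison avoids a hang when the value is NaN (NaN != NaN).
    } while (assumed != old);
    return __longlong_as_double(old);
#else
    return atomicAdd(address, val);
#endif
}

}  // namespace compat

// Every in-range thread adds 1.0 to the same address (high contention).
__global__ void accumulate(double* total, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        compat::atomicAddDouble(total, 1.0);
    }
}

// Host helper: launches n adding threads and returns the accumulated sum.
// Error checking is omitted to keep the sketch short.
double run_sum(int n) {
    double* d_total = nullptr;
    cudaMalloc(&d_total, sizeof(double));
    cudaMemset(d_total, 0, sizeof(double));  // all-zero bytes represent 0.0
    const int threads = 256;
    accumulate<<<(n + threads - 1) / threads, threads>>>(d_total, n);
    double h_total = 0.0;
    cudaMemcpy(&h_total, d_total, sizeof(double), cudaMemcpyDeviceToHost);
    cudaFree(d_total);
    return h_total;
}
```

With a wrapper like this, `operator+=` would call the wrapper rather than `atomicAdd` directly. The trade-off is one extra name to maintain, in exchange for not depending on whether any other header defines its own `atomicAdd(double*, double)` for older architectures.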

ptheywood added the bug label May 9, 2022

ptheywood mentioned this issue May 9, 2022

Fix DeviceMacroProperty::operator+=(double) for SM < 60 GPUs #848

Merged

mondus closed this as completed in #848 May 11, 2022
