Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DeviceMacroProperty::operator+= does not compiler for SM < 60 #847

Closed
ptheywood opened this issue May 9, 2022 · 1 comment · Fixed by #848
Closed

DeviceMacroProperty::operator+= does not compiler for SM < 60 #847

ptheywood opened this issue May 9, 2022 · 1 comment · Fixed by #848
Labels

Comments

@ptheywood
Copy link
Member

DeviceMacroProperty<T, I, J, K, W>::operator+=(const T& val) uses atomicAdd(this->ptr, val); internally.

For double precision floating point numbers, atomicAdd is only implemented in CUDA for compute capability 60 devices (pascal) and newer, where the underlying hardware instruction was first implemented.
This also requires CUDA >= ~8 IIRC, but that is not an issue for us.

The CUDA documentation includes a reference implementation of atomicAdd(double*, double) using atomicCAS. This is much, much slower than the hardware instruction (especially when there is high atomic contention) but it is the only way to implement it for SM < 60.

E.g. from the CUDA 11.6 Documenation B.14:

#if __CUDA_ARCH__ < 600
__device__ double atomicAdd(double* address, double val)
{
    unsigned long long int* address_as_ull =
                              (unsigned long long int*)address;
    unsigned long long int old = *address_as_ull, assumed;

    do {
        assumed = old;
        old = atomicCAS(address_as_ull, assumed,
                        __double_as_longlong(val +
                               __longlong_as_double(assumed)));

    // Note: uses integer comparison to avoid hang in case of NaN (since NaN != NaN)
    } while (assumed != old);

    return __longlong_as_double(old);
}
#endif

If the test suite had included use of doubles here, this would have been caught by CI.

I've lazily added this to the DeviceMacroPropertyTest.add test in the macroprop-addfp64 branch to demonstate this (logs, valid for 90 days?), which when targetting SM < 60 produces an error such as:

FLAMEGPU2/include/flamegpu/runtime/utility/DeviceMacroProperty.cuh(274): error: no instance of overloaded function "atomicAdd" matches the argument list
            argument types are: (double *, const double)
          detected during instantiation of "flamegpu::DeviceMacroProperty<T, I, J, K, W> &flamegpu::DeviceMacroProperty<T, I, J, K, W>::operator+=(const T &) [with T=double, I=1U, J=1U, K=1U, W=1U]" 
FLAMEGPU2/tests/test_cases/runtime/test_device_macro_property.cu(95): here

1 error detected in the compilation of "FLAMEGPU2/tests/test_cases/runtime/test_device_macro_property.cu"

We could just drop the reference implementation into the DeviceMacroProperty header outside of the flamegpu namespace, but if this is done anywhere else it would be multiply defined.
Usign the anon namespace instead would allow this to coexist with other implementations, but would require it a little bit of macro use. I'm not sure which would be the cleaner solution.

@ptheywood ptheywood added the bug label May 9, 2022
@Robadob
Copy link
Member

Robadob commented May 9, 2022

If it's marked inline (or forceinline), multiply defined isn't a problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants