DeviceMacroProperty::operator+=
does not compile for SM < 60
#847
`DeviceMacroProperty<T, I, J, K, W>::operator+=(const T& val)` uses `atomicAdd(this->ptr, val);` internally. For `double` precision floating point numbers, `atomicAdd` is only implemented in CUDA for devices of compute capability 6.0 (SM 60, Pascal) and newer, where the underlying hardware instruction was first implemented. This also requires CUDA >= ~8 IIRC, but that is not an issue for us.
The CUDA documentation includes a reference implementation of `atomicAdd(double*, double)` using `atomicCAS`. This is much, much slower than the hardware instruction (especially when there is high atomic contention), but it is the only way to implement it for SM < 60. See e.g. section B.14 of the CUDA 11.6 documentation.
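For reference, the CAS-based fallback mentioned above can be sketched roughly as follows. This follows the well-known pattern from the CUDA programming guide, written from memory rather than copied verbatim, so treat it as a sketch. The names `atomicAddDoubleCAS`, `contendedAdd` and `runContendedAdd` are made up for illustration; the documentation version is instead an `atomicAdd` overload guarded by `__CUDA_ARCH__ < 600`.

```cuda
#include <cuda_runtime.h>

// Hypothetical name: software double-precision atomic add built on 64-bit atomicCAS.
// Unlike the documentation version it is not guarded by __CUDA_ARCH__, so it
// compiles for any architecture and can be compared against the native atomicAdd.
__device__ double atomicAddDoubleCAS(double* address, double val) {
    unsigned long long int* address_as_ull =
        reinterpret_cast<unsigned long long int*>(address);
    unsigned long long int old = *address_as_ull;
    unsigned long long int assumed;
    do {
        assumed = old;
        // Attempt to swap in (assumed + val); atomicCAS returns the value it found.
        old = atomicCAS(address_as_ull, assumed,
                        __double_as_longlong(val + __longlong_as_double(assumed)));
        // Integer comparison, so a NaN stored at the address cannot loop forever.
    } while (assumed != old);
    return __longlong_as_double(old);
}

// Every thread adds 1.0 to the same address (worst-case contention).
__global__ void contendedAdd(double* target) {
    atomicAddDoubleCAS(target, 1.0);
}

// Host helper: launches blocks * threads threads and returns the accumulated sum.
// Error checking is omitted to keep the sketch short.
double runContendedAdd(int blocks, int threads) {
    double* d_target = nullptr;
    double result = -1.0;
    cudaMalloc(&d_target, sizeof(double));
    cudaMemset(d_target, 0, sizeof(double));  // all-zero bits == 0.0
    contendedAdd<<<blocks, threads>>>(d_target);
    cudaMemcpy(&result, d_target, sizeof(double), cudaMemcpyDeviceToHost);
    cudaFree(d_target);
    return result;
}
```

Calling `runContendedAdd(4, 256)` has every one of the 1024 threads retry its CAS until it wins, which is where the slowdown under contention comes from.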
If the test suite had included use of doubles here, this would have been caught by CI.
I've lazily added this to the `DeviceMacroPropertyTest.add` test in the `macroprop-addfp64` branch to demonstrate this (logs, valid for 90 days?), which produces a compilation error when targeting SM < 60.
We could just drop the reference implementation into the `DeviceMacroProperty` header outside of the `flamegpu` namespace, but if this is done anywhere else it would be multiply defined. Using the anon namespace instead would allow this to coexist with other implementations, but would require a little bit of macro use. I'm not sure which would be the cleaner solution.
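As a point of comparison with the two options above, here is a minimal sketch of a third variation, not something proposed in this issue: instead of defining a function named `atomicAdd`, use a uniquely named wrapper that picks the native instruction or the CAS loop based on `__CUDA_ARCH__`. Everything named here (`example_detail`, `atomicAddCompat`, `MacroPropertySketch`, `addOne`, `runAddOne`) is hypothetical and only stands in for the real FLAME GPU types.

```cuda
#include <cuda_runtime.h>

namespace example_detail {  // hypothetical namespace, not part of flamegpu

// Generic case: forward to the built-in atomicAdd overloads (int, float, ...).
template <typename T>
__device__ inline T atomicAddCompat(T* address, T val) {
    return atomicAdd(address, val);
}

// double: native instruction on SM >= 60, CAS loop otherwise.
// This non-template overload is preferred over the template for double.
__device__ inline double atomicAddCompat(double* address, double val) {
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 600
    return atomicAdd(address, val);
#else
    unsigned long long int* address_as_ull =
        reinterpret_cast<unsigned long long int*>(address);
    unsigned long long int old = *address_as_ull;
    unsigned long long int assumed;
    do {
        assumed = old;
        old = atomicCAS(address_as_ull, assumed,
                        __double_as_longlong(val + __longlong_as_double(assumed)));
    } while (assumed != old);  // integer compare avoids a hang on NaN
    return __longlong_as_double(old);
#endif
}

}  // namespace example_detail

// Hypothetical stand-in for DeviceMacroProperty, reduced to a single pointer.
template <typename T>
struct MacroPropertySketch {
    T* ptr;
    __device__ MacroPropertySketch& operator+=(const T& val) {
        example_detail::atomicAddCompat(ptr, val);
        return *this;
    }
};

template <typename T>
__global__ void addOne(T* target) {
    MacroPropertySketch<T> prop{target};
    prop += static_cast<T>(1);
}

// Host helper: every thread adds 1 to the same element; returns the total.
template <typename T>
T runAddOne(int blocks, int threads) {
    T* d_target = nullptr;
    T result = static_cast<T>(-1);
    cudaMalloc(&d_target, sizeof(T));
    cudaMemset(d_target, 0, sizeof(T));
    addOne<T><<<blocks, threads>>>(d_target);
    cudaMemcpy(&result, d_target, sizeof(T), cudaMemcpyDeviceToHost);
    cudaFree(d_target);
    return result;
}
```

Because the wrapper has its own name and is `inline`, it should not clash with an `atomicAdd(double*, double)` defined by another library, at the cost of `operator+=` no longer calling `atomicAdd` directly.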