Kokkos::ArithTraits<double>::nan() is very slow #35
Comments
If |
Actually, this is the code right now, so the CUDA part is already separate:
#ifdef __CUDA_ARCH__
return CUDART_NAN;
//return nan (); // this returns 0 ???
#else
// http://pubs.opengroup.org/onlinepubs/009696899/functions/nan.html
return strtod ("NAN", (char**) NULL);
#endif // __CUDA_ARCH__
So we only have to worry about the slow host code, which has no GPU restrictions. |
Suggest the
#ifdef __CUDA_ARCH__
__device__ inline
double kokkos_kernels_nan_function() // whatever
{ return CUDART_NAN ; }
#else
inline
double kokkos_kernels_nan_function() // whatever
{ static double x = strtod("NAN",(char**)NULL); return x ; }
#endif |
This is quite similar to what Teuchos does. They also use a singleton, just a different initialization. Any reason not to use the |
Actually, |
Just try out what is fastest and do that. Though I hope it's the std::numeric_limits solution ;-) |
Could someone add me to this repository so I can assign this to myself? |
Ah, this is Kokkos-Kernels; I totally overlooked this. |
Singletons could be bad if calling this in a thread-parallel context; we would want |
using this instead of strtod() and friends dramatically speeds up the Kokkos::ArithTraits<T>::nan() function. The performance test packages/minitensor/test/perf_test_01.cc went from 5.5 seconds to 0.6 seconds. see kokkos/kokkos-kernels#35 This is being patched directly into Trilinos because (1) the performance issue is urgent and (2) kokkos-kernels is undergoing heavy refactoring and may not perform a formal snapshot for a couple of months.
using this instead of strtod() and friends dramatically speeds up the Kokkos::ArithTraits<T>::nan() function. The performance test Trilinos/packages/minitensor/test/perf_test_01.cc went from 5.5 seconds to 0.6 seconds. see kokkos#35
trilinos/Trilinos#1490 fixes this in Trilinos, including the addition of a performance test in the affected MiniTensor package to measure the difference. #36 fixes this in the KokkosKernels develop branch. They are separate since KokkosKernels is unlikely to promote into Trilinos in the near term, and MiniTensor needs this now. |
I submitted a trilinos/Trilinos#1490 review; thanks! |
I just got a report from Albany users (@lxmota and @calleman21). They switched from Teuchos::ScalarTraits<double>::nan() to Kokkos::ArithTraits<double>::nan(), which slowed down their entire application by over 3X because all variables are initialized to NaN using this function.
KokkosKernels uses strtod() to implement this function (on the host), while Teuchos returns a global variable which is initialized to 0.0/0.0. @lxmota also recommended that we simply call std::numeric_limits<double>::quiet_NaN().
All the above also applies to float.
I think we should switch to either what Teuchos does or the quiet_NaN() from the standard library.
@crtrott @mhoemmen any thoughts?
I'll pick one of these and submit a PR soon.