Kokkos::ArithTraits<double>::nan() is very slow #35

ibaned · 2017-07-06T23:07:46Z

I just got a report from Albany users (@lxmota and @calleman21). They switched from Teuchos::ScalarTraits<double>::nan() to Kokkos::ArithTraits<double>::nan() which slowed down their entire application by over 3X because all variables are initialized to NaN using this function.

KokkosKernels uses strtod() to implement this function (on the host), while Teuchos returns a global variable which is initialized to 0.0/0.0. @lxmota also recommended that we simply call std::numeric_limits<double>::quiet_NaN().

All the above also applies to float.

I think we should switch to either what Teuchos does or the quiet_NaN() from the standard library.

@crtrott @mhoemmen any thoughts?

I'll pick one of these and submit a PR soon.

mhoemmen · 2017-07-07T03:18:21Z

Kokkos::ArithTraits needs to work on the GPU, so you can't use std::numeric_limits if __CUDA_ARCH__ is defined. It could be that this works now, given how CUDA's constexpr support has been evolving. However, you'll have to try it to be sure.

If std::numeric_limits doesn't work, you could try this:
https://stackoverflow.com/questions/15514286/way-to-get-floating-point-special-values-in-cuda

ibaned · 2017-07-07T14:04:09Z

Actually, this is the code right now, so the CUDA part is already separate:

#ifdef __CUDA_ARCH__
    return CUDART_NAN;
    //return nan (); // this returns 0 ???
#else
    // http://pubs.opengroup.org/onlinepubs/009696899/functions/nan.html
    return strtod ("NAN", (char**) NULL);
#endif // __CUDA_ARCH__

So we only have to worry about the slow host code, which has no GPU restrictions.

hcedwar · 2017-07-07T14:33:10Z

Suggest the strtod only be called once to initialize a singleton:

#ifdef __CUDA_ARCH__
__device__ inline
double kokkos_kernels_nan_function() // whatever 
{  return CUDART_NAN ; }
#else
inline
double kokkos_kernels_nan_function() // whatever 
{ static double x = strtod("NAN",(char**)NULL); return x ; }
#endif

ibaned · 2017-07-07T14:37:36Z

This is quite similar to what Teuchos does. They also use a singleton, just a different initialization.

Any reason not to use std::numeric_limits<double>::quiet_NaN() to initialize the singleton instead of strtod()?

ibaned · 2017-07-07T14:40:45Z

Actually, std::numeric_limits<double>::quiet_NaN() is constexpr in C++11, so calling it directly might be faster than loading a singleton in that case.
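A minimal sketch of what a constexpr NaN trait could look like on the host. The `ExampleTraits` struct and the `kNan` constant are hypothetical names used only for illustration; this is not the actual Kokkos::ArithTraits code.

```cpp
#include <limits>

// Hypothetical trait for illustration only; not the real Kokkos::ArithTraits.
template <class T>
struct ExampleTraits {
  // quiet_NaN() is constexpr since C++11, so this wrapper can be constexpr too.
  static constexpr T nan() { return std::numeric_limits<T>::quiet_NaN(); }
};

// Initializing a constexpr variable forces compile-time evaluation:
// no string parsing and no load from a singleton at run time.
constexpr double kNan = ExampleTraits<double>::nan();

// Guard against platforms whose double has no quiet NaN representation.
static_assert(std::numeric_limits<double>::has_quiet_NaN,
              "double must support quiet NaN");
```

Because the value is a compile-time constant, the compiler is free to embed it directly at each use site.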

crtrott · 2017-07-07T15:37:19Z

Just try out what is fastest and do that. Though I hope it's the std::numeric_limits solution ;-)
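One way to try this out is a small timing harness along the following lines. It is only a sketch: `nan_via_strtod`, `nan_via_numeric_limits`, and `time_fill` are hypothetical names, and the harness mimics the "initialize everything to NaN" pattern from the report rather than reproducing the MiniTensor test.

```cpp
#include <chrono>
#include <cstdlib>
#include <limits>
#include <vector>

// Hypothetical stand-ins for the two host implementations under discussion.
inline double nan_via_strtod() { return std::strtod("NAN", nullptr); }
inline double nan_via_numeric_limits() {
  return std::numeric_limits<double>::quiet_NaN();
}

// Overwrite every element of buf with make_nan() and return elapsed seconds.
template <class NanFn>
double time_fill(std::vector<double>& buf, NanFn make_nan) {
  const auto start = std::chrono::steady_clock::now();
  for (double& x : buf) x = make_nan();
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(stop - start).count();
}
```

Calling `time_fill(buf, nan_via_strtod)` and `time_fill(buf, nan_via_numeric_limits)` on the same large buffer gives two timings to compare. The result depends on the compiler, optimization level, and C library, so it should be measured on the target platform.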

ibaned · 2017-07-07T15:39:22Z

Could someone add me to this repository so I can assign this to myself?

crtrott · 2017-07-07T15:50:46Z

Ah, this is Kokkos-Kernels; I totally overlooked that.

mhoemmen · 2017-07-07T16:47:31Z

std::numeric_limits should work. We could even make ArithTraits::nan constexpr in the non-CUDA case (and eventually, even in the CUDA case, since CUDA looks like it's starting to get more constexpr support).

Singletons could be bad if calling this in a thread-parallel context; we would want pthread_once in that case.
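If a cached value were kept, the thread-safety concern could be handled roughly as sketched below. This uses std::call_once as a portable C++11 counterpart to pthread_once; the function names are hypothetical. Note also that C++11 requires initialization of a function-local static to be thread-safe, so on a conforming compiler the second form may already be enough.

```cpp
#include <cstdlib>
#include <mutex>

// Hypothetical helper; not part of KokkosKernels or Teuchos.
// Explicit one-time initialization, analogous to pthread_once.
inline double cached_nan_call_once() {
  static std::once_flag flag;
  static double value;
  // The callable runs exactly once, even if many threads arrive together.
  std::call_once(flag, [] { value = std::strtod("NAN", nullptr); });
  return value;
}

// Hypothetical helper relying on the C++11 guarantee that a function-local
// static is initialized exactly once, even under concurrent first calls.
inline double cached_nan_local_static() {
  static const double value = std::strtod("NAN", nullptr);
  return value;
}
```

Both forms still pay for a guard check and a memory load on every call, which is part of why a constexpr quiet_NaN() is attractive on the host.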

Using this instead of strtod() and friends dramatically speeds up the Kokkos::ArithTraits<T>::nan() function. The performance test packages/minitensor/test/perf_test_01.cc went from 5.5 seconds to 0.6 seconds. See kokkos/kokkos-kernels#35. This is being patched directly into Trilinos because (1) the performance issue is urgent and (2) kokkos-kernels is undergoing heavy refactoring and may not perform a formal snapshot for a couple of months.

Using this instead of strtod() and friends dramatically speeds up the Kokkos::ArithTraits<T>::nan() function. The performance test Trilinos/packages/minitensor/test/perf_test_01.cc went from 5.5 seconds to 0.6 seconds. See kokkos#35.

ibaned · 2017-07-11T14:25:15Z

trilinos/Trilinos#1490 fixes this in Trilinos, including the addition of a performance test in the affected MiniTensor package to measure the difference.

#36 fixes this in the KokkosKernels develop branch.

They are separate since KokkosKernels is unlikely to promote into Trilinos in the near term, and MiniTensor needs this now.

mhoemmen · 2017-07-11T15:41:11Z

I submitted a trilinos/Trilinos#1490 review; thanks!

ibaned self-assigned this Jul 7, 2017

ibaned mentioned this issue Jul 11, 2017

Fix Kokkos::ArithTraits<T>::nan() performance trilinos/Trilinos#1490

Merged

ibaned mentioned this issue Jul 11, 2017

use numeric_limits<T>::quiet_NaN #36

Merged

ibaned added the enhancement label Jul 11, 2017

crtrott added the InDevelop label Sep 8, 2017

crtrott closed this as completed Sep 8, 2017

This was referenced Jan 13, 2019

Made nan's quiet to hide FPEs trilinos/Trilinos#4180

Merged

New FPEs in Albany as of today sandialabs/Albany#422

Closed

This was referenced Jul 28, 2023

remove Intel 2017 code (no longer supported) #1920

Merged

set GENERATE_HTML = YES in Doxyfile #1921

Merged
