2-stage GS update breaking cuda/10+rdc build #673

Closed
ndellingwood opened this issue Apr 2, 2020 · 4 comments

@ndellingwood
Contributor

Merge of PR #671 led to the cuda/10+rdc nightly build (Pascal arch) failing on White.

At a glance, the output shows various undefined references, possibly a sign of ETI-related issues when sptrsv is used within the GS routines. The sptrsv tests compiled and ran without error prior to the PR, so I'm not sure what is causing the problem with its usage in GS when RDC is enabled.

Sample errors from Jenkins job:
https://jenkins-son.sandia.gov/job/KokkosKernels_White_CudaSerial_cuda_10_gcc_720_pascal_rdc/36/consoleFull

...
01:16:19 ../../src/libkokkoskernels.a(Sparse_gauss_seidel_symbolic_eti_DOUBLE_ORDINAL_INT_OFFSET_INT_LAYOUTLEFT_EXECSPACE_CUDA_MEMSPACE_CUDASPACE_MEMSPACE_CUDAUVMSPACE.cpp.o): In function `void KokkosSparse::Experimental::sptrsv_symbolic<KokkosKernels::Experimental::KokkosKernelsHandle<int const, int const, double const, Kokkos::Cuda, Kokkos::CudaSpace, Kokkos::CudaUVMSpace>, Kokkos::View<int*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Cuda, Kokkos::CudaSpace>, Kokkos::MemoryTraits<0u> >, Kokkos::View<int*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Cuda, Kokkos::CudaSpace>, Kokkos::MemoryTraits<0u> > >(KokkosKernels::Experimental::KokkosKernelsHandle<int const, int const, double const, Kokkos::Cuda, Kokkos::CudaSpace, Kokkos::CudaUVMSpace>*, Kokkos::View<int*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Cuda, Kokkos::CudaSpace>, Kokkos::MemoryTraits<0u> >, Kokkos::View<int*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Cuda, Kokkos::CudaSpace>, Kokkos::MemoryTraits<0u> >)':
01:16:19 tmpxft_0000c000_00000000-5_Sparse_gauss_seidel_symbolic_eti_DOUBLE_ORDINAL_INT_OFFSET_INT_LAYOUTLEFT_EXECSPACE_CUDA_MEMSPACE_CUDAUVMSPACE_MEMSPACE_CUDASPACE.cudafe1.cpp:(.text._ZN12KokkosSparse12Experimental15sptrsv_symbolicIN13KokkosKernels12Experimental19KokkosKernelsHandleIKiS5_KdN6Kokkos4CudaENS7_12CudaUVMSpaceENS7_9CudaSpaceEEENS7_4ViewIPiJNS7_10LayoutLeftENS7_6DeviceIS8_SA_EENS7_12MemoryTraitsILj0EEEEEESJ_EEvPT_T0_T1_[_ZN12KokkosSparse12Experimental15sptrsv_symbolicIN13KokkosKernels12Experimental19KokkosKernelsHandleIKiS5_KdN6Kokkos4CudaENS7_12CudaUVMSpaceENS7_9CudaSpaceEEENS7_4ViewIPiJNS7_10LayoutLeftENS7_6DeviceIS8_SA_EENS7_12MemoryTraitsILj0EEEEEESJ_EEvPT_T0_T1_]+0xf8): undefined reference to `KokkosSparse::Impl::SPTRSV_SYMBOLIC<KokkosKernels::Experimental::KokkosKernelsHandle<int const, int const, double const, Kokkos::Cuda, Kokkos::CudaUVMSpace, Kokkos::CudaSpace>, Kokkos::View<int const*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Cuda, Kokkos::CudaSpace>, Kokkos::MemoryTraits<3u> >, Kokkos::View<int const*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Cuda, Kokkos::CudaSpace>, Kokkos::MemoryTraits<3u> >, false, false>::sptrsv_symbolic(KokkosKernels::Experimental::KokkosKernelsHandle<int const, int const, double const, Kokkos::Cuda, Kokkos::CudaUVMSpace, Kokkos::CudaSpace>*, Kokkos::View<int const*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Cuda, Kokkos::CudaSpace>, Kokkos::MemoryTraits<3u> >, Kokkos::View<int const*, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Cuda, Kokkos::CudaSpace>, Kokkos::MemoryTraits<3u> >)'
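
For readers less familiar with ETI, here is a minimal sketch of the general mechanism that can produce this kind of link error. It is not the Kokkos Kernels code and does not claim to be the actual cause here: `SpaceA`, `SpaceB`, `Handle`, `SymbolicImpl`, and `symbolic` are all hypothetical names. It assumes the usual pattern in which a header declares a specialization `extern template` and a library source file is expected to provide the matching explicit instantiation.

```cpp
#include <cassert>

// Hypothetical stand-ins for memory-space tags; these are NOT the Kokkos types.
struct SpaceA {};
struct SpaceB {};

// Hypothetical handle parameterized on two memory spaces, in a fixed order.
template <class TemporarySpace, class PersistentSpace>
struct Handle {};

// Hypothetical implementation layer, playing the role of an ETI'd struct.
template <class HandleType>
struct SymbolicImpl {
  static int run(const HandleType&);
};

template <class HandleType>
int SymbolicImpl<HandleType>::run(const HandleType&) {
  return 1;  // placeholder for the real symbolic-phase work
}

// Explicit instantiation declaration (would normally live in a header):
// it suppresses implicit instantiation and promises that some translation
// unit provides this exact specialization.
extern template struct SymbolicImpl<Handle<SpaceA, SpaceB>>;

// Explicit instantiation definition (would normally live in a library .cpp):
// this is what actually emits the symbol the linker needs.
template struct SymbolicImpl<Handle<SpaceA, SpaceB>>;

// Failure mode, left commented out so this sketch still links:
//   extern template struct SymbolicImpl<Handle<SpaceB, SpaceA>>;
// With that declaration and no matching definition anywhere, a call to
// symbolic(Handle<SpaceB, SpaceA>{}) would typically end in an
// "undefined reference" at link time.

// Thin public wrapper that forwards to the implementation layer.
template <class HandleType>
int symbolic(const HandleType& handle) {
  return SymbolicImpl<HandleType>::run(handle);
}
```

In this sketch, calling `symbolic(Handle<SpaceA, SpaceB>{})` links because the declared and defined specializations match exactly. Any difference in the template arguments, such as the order of the two spaces or the constness of a view type, names a different specialization that needs its own instantiation.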

Reproducer instructions:

  #   Load modules:
        module load cmake/3.12.3 cuda/10.1.105 gcc/7.2.0 ibm/xl/16.1.0

  #   Use generate_makefile line below to call cmake which generates makefile for this build:
        $KOKKOSKERNELS_PATH/cm_generate_makefile.bash --with-devices=Cuda,Serial --arch=Power8,Pascal60 --compiler=$KOKKOS_PATH/bin/nvcc_wrapper --cxxflags="-O3 -Wall -Wshadow -pedantic -Werror -Wsign-compare -Wtype-limits -Wuninitialized " --cxxstandard="11" --ldflags="" --with-cuda --kokkos-path=$KOKKOS_PATH --kokkoskernels-path=$KOKKOSKERNELS_PATH --with-scalars='' --with-ordinals= --with-offsets= --with-layouts= --with-tpls=    --no-examples --with-options=enable_large_mem_tests

@iyamazaki @brian-kelley can you take a look at this?

@brian-kelley
Contributor

RIDE and White seem to be down now, but I'm trying to replicate on my machine (still CUDA 10 and pascal).

@ndellingwood
Contributor Author

Thanks @brian-kelley !

@iyamazaki
Contributor

Thank you @ndellingwood I'll take a look!!

@brian-kelley
Contributor

brian-kelley commented Apr 2, 2020

@ndellingwood I can confirm that #675 fixed the build with RDC.
