Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

initial implementation of new ETI #526

Merged
merged 3 commits into from
Mar 17, 2020

Conversation

jjwilke
Copy link
Contributor

@jjwilke jjwilke commented Dec 13, 2019

No description provided.

@jjwilke
Copy link
Contributor Author

jjwilke commented Dec 13, 2019

  • Remove huge number of cpp files
  • Use CMake system to generate single source file per kernel
  • Makefile system will no longer work

@jjwilke
Copy link
Contributor Author

jjwilke commented Dec 13, 2019

New PR instead of canceled #526

@srajama1 srajama1 requested a review from crtrott December 13, 2019 05:57
@srajama1
Copy link
Contributor

@jjwilke : This is great ! Deleting 6000 files from the repo, what else can we ask for :)

@crtrott : Can you review this, please ?

@ndellingwood
Copy link
Contributor

Please make sure to test with Trilinos as well as the cm_test_all_sandia on various testbeds before considering a merge.

@jjwilke
Copy link
Contributor Author

jjwilke commented Dec 13, 2019

Testing to-dos:

  • cm_test_all_sandia on kokkos-dev-2
    • cuda/10.1 issues, but other CUDA tests fine
    • sparse_serial failed, then passed for pthread build: race condition?
    • [RESOLVED] GCC Pthread builds are very fast with new ETI. CUDA builds are VERY slow. I wonder if nvcc is not scaling well with the number of template instantiations per file and this would get faster if we generated more ETI files with fewer instantiations.
  • cm_test_all_sandia on blake
  • cm_test_all_sandia on mayer
  • Trilinos promotion test with kokkos/develop on waterman with CUBLAS/CUSPARSE: handful of tests failing
    • 65 - TpetraCore_Core_initialize_where_user_initializes_mpi_MPI_4 (Failed)
    • 66 - TpetraCore_Core_ScopeGuard_where_user_initializes_mpi_MPI_4 (Failed)
    • 69 - TpetraCore_Core_initialize_where_tpetra_initializes_kokkos_MPI_1 (Failed)
    • 70 - TpetraCore_Core_ScopeGuard_where_tpetra_initializes_kokkos_MPI_1 (Failed)
    • 71 - TpetraCore_Core_initialize_where_user_initializes_kokkos_MPI_1 (Failed)
    • 72 - TpetraCore_Core_ScopeGuard_where_user_initializes_kokkos_MPI_1 (Failed)
    • 73 - TpetraCore_Core_initialize_where_tpetra_initializes_mpi_and_user_initializes_kokkos_MPI_2 (Failed)
    • 74 - TpetraCore_Core_ScopeGuard_where_tpetra_initializes_mpi_and_user_initializes_kokkos_MPI_2 (Failed)
    • 93 - TpetraCore_CrsMatrix_UnitTests4_MPI_4 (Failed)
    • 180 - TpetraCore_MatrixMatrix_UnitTests_MPI_4 (Failed)
  • Trilinos promotion test with kokkos/develop on blake (handful of tests failing). The exact same tests are failing on develop - so I don't this this is an ETI thing.
    • 28 - TpetraTSQR_Combine_MPI_1 (Failed)
    • 29 - TpetraTSQR_SequentialTsqr_contiguousCacheBlocks_MPI_1 (Failed)
    • 30 - TpetraTSQR_SequentialTsqr_noncontiguousCacheBlocks_MPI_1 (Failed)
    • 31 - TpetraTSQR_KokkosHostTsqr_MPI_1 (Failed)
    • 34 - TpetraTSQR_FullTsqr_Accuracy_MPI_4 (Failed)
    • 87 - TpetraCore_CrsMatrix_UnitTests4_MPI_4 (Failed)
  • Trilinos promotion test with kokkos/develop on waterman without CUBLAS/CUSPARSE: handful of tests failing
    • 93 - TpetraCore_CrsMatrix_UnitTests4_MPI_4 (Failed)
    • 180 - TpetraCore_MatrixMatrix_UnitTests_MPI_4 (Failed)

SET(EXECSPACE_SERIAL_VALID_MEM_SPACES HBWSPACE HOSTSPACE CUDAUVMSPACE)
SET(EXECSPACE_OPENMP_VALID_MEM_SPACES HBWSPACE HOSTSPACE CUDAUVMSPACE)
SET(EXECSPACE_PTHREAD_VALID_MEM_SPACES HBWSPACE HOSTSPACE CUDAUVMSPACE)
SET(EXECSPACE_SERIAL_VALID_MEM_SPACES HBWSPACE HOSTSPACE)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jjwilke do these changes for valid mem spaces need to propagate to develop independent of this PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this is only an ETI thing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so we're double-instantiating all those kernels? sadness :(

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we were before, too? This PR is a "preserve all existing behavior". The next PR would do the anon spaces stuff.

One thing at a time!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jjwilke praise Turing and pass the ammunition ;-)

@jjwilke
Copy link
Contributor Author

jjwilke commented Dec 13, 2019

I had to make quite a few changes for blake to various things, independent of the ETI.

@ndellingwood
Copy link
Contributor

I had to make quite a few changes for blake to various things, independent of the ETI.

@jjwilke can you cherry-pick these changes for a separate PR?

@jjwilke
Copy link
Contributor Author

jjwilke commented Dec 14, 2019

Cherry-picked changes here: #533

@jjwilke
Copy link
Contributor Author

jjwilke commented Dec 15, 2019

Timing data on kokkos-dev-2, make -j40

  • Old ETI w/ TESTS
    -real 2m38.381s
    -user 42m4.556s
    -sys 8m11.060s
  • Old ETI w/o TESTS
    -real 1m56.462s
    -user 28m27.473s
    -sys 5m34.347s
  • New ETI w/ TESTS
    -real 3m34.466s
    -user 32m35.050s
    -sys 3m58.608s
  • New ETI w/o TESTS
    -real 2m49.847s
    -user 18m18.393s
    -sys 1m18.796s
  • New ETI w/ TESTS, one file per ETI instance
    -real 1m56.433s
    -user 44m59.249s
    -sys 6m22.456s
  • New ETI w/o TESTS one file per ETI instance
    -real 1m20.096s
    -user 31m46.109s
    -sys 3m44.812s

@ndellingwood
Copy link
Contributor

Unsuccessful test on kokkos-dev-2, spot-check with cm_test_all_sandia script

#######################################################
PASSED TESTS
#######################################################
#######################################################
FAILED TESTS
#######################################################
clang-8.0-Cuda_OpenMP-release (build failed)
clang-8.0-Pthread_Serial-release (build failed)
cuda-10.1-Cuda_OpenMP-release (build failed)
gcc-7.3.0-OpenMP-release (build failed)
gcc-7.3.0-Pthread-release (build failed)
gcc-9.1-OpenMP-release (build failed)
gcc-9.1-Serial-release (build failed)
intel-18.0.5-OpenMP-release (build failed)

Looks like linking errors with the trsm update:

Testing compiler gcc/7.3.0
  Starting job gcc-7.3.0-OpenMP-release
...
[ 97%] Linking CXX executable KokkosKernels_blas_openmp
CMakeFiles/KokkosKernels_blas_openmp.dir/openmp/Test_OpenMP_Blas3_trsm.cpp.o: In function `void KokkosBlas::trsm<Kokkos::View<double**, Kokkos::LayoutLeft, Kokkos::OpenMP>, Kokkos::View<double**, Kokkos::LayoutLeft, Kokkos::OpenMP> >(char const*, char const*, char const*, char const*, Kokkos::View<double**, Kokkos::LayoutLeft, Kokkos::OpenMP>::const_value_type&, Kokkos::View<double**, Kokkos::LayoutLeft, Kokkos::OpenMP> const&, Kokkos::View<double**, Kokkos::LayoutLeft, Kokkos::OpenMP> const&)':
Test_OpenMP_Blas3_trsm.cpp:(.text._ZN10KokkosBlas4trsmIN6Kokkos4ViewIPPdJNS1_10LayoutLeftENS1_6OpenMPEEEES7_EEvPKcS9_S9_S9_RNT0_16const_value_typeERKT_RKSA_[_ZN10KokkosBlas4trsmIN6Kokkos4ViewIPPdJNS1_10LayoutLeftENS1_6OpenMPEEEES7_EEvPKcS9_S9_S9_RNT0_16const_value_typeERKT_RKSA_]+0x55a): undefined reference to `KokkosBlas::Impl::TRSM<Kokkos::View<double const**, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::OpenMP, Kokkos::HostSpace>, Kokkos::MemoryTraits<1u> >, Kokkos::View<double**, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::OpenMP, Kokkos::HostSpace>, Kokkos::MemoryTraits<1u> >, false, true>::trsm(char const*, char const*, char const*, char const*, double const&, Kokkos::View<double const**, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::OpenMP, Kokkos::HostSpace>, Kokkos::MemoryTraits<1u> > const&, Kokkos::View<double**, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::OpenMP, Kokkos::HostSpace>, Kokkos::MemoryTraits<1u> > const&)'
collect2: error: ld returned 1 exit status

....
  Starting job gcc-7.3.0-Pthread-release
...
[ 97%] Linking CXX executable KokkosKernels_blas_threads
CMakeFiles/KokkosKernels_blas_threads.dir/threads/Test_Threads_Blas3_trsm.cpp.o: In function `void KokkosBlas::trsm<Kokkos::View<double**, Kokkos::LayoutLeft, Kokkos::Threads>, Kokkos::View<double**, Kokkos::LayoutLeft, Kokkos::Threads> >(char const*, char const*, char const*, char const*, Kokkos::View<double**, Kokkos::LayoutLeft, Kokkos::Threads>::const_value_type&, Kokkos::View<double**, Kokkos::LayoutLeft, Kokkos::Threads> const&, Kokkos::View<double**, Kokkos::LayoutLeft, Kokkos::Threads> const&)':
Test_Threads_Blas3_trsm.cpp:(.text._ZN10KokkosBlas4trsmIN6Kokkos4ViewIPPdJNS1_10LayoutLeftENS1_7ThreadsEEEES7_EEvPKcS9_S9_S9_RNT0_16const_value_typeERKT_RKSA_[_ZN10KokkosBlas4trsmIN6Kokkos4ViewIPPdJNS1_10LayoutLeftENS1_7ThreadsEEEES7_EEvPKcS9_S9_S9_RNT0_16const_value_typeERKT_RKSA_]+0x55a): undefined reference to `KokkosBlas::Impl::TRSM<Kokkos::View<double const**, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Threads, Kokkos::HostSpace>, Kokkos::MemoryTraits<1u> >, Kokkos::View<double**, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Threads, Kokkos::HostSpace>, Kokkos::MemoryTraits<1u> >, false, true>::trsm(char const*, char const*, char const*, char const*, double const&, Kokkos::View<double const**, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Threads, Kokkos::HostSpace>, Kokkos::MemoryTraits<1u> > const&, Kokkos::View<double**, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Threads, Kokkos::HostSpace>, Kokkos::MemoryTraits<1u> > const&)'

etc

@jjwilke
Copy link
Contributor Author

jjwilke commented Jan 7, 2020

Yeah, somehow I managed to rebase with no conflicts without actually adding trsm to the new ETI system. Anyhow, ETI files are now generated for trsm, which should clear the error. The revision history got refactored to read a bit more sensibly - so you'll need a clean checkout.

On both develop and the branch, I'm getting gemv errors in the tests, though.

@ndellingwood
Copy link
Contributor

Tests on kokkos-dev-2 look better:

#######################################################
PASSED TESTS
#######################################################
clang-8.0-Pthread_Serial-release build_time=48 run_time=133
cuda-10.1-Cuda_OpenMP-release build_time=151 run_time=123
gcc-7.3.0-OpenMP-release build_time=46 run_time=46
gcc-7.3.0-Pthread-release build_time=42 run_time=67
gcc-9.1-OpenMP-release build_time=51 run_time=46
gcc-9.1-Serial-release build_time=45 run_time=62
intel-18.0.5-OpenMP-release build_time=129 run_time=45
#######################################################
FAILED TESTS
#######################################################
clang-8.0-Cuda_OpenMP-release (build failed)

The failing clang build is due to -Werror triggered in the fallback gemm routine; talked to Christian about this, we're going to disable checking for the particular warning in cm_test_all_sandia, PR #558 has the change to script, which allows this test to pass.

@ndellingwood
Copy link
Contributor

White spot-check testing with cm_test_all_sandia looks good.

gcc tests passed, I had to restart tests after with corrected arch options, below is output for xl and cuda:

${KOKKOSKERNELS_PATH}/scripts/cm_test_all_sandia ibm cuda --kokkos-path=${KOKKOS_PATH} --kokkoskernels-path=${KOKKOSKERNELS_PATH} --spot-check --num=1 --arch=Power8,Pascal60

Running on machine: white
Repository Status:  94ae52daca2f919cb4c9f61afbecaca7eca93a0b completely remove all ETI scripts and raw Makefiles


Going to test compilers:  ibm/16.1.0 cuda/9.2.88
Testing compiler ibm/16.1.0
  Starting job ibm-16.1.0-OpenMP-release
kokkos options:
kokkos devices: OpenMP
kokkos cxx: -O3 -Wall -Wshadow -pedantic -Wsign-compare -Wtype-limits -Wuninitialized
  PASSED ibm-16.1.0-OpenMP-release
  Starting job ibm-16.1.0-Serial-release
kokkos options:
kokkos devices: Serial
kokkos cxx: -O3 -Wall -Wshadow -pedantic -Wsign-compare -Wtype-limits -Wuninitialized
  PASSED ibm-16.1.0-Serial-release
Testing compiler cuda/9.2.88
  Starting job ibm-16.1.0-OpenMP_Serial-release
kokkos options:
kokkos devices: OpenMP,Serial
kokkos cxx: -O3 -Wall -Wshadow -pedantic -Wsign-compare -Wtype-limits -Wuninitialized
  PASSED ibm-16.1.0-OpenMP_Serial-release
  Starting job cuda-9.2.88-Cuda_OpenMP-release
kokkos options:
kokkos devices: Cuda,OpenMP
kokkos cxx: -O3 -Wall -Wshadow -pedantic -Werror -Wsign-compare -Wtype-limits -Wuninitialized
  PASSED cuda-9.2.88-Cuda_OpenMP-release

==============================================
Complete.
==============================================
  Starting job cuda-9.2.88-Cuda_Serial-release
kokkos options:
kokkos devices: Cuda,Serial
kokkos cxx: -O3 -Wall -Wshadow -pedantic -Werror -Wsign-compare -Wtype-limits -Wuninitialized
  PASSED cuda-9.2.88-Cuda_Serial-release

@srajama1
Copy link
Contributor

srajama1 commented Jan 8, 2020

Timing data on kokkos-dev-2, make -j40

* Old ETI w/ TESTS
  -real	2m38.381s
  -user	42m4.556s
  -sys	8m11.060s

* Old ETI w/o TESTS
  -real	1m56.462s
  -user	28m27.473s
  -sys	5m34.347s

* New ETI w/ TESTS
  -real	3m34.466s
  -user	32m35.050s
  -sys	3m58.608s

* New ETI w/o TESTS
  -real	2m49.847s
  -user	18m18.393s
  -sys	1m18.796s

* New ETI w/ TESTS, one file per ETI instance
  -real	1m56.433s
  -user	44m59.249s
  -sys	6m22.456s

* New ETI w/o TESTS one file per ETI instance
  -real	1m20.096s
  -user	31m46.109s
  -sys	3m44.812s

We pushed the one file per ETI instance change, right ? Thanks for keeping track of the times.

@jjwilke
Copy link
Contributor Author

jjwilke commented Jan 8, 2020

Yes, the most recent commit does one ETI instance per file for better parallelism.

@srajama1
Copy link
Contributor

srajama1 commented Jan 8, 2020

@jjwilke Thank you !

@srajama1
Copy link
Contributor

@jjwilke Is this ready to go in to develop ?

@ndellingwood
Copy link
Contributor

@srajama1 we need to convert testing to the cm_* scripts first, or else nearly all our nightlies will begin failing.

@jjwilke
Copy link
Contributor Author

jjwilke commented Jan 31, 2020

I believe this is ready. Obviously we'll want some extensive testing.

@ndellingwood
Copy link
Contributor

@jjwilke the nightlies are now testing with the cmake build system, so that blocker is removed from this PR.
@srajama1 @iyamazaki when this PR is ready for merge the Makefile build system will be removed, so a cmake configure to build the supernodal sptrsv will be needed to exercise the capability.

@jjwilke
Copy link
Contributor Author

jjwilke commented Feb 10, 2020

I would merge the TPL PR #546, then merge this. I this branch has been rebased and should be ready to merge. Someone will want to make sure all the recent SPILUK, SPTRSV changes are still working in this branch.

@jjwilke jjwilke force-pushed the new-eti-system branch 2 times, most recently from 7906fb0 to b0e1e79 Compare March 12, 2020 16:53
@jjwilke
Copy link
Contributor Author

jjwilke commented Mar 12, 2020

@ndellingwood spot-check and spot-check-tpls are all passing on kokkos-dev-2. I think this is ready to go.

Copy link
Contributor

@ndellingwood ndellingwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm based on passing tests and seeming to be safe with last round of integration testing, this is awesome @jjwilke ! There were a lot of files to scroll down, did I miss instructions for how devs should update the build system when adding new capabilities? That can be added later if not in this PR

@@ -0,0 +1,36 @@
#! /usr/bin/env python
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't write python code often, wanted to check if this is python 2 vs 3 agnostic?

@ndellingwood
Copy link
Contributor

@jjwilke arg, I dropped another merge conflict on you, sorry...

@ndellingwood
Copy link
Contributor

@jjwilke if you have time to clean up the merge conflict (sorry for another) I think we're good to merge this in?

@jjwilke
Copy link
Contributor Author

jjwilke commented Mar 13, 2020

Rebased. Ready to go?

@ndellingwood ndellingwood merged commit e5b09a0 into kokkos:develop Mar 17, 2020
@ndellingwood
Copy link
Contributor

Merging that was like our version of "the snap"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants