cuda uvm test failures without launch blocking - expected behavior? #636

Closed
ndellingwood opened this issue Mar 4, 2020 · 4 comments

@ndellingwood
Contributor

ndellingwood commented Mar 4, 2020

The following tests fail in CUDA builds with UVM enabled unless these environment variables are set:

    export CUDA_LAUNCH_BLOCKING=1
    export CUDA_MANAGED_FORCE_DEVICE_ALLOC=1

Failing tests

2: [ RUN      ] cuda.sparse_block_gauss_seidel_rank2_double_int_int_TestExecSpace
2/8 Test #2: sparse_cuda ......................***Exception: Bus error 12.55 sec
4: [ RUN      ] cuda.common_serial_radix
4/8 Test #4: common_cuda ......................***Exception: Bus error  7.55 sec

I observed this on white while adding a cuda+UVM build to the spot-check tests.

Reproducer instructions for white/ride (rhel7F kepler queue):

kokkos SHA: c699b69827d25d43d3c6f43b7f1990c980c88b46
kokkos-kernels SHA: 41fe4f2

module load cmake/3.12.3 cuda/9.2.88 gcc/7.2.0 ibm/xl/16.1.0

<KOKKOSKERNELS_PATH>/cm_generate_makefile.bash --with-devices=Cuda,Serial --arch=Power8,Kepler37 --compiler=/ascldap/users/ndellin/kokkos/bin/nvcc_wrapper --cxxflags="-O3 -Wall -Wshadow -pedantic -Werror -Wsign-compare -Wtype-limits -Wuninitialized " --cxxstandard="11" --ldflags="" --with-cuda=/home/projects/ppc64le-pwr8-nvidia/cuda/9.2.88 --kokkos-path=<KOKKOS_PATH> --kokkoskernels-path=<KOKKOSKERNELS_PATH> --with-scalars='' --with-ordinals= --with-offsets= --with-layouts= --with-tpls= --no-examples --with-options=enable_large_mem_tests

@brian-kelley I marked this as "question" in case these tests require that CUDA_LAUNCH_BLOCKING be enabled. If these tests shouldn't require launch blocking, then the failures indicate a missing fence somewhere and we can relabel this as a bug.

The existing cuda+uvm nightly build sets CUDA_LAUNCH_BLOCKING, which is why this went undetected.

Edit: Added SHAs tested

@brian-kelley
Contributor

brian-kelley commented Mar 4, 2020

@ndellingwood No, it's definitely a bug (I never intended CUDA_LAUNCH_BLOCKING to be a requirement). On the bright side, I'm almost positive it's in the block Gauss-Seidel test itself, and not the library, because:

  • only one place I know of uses mirror views in the library, and that's the color set table "h_color_xadj"
  • it gets deep copied to host, and then there's a fence
  • the device side of the mirror view never gets touched again

Sorting also shouldn't require CUDA_LAUNCH_BLOCKING and doesn't use mirror views, but it could be missing fences in the user-facing functions. The problem also might be missing fences in the test, where it creates random inputs and verifies the output on host.
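To make the "missing fence in the test" scenario concrete, here is a minimal host-only sketch in standard C++. It does not use Kokkos or CUDA: `ToyQueue`, `launch`, `fence`, and `fill_then_sum` are hypothetical names invented for this illustration, and the queue only models the fact that a launch returns before the work has run. In this toy model a missing fence shows up as stale values; in the real CUDA+UVM runs reported above the symptom is a bus error instead.

```cpp
#include <cstddef>
#include <functional>
#include <numeric>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical toy model of an asynchronous device queue (not a Kokkos API).
// launch() only records the work; nothing runs until fence() drains the queue.
class ToyQueue {
public:
    void launch(std::function<void()> kernel) { pending_.push(std::move(kernel)); }

    void fence() {
        while (!pending_.empty()) {
            pending_.front()();  // run the oldest pending "kernel"
            pending_.pop();
        }
    }

private:
    std::queue<std::function<void()>> pending_;
};

// Mimics a test that generates data with a "kernel" and verifies it on host.
// With use_fence == false, the host reads the buffer before the kernel ran.
long long fill_then_sum(std::size_t n, bool use_fence) {
    ToyQueue queue;
    std::vector<int> data(n, -1);  // -1 marks "not yet written"

    queue.launch([&data] { std::iota(data.begin(), data.end(), 0); });

    if (use_fence) {
        queue.fence();  // wait for the fill to finish before touching data on host
    }
    long long sum = std::accumulate(data.begin(), data.end(), 0LL);

    queue.fence();  // drain any remaining work before data goes out of scope
    return sum;
}
```

Calling `fill_then_sum(4, true)` sums the filled values 0 through 3, while `fill_then_sum(4, false)` sums the four untouched `-1` placeholders, which is the toy equivalent of verifying output on host without a fence after the launch.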

I'll try to replicate and fix these, and include them with my fix for #634.

@brian-kelley
Contributor

Actually, can I just merge the #634 fix now? One test failed on kokkos-dev (blas_cuda) due to misaligned complex in dot. That failure is obviously unrelated to a change in distance-2 coloring.

@ndellingwood
Contributor Author

> Actually, can I just merge the #634 fix now

Which PR is it? I can approve it.

@brian-kelley
Contributor

@ndellingwood #638
