KokkosCore_UnitTest_CudaInterOpStreams_MPI_1 failing in ATDM Trilinos builds starting before 2020-07-08 #8544
Comments
Test results for issue #8544 as of 2021-01-17
Tests with issue trackers Passed: twip=6
Tests with issue trackers Failed: twif=4
This is an automated comment generated by Grover. Each week, Grover collates and reports data from CDash in an automated way to make it easier for developers to stay on top of their issues. Grover saw that there are tests being tracked on CDash that are associated with this open issue. If you have a question, please reach out to Ross. I'm just a cat.
This test, as well as the one mentioned in #8543, tests interoperability with raw CUDA. In particular, they cover situations where CUDA is already in use before Kokkos initialize and/or after Kokkos finalize. As such, switching the GPU ID during Kokkos initialize will lead to the observed errors. One should NOT use any mechanism to tell Kokkos to choose a specific GPU. CUDA_VISIBLE_DEVICES probably works. In practice, telling Kokkos to use device ID 0 will also work (I am just not sure that CUDA guarantees that this is the default GPU).
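A minimal shell sketch of the CUDA_VISIBLE_DEVICES suggestion above, assuming the goal is to restrict one test process to a single GPU without asking Kokkos to select a device. The helper name run_with_gpu is hypothetical, and the `sh -c 'echo ...'` command is only a stand-in for the real test executable, whose path is not given in this thread.

```shell
#!/bin/sh
# Hypothetical helper: run one command with only the given GPU visible.
# `env` sets CUDA_VISIBLE_DEVICES for the child process only, so the
# calling shell's environment is left unchanged.
run_with_gpu() {
    gpu_id="$1"
    shift
    env CUDA_VISIBLE_DEVICES="$gpu_id" "$@"
}

# Stand-in for the real test executable: print what the child process sees.
run_with_gpu 0 sh -c 'echo "visible devices: $CUDA_VISIBLE_DEVICES"'
# prints "visible devices: 0"
```

With only one device listed, the CUDA runtime enumerates it as device 0 inside that process, so raw CUDA calls made before Kokkos initialize and Kokkos itself should end up on the same GPU.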
@crtrott, that was not the approach/agreement we came to as part of:
Perhaps Kokkos needs to be updated to read in these CTest env vars earlier? Changing to use
By "one should NOT use that mechanism" I mean specifically for those two tests. As I said, I would recommend either disabling these two tests or marking them as not runnable in parallel with other tests (is that a thing you can do?).
Yes and yes. For the former: and for the latter: As shown here, this test finished in less than 3s so I think we just need to add:
for each of these tests to: right about here:
Need feedback from CDash before closing.
Test results for issue #8544 as of 2021-01-24
Tests with issue trackers Passed: twip=4
Test results for issue #8544 as of 2021-01-31
Tests with issue trackers Passed: twip=10
Closing, as this has been passing since 2021-01-23, as shown in this query.
CC: @trilinos/kokkos, @crtrott (Trilinos Data Services Product Lead), @bartlettroscoe
Next Action Status
Description
As shown in this query (click "Shown Matching Output" in upper right) the tests:
KokkosCore_UnitTest_CudaInterOpStreams_MPI_1
in the builds:
Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-rolling_complex_static_opt
Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-rolling_complex_static_opt_cuda-aware-mpi
Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-rolling_static_dbg
Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-rolling_static_dbg_cuda-aware-mpi
Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-rolling_static_opt
Trilinos-atdm-ats2-cuda-10.1.243-gnu-7.3.1-spmpi-rolling_static_opt_cuda-aware-mpi
Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-debug
Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-rdc-release-debug
Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-release
Trilinos-atdm-white-ride-cuda-9.2-gnu-7.2.0-release-debug
started failing on testing day 2020-07-08.
All of the tests in debug builds show output like that shown here:
Current Status on CDash
Run the above query, adjusting the "Begin" and "End" dates to match today or any other date range, or just click "CURRENT" in the top bar to see results for the current testing day.
Steps to Reproduce
One should be able to reproduce this failure as described in:
and the system-specific instructions at:
Just log into any of the associated machines, copy and paste the full CDash build name <build-name> listed above, and run commands like:
where <package-name> is any package that you want to enable to reproduce build and/or test results. Again, for exact system-specific details on what commands to run to build and run tests, see:
If you can't figure out what commands to run to reproduce the problem given this documentation, then please post a comment here and we will give you the exact minimal commands.