Stokhos: Test Stokhos_KokkosViewUQPCEUnitTest_Serial_MPI_1 randomly failing in 'ats2' CUDA PR build on 'vortex' #11117
Comments
FYI: This failure took out my last PR build iteration #11099 (comment) (see #11099 (comment)).
So far I have not been able to reproduce this, either on the ATS2 platform or on a regular Linux platform (note the failing test is running with the Serial execution space, so whatever is going on isn't related to CUDA). I've also tried running the test under valgrind and with the clang address sanitizer. Both came up empty.
@etphipp, it was reported at the TUG today that Sacado might have some undefined memory issues. Does this use DFAD or the reverse AD types?
Yes. It is issue #7741. I never saw it because the team mention was invalid (which is probably a frighteningly common mistake due to the extra characters in the suggested team mention in the Trilinos issue template). I'm working on it now and believe I might have it fixed. It is due to the horribly designed memory management in RAD.
I may be mistaken, but I believe that users who are not in the Trilinos GitHub group cannot tag individual Trilinos teams. This is why with a lot of recent issues you will see @cgcgcg working hard to tag the correct Trilinos teams as soon as they're opened.
That is correct. That is a long-known flaw in the Trilinos Issue tracking processes.
Looking at the above query, the last failure was 10/5 and I was never able to reproduce it. So I am going to close this for now. If it fails again, please reopen it.
CC: @trilinos/stokhos
Next Action Status
Description
As shown in this query (click "Show Matching Output" in upper right) the test:
Stokhos_KokkosViewUQPCEUnitTest_Serial_MPI_1
in the unique GenConfig builds:
ats2_cuda-10.1.243-gnu-8.3.1-spmpi-rolling_release_static_Volta70_Power9_no-asan_no-complex_no-fpic_mpi_pt_no-rdc_uvm_deprecated-on_no-package-enables
started failing on testing day 2022-05-01.
The specific set of CDash builds impacted were:
PR-10472-test-ats2_cuda-10.1.243-gnu-8.3.1-spmpi-rolling_release_static_Volta70_Power9_no-asan_no-complex_no-fpic_mpi_pt_no-rdc_uvm_deprecated-on_no-package-enables-911
PR-10472-test-ats2_cuda-10.1.243-gnu-8.3.1-spmpi-rolling_release_static_Volta70_Power9_no-asan_no-complex_no-fpic_mpi_pt_no-rdc_uvm_deprecated-on_no-package-enables-915
PR-10571-test-ats2_cuda-10.1.243-gnu-8.3.1-spmpi-rolling_release_static_Volta70_Power9_no-asan_no-complex_no-fpic_mpi_pt_no-rdc_uvm_deprecated-on_no-package-enables-1113
PR-11086-test-ats2_cuda-10.1.243-gnu-8.3.1-spmpi-rolling_release_static_Volta70_Power9_no-asan_no-complex_no-fpic_mpi_pt_no-rdc_uvm_deprecated-on_no-package-enables-1182
PR-11099-test-ats2_cuda-10.1.243-gnu-8.3.1-spmpi-rolling_release_static_Volta70_Power9_no-asan_no-complex_no-fpic_mpi_pt_no-rdc_uvm_deprecated-on_no-package-enables-1211
When the test fails, it produces error output like that shown here, showing:
Current Status on CDash
Run the above query, adjusting the "Begin" and "End" dates to match today or any other date range, or just click "CURRENT" in the top bar to see results for the current testing day.
Steps to Reproduce
Follow instructions at:
or see:
for specific instructions on how to build and run on 'vortex'.
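Because the failure is random and has been hard to reproduce, one option is to rerun the single test many times and stop at the first non-zero exit code. The following is only a minimal sketch: the helper name `run_until_failure` is hypothetical, and the example command in the comment assumes `ctest` is on the PATH and that the working directory is an already configured and built Trilinos build tree.

```python
import subprocess
import sys


def run_until_failure(cmd, max_runs=100):
    """Run `cmd` up to `max_runs` times.

    Returns (run_number, CompletedProcess) for the first run that exits
    with a non-zero code, or None if every run passed.
    """
    for run_number in range(1, max_runs + 1):
        # Capture output so the log of the failing run can be inspected.
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            return run_number, result
    return None


# Example command (an assumption, not taken from the issue):
#   cmd = ["ctest", "-R", "Stokhos_KokkosViewUQPCEUnitTest_Serial_MPI_1",
#          "--output-on-failure"]
#   failure = run_until_failure(cmd, max_runs=200)
#   if failure is not None:
#       run_number, result = failure
#       print(f"failed on run {run_number}", file=sys.stderr)
#       print(result.stdout)
```

A run that never fails within `max_runs` does not show the problem is gone; it only bounds how often the failure occurs under those conditions.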