-
Notifications
You must be signed in to change notification settings - Fork 99
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Promotion 2.7.24 broke MV unit tests in Tpetra with complex types #360
Comments
This can be reproduced on White (Kepler nodes) in Trilinos using the following:
Load modules
Configure
|
I suggest making the changes in Trilinos' kokkos-kernels and testing there first; trying to link to VOTD kokkos and kokkos-kernels will break unless using the kokkos-promotion branch of Trilinos. |
@kyungjoo-kim do you have time to look at this? It seems related to the difference between PR #314 you worked on and the trilinos/Trilinos#3538 PR we clobbered with the promotion. |
@kyungjoo-kim it could also be related to some of the |
@lucbv If you don't mind, could you check if this test still fails after randomizing only real part ? You can do so by changing the
In this way, the complex values have zero imagineries. If the double test passes, the complex test with zero imagineris also should passes. In my case, it fails. I found this problem in KokkosKernels and I am wondering if this is also the case. If the test still fails while double test passes, then we need to fix Kokkos complex. |
@kddevin @trilinos/tpetra @trilinos/framework @jwillenbring @rrdrake @theguruat12 My work-around for this got clobbered. BLAS functions that return complex have been known to be broken for a long time, due to ABI issues. See e.g., |
@mhoemmen I'll get a build going on White and also test with @kyungjoo-kim 's suggestion in the comment above - if we test |
Thanks @ndellingwood ! |
Build in progress, may take a little bit... |
Extra failure info from Tpetra MV unit test for reference- the issue seems to occur in kokkos-kernels so it is pasted here rather than the Trilinos issue. Same setup as reproducer steps above on White. Allocate node: Run MV tests: Failures:
Specific output grouped by test above:
|
The problem occurs with the Serial tests (not the CudaUVM tests). In the case when Cuda is the execution space, the kokkos-kernels specialization is called and everything works fine. In the case when Serial is the execution space, the BLAS tpl specialization is hit, ultimately making a call to
I checked the values of the @mhoemmen 's previous PR trilinos/Trilinos#3538 makes sure we use the kokkos-kernels implementation (rather than the BLAS tpl specialization) and would serve as a solution. @kyungjoo-kim @mhoemmen @crtrott @srajama1 any suggestions where to go from here? The fastest path is to reimplement Mark's fix clobbered by the promotion, which would disable TPL options when complex types are used. If that isn't the desired solution I'll have to sit down with someone to get a clear understanding how to handle this. |
Also pinging @lucbv so he is in the loop, he has been waiting on a fix for this. |
@ndellingwood Just to be clear, it sounds like If that's correct, then my guess is that this is caused by a different Fortran ABI for returning a complex number, than what kokkos-kernels is expecting. kokkos-kernels expects Trilinos has seen issues like this before. Teuchos' CMake logic includes tests for this case:
and Teuchos actually uses this to decide which ABI variant to use. To fix this, we would just copy those tests over to kokkos-kernels (since kokkos-kernels is not supposed to depend on Teuchos). We can use the latter |
@theguruat12 @rrdrake If I'm right, this should be pretty straightforward to fix. We can apply the fix to the kokkos-kernels and Trilinos repos simultaneously, so that we can go ahead with the Trilinos promotion. |
This issue opened to address @lucbv's issue trilinos/Trilinos#3823.
Previously, @mhoemmen committed PR trilinos/Trilinos#3538 to resolve this issue in Tpetra.
The recent promotion clobbered that commit and replaced the fix with those in PR #314. However, problems with complex types in Tpetra's MV are occurring again, as reported by @lucbv.
The text was updated successfully, but these errors were encountered: