Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ifpack2 sptrscv option caused diffs in Nalu and Nalu-Wind #7847

Closed
spdomin opened this issue Aug 16, 2020 · 17 comments
Closed

Ifpack2 sptrscv option caused diffs in Nalu and Nalu-Wind #7847

spdomin opened this issue Aug 16, 2020 · 17 comments
Assignees
Labels
pkg: Ifpack2 type: bug The primary issue is a bug in Trilinos code or tests

Comments

@spdomin
Copy link
Contributor

spdomin commented Aug 16, 2020

Bug Report

@trilinos/Ifpack2

Description

New diff in Nalu and Nalu-Wind:

Good:
NaluCFD/Nalu SHA1: 43989dd9b6b2ad979d83fc8b725a6840ae02c2bf
Trilinos/develop SHA1: a3a548a

Bad:
NaluCFD/Nalu SHA1: 43989dd9b6b2ad979d83fc8b725a6840ae02c2bf
Trilinos/develop SHA1: 855be51

Sadly, my cronjob failed and I missed a day. Moreover, @crtrott, I had to skip a bunch of commits as Trilinos seemed to be in a bad build state (see below).

In this case, the first test runs well while the second (restarted) case fails. The only difference between the first run and the restarted run is that the diffing restart cases activates:
preconditioner: riluk

This is supported by the bisect commit:

95261df92e19bfdfa366744fec4fdf0e29c03711 is the first bad commit
commit 95261df92e19bfdfa366744fec4fdf0e29c03711
Author: Nathan Ellingwood <ndellin@sandia.gov>
Date:   Wed Jul 15 12:42:52 2020 -0600

    Ifpack2: Add kokkos-kernels sptrsv option as local tri solver

:040000 040000 4b41e88465d5531c0f1f2b9dacaacd198a48232a 46b080e32afcbd41781d6f66bc98895d9e6a9027 M	packages

@ndellingwood, it looks like this commit caused some diffs in our Nalu-based application codes. If you think that your commit above was order-of-ops or a change in default, let us know how to replicate the old behavior. If you know that this is a great new default option, I can accept the diffs.


In file included from /home/spdomin/gitHubWork/nightlyBuildAndTest/Trilinos/build_bisect/packages/ifpack2/src/Ifpack2_RILUK_Serial.cpp:50:
/home/spdomin/gitHubWork/nightlyBuildAndTest/Trilinos/packages/ifpack2/src/Ifpack2_RILUK_def.hpp: In member function ‘void Ifpack2::RILUK<MatrixType>::comput
e()’:
/home/spdomin/gitHubWork/nightlyBuildAndTest/Trilinos/packages/ifpack2/src/Ifpack2_RILUK_def.hpp:912:16: error: ‘sort’ is not a member of ‘Kokkos’
        Kokkos::sort(subview(L_entries, Kokkos::make_pair(row_start, row_end)));
                ^~~~
/home/spdomin/gitHubWork/nightlyBuildAndTest/Trilinos/packages/ifpack2/src/Ifpack2_RILUK_def.hpp:912:16: note: suggested alternative: ‘sqrt’
        Kokkos::sort(subview(L_entries, Kokkos::make_pair(row_start, row_end)));
                ^~~~
                sqrt
/home/spdomin/gitHubWork/nightlyBuildAndTest/Trilinos/packages/ifpack2/src/Ifpack2_RILUK_def.hpp:917:16: error: ‘sort’ is not a member of ‘Kokkos’
        Kokkos::sort(subview(U_entries, Kokkos::make_pair(row_start, row_end)));
                ^~~~
/home/spdomin/gitHubWork/nightlyBuildAndTest/Trilinos/packages/ifpack2/src/Ifpack2_RILUK_def.hpp:917:16: note: suggested alternative: ‘sqrt’
        Kokkos::sort(subview(U_entries, Kokkos::make_pair(row_start, row_end)));
                ^~~~

@jhux2 and @alanw0, look like nalu-wind also suffered a new diff as well:

https://my.cdash.org/test/7152004

Steps to Reproduce

  1. Build Nalu or Nalu-Wind and run dgNonConformalThreeBlade
@spdomin spdomin added the type: bug The primary issue is a bug in Trilinos code or tests label Aug 16, 2020
@ndellingwood
Copy link
Contributor

Hi @spdomin ,

@vqd8a started looking into this, it seems like there are a couple separate issues here:

  1. The diffs - @vqd8a this can be addressed hopefully pretty easily by resetting the default iluk implementation to Serial (in packages/ifpack2/src/Ifpack2_RILUK_def.hpp default isKokkosKernelsSpiluk_ to false at lines 88, 111)

  2. The compilation error - @spdomin Vinh looked into this issue, he believes that for some reason in that build the Ifpack2_RILUK_def.hpp must be obsolete - did that error show up during bisecting? If so, this is likely due to the good/bad commit hopping resulting in a checkout that had the issue, but was fixed by a later commit within the PR.

@spdomin
Copy link
Contributor Author

spdomin commented Aug 16, 2020

Thanks for the reply. Yes, I noticed the build errors were resolved in a later commit. They were simply causing issues for the bisect as I had to skip a few.

Can we expose the isKokkosKernelsSpiluk option to the input File? Also, if this new default is preferred, then a simple re-bless might be in order?

@vqd8a
Copy link
Contributor

vqd8a commented Aug 17, 2020

Hi @spdomin,

I have put a PR #7848 for this issue. Now Kokkos Kernels SPILUK can be set as an input parameter. Please test it again when the PR gets merged. Thanks.

@vbrunini
Copy link
Contributor

Can we expose the isKokkosKernelsSpiluk option to the input File? Also, if this new default is preferred, then a simple re-bless might be in order?

Yeah, if the kokkos-kernels implementation is the version that is expected to be faster and performance portable I would strongly prefer it be the default. Having to find the magic set of parameter list options to get the fast performance portable implementation of the preconditioner is a terrible user experience.

@jhux2
Copy link
Member

jhux2 commented Aug 17, 2020

@trilinos/ifpack2

@spdomin
Copy link
Contributor Author

spdomin commented Aug 17, 2020

I agree with @vbrunini - we simply need to understand which is the preferred version for our production runs to avoid users dialing in different versions of RILUK.

However, now that the flag is exposed, the application can simply create the correct one (subject to the advise from the T-team). Does the below fix look correct?

  else if (precond_ == "riluk" ) {
    preconditionerType_ = "RILUK";
  } 
  else if ( precond_ == "kspiluk" ) {
   preconditionerType_ = "KSPILUK";
  }

@ndellingwood
Copy link
Contributor

@vbrunini @spdomin this will be the first exposure of these codes outside of localized testing for application use, it may be safer to keep the existing default (Serial) for stability in production runs while leaving KSPILUK and KSPTRSV as parameter options until tested in application settings, is this a reasonable approach?

@vqd8a
Copy link
Contributor

vqd8a commented Aug 17, 2020

@spdomin @vbrunini To choose the KSPILUK as the ILUK implementation and KSPTRSV as the local trisolver, in the application code, a user can set:

params.set("fact: type", "KSPILUK");
params.set("trisolver: type", "KSPTRSV");

Do these work for you?

@spdomin
Copy link
Contributor Author

spdomin commented Aug 17, 2020

Sounds good. Let’s proceed with keeping the previous default, which the PR again allows, and then, after production testing, we can transition. Once the PR is through and we have a clean nightly, I can add the following to Nalu:

  else if (precond_ == "riluk" ) {
    preconditionerType_ = "RILUK";
  }
  else if (precond_ == "kspiluk" ) {
    preconditionerType_ = "RILUK";
    paramsPrecond_->set("fact: type","KSPILUK");
    paramsPrecond_->set("trisolver: type","KSPTRSC");
  }  

@spdomin
Copy link
Contributor Author

spdomin commented Aug 18, 2020

Looks like the PR failed due to time outs?

The nalu test is still diffing.

@ndellingwood
Copy link
Contributor

Looks like the PR failed due to time outs?

@spdomin I think this has been a problem for several PRs, it looks like #7850 just merged and should address the timeouts so one more run of the PR tester should hopefully get this in.

@ndellingwood ndellingwood added the AT: RETEST Causes the PR autotester to run a new round of PR tests on the next iteration label Aug 18, 2020
@ndellingwood
Copy link
Contributor

ndellingwood commented Aug 18, 2020

Added the RETEST label

Edit: Pasted in wrong location

@ndellingwood ndellingwood removed the AT: RETEST Causes the PR autotester to run a new round of PR tests on the next iteration label Aug 18, 2020
@spdomin
Copy link
Contributor Author

spdomin commented Aug 18, 2020

Post at this page when the PR is successful and I can fire off a test.

@ndellingwood
Copy link
Contributor

Post at this page when the PR is successful and I can fire off a test.

Will do, thanks!

@vqd8a
Copy link
Contributor

vqd8a commented Aug 18, 2020

@spdomin, #7848 was merged. Please test it and let us know. Thanks.

@spdomin
Copy link
Contributor Author

spdomin commented Aug 19, 2020

I see that it was merged and the test suite looks clean now. I can add the code later this week. Feel free to close - I will reference this ticket when I proceed with my Nalu push. Then, you can use this app as part of your testing?

@vqd8a
Copy link
Contributor

vqd8a commented Aug 19, 2020

@spdomin I think we can do it.

@spdomin spdomin closed this as completed Aug 19, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pkg: Ifpack2 type: bug The primary issue is a bug in Trilinos code or tests
Projects
None yet
Development

No branches or pull requests

5 participants