Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tpetra: improve performance of TAFC (Transfer and fillComplete) #11689

Closed
6 tasks done
jhux2 opened this issue Mar 17, 2023 · 6 comments
Closed
6 tasks done

Tpetra: improve performance of TAFC (Transfer and fillComplete) #11689

jhux2 opened this issue Mar 17, 2023 · 6 comments
Assignees
Labels
pkg: Tpetra type: enhancement Issue is an enhancement, not a bug

Comments

@jhux2
Copy link
Member

jhux2 commented Mar 17, 2023

Enhancement

@trilinos/tpetra

TAFC currently is a host-based operation. This makes it potentially very expensive, for example, in a matrix product such as $R*(AP)$ in multigrid setup. Within Tpetra::CrsMatrix::transferAndFillComplete(), I've measured the tall poles to be

@jhux2
Copy link
Member Author

jhux2 commented Mar 17, 2023

Sample timer data:

|   |   |   |   |   |   |   TpetraExt MueLu::R*(AP)-implicit-1: MMM-T exportAndFillComplete: 0.451469 - 85.9368% [1] {min=0.450672, max=0.452267, std dev=0.00112805} <1, 0, 0, 0, 0, 0, 0, 0, 0, 1>
|   |   |   |   |   |   |   |   Tpetra MueLu::R*(AP)-implicit-1: TAFC All:MMLegacy: 0.451126 - 99.9239% [1] {min=0.450328, max=0.451924, std dev=0.00112886} <1, 0, 0, 0, 0, 0, 0, 0, 0, 1>
|   |   |   |   |   |   |   |   |   Tpetra MueLu::R*(AP)-implicit-1: TAFC getOwningPIDs same map:               0.00026664 -  0.0591054% [1] {min=0.000259792, max=0.000273488, std dev=9.68453e-06} <1, 0, 0, 0, 0, 0, 0, 0, 0, 1>
|   |   |   |   |   |   |   |   |   Tpetra MueLu::R*(AP)-implicit-1: TAFC reallocate buffers:                   7.8373e-05 -  0.0173727% [1] {min=7.1575e-05, max=8.5171e-05, std dev=9.61382e-06} <1, 0, 0, 0, 0, 0, 0, 0, 0, 1>
|   |   |   |   |   |   |   |   |   Tpetra MueLu::R*(AP)-implicit-1: TAFC pack and prepare:                     0.00181816 -  0.403027% [1] {min=0.00172944, max=0.00190688, std dev=0.000125466} <1, 0, 0, 0, 0, 0, 0, 0, 0, 1>
|   |   |   |   |   |   |   |   |   Tpetra MueLu::R*(AP)-implicit-1: TAFC getOwningPIDs exchange remote data:   0.00127957 -  0.28364% [1] {min=0.00117301, max=0.00138613, std dev=0.000150701} <1, 0, 0, 0, 0, 0, 0, 0, 0, 1>
|   |   |   |   |   |   |   |   |   Tpetra MueLu::R*(AP)-implicit-1: TAFC unpack-count-resize:                  0.0733156  - 16.2517% [1] {min=0.0728005, max=0.0738308, std dev=0.000728582} <1, 0, 0, 0, 0, 0, 0, 0, 0, 1>
|   |   |   |   |   |   |   |   |   Tpetra MueLu::R*(AP)-implicit-1: TAFC copy same-perm-remote data:           0.109153   - 24.1956% [1] {min=0.109, max=0.109305, std dev=0.000215957} <1, 0, 0, 0, 0, 0, 0, 0, 0, 1>
|   |   |   |   |   |   |   |   |   Tpetra MueLu::R*(AP)-implicit-1: TAFC makeColMap:                           0.079964   - 17.7254% [1] {min=0.079285, max=0.080643, std dev=0.000960279} <1, 0, 0, 0, 0, 0, 0, 0, 0, 1>
|   |   |   |   |   |   |   |   |   Tpetra MueLu::R*(AP)-implicit-1: TAFC restrict colmap:                      2.2995e-06 -  0.000509724% [1] {min=2.074e-06, max=2.525e-06, std dev=3.18905e-07} <1, 0, 0, 0, 0, 0, 0, 0, 0, 1>
|   |   |   |   |   |   |   |   |   Tpetra MueLu::R*(AP)-implicit-1: TAFC sortAndMergeCrsEntries:               0.148774   - 32.9783% [1] {min=0.147311, max=0.150236, std dev=0.00206834} <1, 0, 0, 0, 0, 0, 0, 0, 0, 1>
|   |   |   |   |   |   |   |   |   Tpetra MueLu::R*(AP)-implicit-1: TAFC setAllValues:                         0.0200524  -  4.44497% [1] {min=0.0198734, max=0.0202314, std dev=0.000253164} <1, 0, 0, 0, 0, 0, 0, 0, 0, 1>
|   |   |   |   |   |   |   |   |   Tpetra MueLu::R*(AP)-implicit-1: TAFC build importer and esfc:              0.00323561 -  0.717229% [1] {min=0.00159667, max=0.00487454, std dev=0.0023178} <1, 0, 0, 0, 0, 0, 0, 0, 0, 1>

@jhux2
Copy link
Member Author

jhux2 commented Mar 17, 2023

@lucbv

@cgcgcg
Copy link
Contributor

cgcgcg commented Mar 17, 2023

Why is it running the Legacy code path? There is an MMOpt, if I remember correctly.

@jhux2
Copy link
Member Author

jhux2 commented Mar 17, 2023

My understanding is that MMOpt comes into play after the various phases above.

See

bool isMM = false; // optimize for matrix-matrix ops.

if(isMM) os<<":MMOpt";
else os<<":MMLegacy";

@cgcgcg
Copy link
Contributor

cgcgcg commented Mar 17, 2023

Looking at the code, that seems to be correct.

@jhux2 jhux2 self-assigned this Mar 17, 2023
lucbv added a commit to lucbv/Trilinos that referenced this issue Sep 14, 2023
After the refactor done in PR trilinos#11751, these new
changes remove the copy to host after sorting the
CRS matrix and instead just call the setAllValues
overload that accepts Kokkos Views.
Should improve performance a bit and this provides
a step toward completing issue trilinos#11689.
cwpearson pushed a commit to cwpearson/Trilinos that referenced this issue Sep 19, 2023
After the refactor done in PR trilinos#11751, these new
changes remove the copy to host after sorting the
CRS matrix and instead just call the setAllValues
overload that accepts Kokkos Views.
Should improve performance a bit and this provides
a step toward completing issue trilinos#11689.
@jhux2
Copy link
Member Author

jhux2 commented Oct 20, 2023

closing as complete

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pkg: Tpetra type: enhancement Issue is an enhancement, not a bug
Projects
Status: Done
Development

No branches or pull requests

2 participants