Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tpetra: Removed lambdas from CrsMatrix::getLocalDiagCopy #3

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Tpetra: Removed lambdas from CrsMatrix::getLocalDiagCopy #3

wants to merge 1 commit into from

Conversation

amklinv
Copy link

@amklinv amklinv commented Mar 9, 2016

I replaced the lambdas with functors for all three functions so
that the code could be run on device rather than host.

I replaced the lambdas with functors for all three functions so
that the code could be run on device rather than host.
@mhoemmen
Copy link
Collaborator

Yeah, that won't work with CUDA, alas. The functor stores things (e.g., the graph) by reference. References are pointers, and pointers point to host data (the "struct" CrsGraph which lives on the host, even though its data might live on device). Also, CrsGraph methods are not CUDA device methods (not marked with KOKKOS_INLINE_FUNCTION).

@mhoemmen
Copy link
Collaborator

I suspect this is more complicated that we had thought. Best thing might be to port Tpetra::CrsGraph::findLocalIndex to KokkosSparse::CrsMatrix (or even better, Kokkos::StaticCrsGraph, where it really belongs). The advantage of this is that we could start to address trilinos#118 by changing Tpetra to dispatch (like Epetra) between data structures at the top level, rather than having general functions for searching and inserting.

@mhoemmen
Copy link
Collaborator

Please refer to trilinos#205.

@mhoemmen
Copy link
Collaborator

Oh wait, you're working on one-argument getLocalDiagCopy. Ifpack2::Relaxation relies on two-argument getLocalDiagCopy for best performance. It should only call one-argument getLocalDiagCopy in a release build, if the matrix is neither CrsMatrix nor BlockCrsMatrix (grep for "slow path" comment).

If the one-argument version is being called, that's a bug and we should fix it. I'm pretty sure it's not, though, because this optimization has been around for a few years and apps noticed it.

Anyway, two-argument getLocalDiagCopy that takes offsets as a Kokkos::View is a lot easier. You'll need two code paths, depending on whether the matrix is fillComplete. If it's not, just use the existing code path, with its host execution space parallelization. If it is fillComplete, write a device functor that gets the KokkosSparse::CrsMatrix out and works on that. Remember that offsets are relative to each row, not absolute over the whole matrix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants