Jmm/one buffer #494

Merged: 6 commits merged into hackathon_integration on Apr 27, 2021
Conversation

@Yurlungur (Collaborator) commented Apr 21, 2021

PR Summary

Built on the discussions in the hackathon and on @pgrete's unification of communication buffers in #493. This combines the flux and coarse cell buffers into a single Kokkos view. A couple of things to note:

  • This reduces the number of allocations per variable per meshblock by 3.
  • This relies on LayoutRight in our Kokkos views (see the sketch below).
  • This wastes some memory: the coarse buffer is now allocated at the same size as the dense buffer.
  • This is built on "Use subviews for buffers" (#493) and should go in after it.

I haven't done any profiling yet.
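
For readers unfamiliar with the pattern, here is a minimal, stand-alone sketch of the approach described above: pack several same-shaped buffers into one LayoutRight allocation and hand out per-buffer subviews. All names and extents (pack_, n_buffers, N1..N3) are illustrative only, not the actual Parthenon code.

#include <Kokkos_Core.hpp>

int main(int argc, char **argv) {
  Kokkos::initialize(argc, argv);
  {
    const int n_buffers = 4;  // e.g. 3 flux buffers plus 1 coarse buffer
    const int N1 = 16, N2 = 16, N3 = 16;

    // One allocation instead of n_buffers separate ones.
    Kokkos::View<double ****, Kokkos::LayoutRight> pack_("pack", n_buffers, N1, N2, N3);

    // Fixing the leftmost index of a LayoutRight view yields a contiguous,
    // LayoutRight subview, so each slot behaves like its own array.
    auto flux_x = Kokkos::subview(pack_, 0, Kokkos::ALL(), Kokkos::ALL(), Kokkos::ALL());
    auto coarse = Kokkos::subview(pack_, 3, Kokkos::ALL(), Kokkos::ALL(), Kokkos::ALL());

    Kokkos::deep_copy(flux_x, 0.0);  // the subviews are used like ordinary views
    Kokkos::deep_copy(coarse, 1.0);
  }
  Kokkos::finalize();
  return 0;
}

This is also why the summary flags the memory overhead: every slot in the packed view, including the coarse one, gets the full dense extents.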

PR Checklist

  • Code passes cpplint
  • New features are documented.
  • Adds a test for any bugs fixed. Adds tests for new features.
  • Code is formatted
  • Changes are summarized in CHANGELOG.md
  • (@lanl.gov employees) Update copyright on changed files

@forrestglines (Collaborator) left a comment:

I'm curious to see the fix to the unit test, but once that's in we should merge.

Comment on lines +120 to +123
// This wastes about 1/2 a meshblock in memory
coarse_s = ParArrayND<T>(Kokkos::subview(comm_data_, offset++, Kokkos::ALL(),
                                         Kokkos::ALL(), Kokkos::ALL(), Kokkos::ALL(),
                                         Kokkos::ALL(), Kokkos::ALL()));
Collaborator:
Getting around the wasted memory would probably mean allocating the fluxes and this coarse data from one long array, right? However, I don't think Kokkos supports changing layouts like that with a subview.

@Yurlungur (Collaborator, Author):
Yeah, I tried that, actually. Kokkos complained because the types don't match.
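
For context, a hedged sketch of the alternative discussed here, not what this PR does: carving differently-shaped buffers out of one flat allocation with unmanaged views. Kokkos::subview cannot reshape a rank-1 allocation into higher-rank views with arbitrary extents, which is where the type errors come from. All names and extents (carve, Flat, Buf3D, nf*, nc*) are made up for illustration.

#include <cstddef>
#include <Kokkos_Core.hpp>

using Flat = Kokkos::View<double *, Kokkos::LayoutRight>;
using Buf3D = Kokkos::View<double ***, Kokkos::LayoutRight,
                           Kokkos::MemoryTraits<Kokkos::Unmanaged>>;

void carve(const Flat &flat) {
  // Hypothetical extents: a flux-sized buffer and a smaller coarse-sized buffer.
  const int nf1 = 16, nf2 = 16, nf3 = 17;
  const int nc1 = 8, nc2 = 8, nc3 = 8;
  std::size_t offset = 0;

  // Wrap raw pointer offsets into the flat allocation; nothing is copied or allocated.
  Buf3D flux(flat.data() + offset, nf1, nf2, nf3);
  offset += std::size_t(nf1) * nf2 * nf3;
  Buf3D coarse(flat.data() + offset, nc1, nc2, nc3);

  Kokkos::deep_copy(flux, 0.0);
  Kokkos::deep_copy(coarse, 0.0);
  // The caller must allocate `flat` large enough and track offsets by hand,
  // bookkeeping that the equal-sized-slot subview approach avoids.
}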

}
n_outer += (pmb->pmy_mesh->multilevel);
comm_data_ = ParArray7D<T>(base_name + ".comm_data", n_outer, GetDim(6), GetDim(5),
Collaborator:
Is comm_data_ only the collection of 4 arrays: the 3 fluxes and the coarse array? Does the benefit of combining these allocations outweigh the cost of the wasted memory for coarse_s? Could coarse_s remain its own array?

Collaborator:
This is less relevant for this PR, but is allocating space for all the fluxes necessary? Could the flux divergence be computed in each dimension one by one so that only one flux array is needed in memory? If possible, this would allow larger problem sizes but might be too big a code change.

@Yurlungur (Collaborator, Author):
> Is comm_data_ only the collection of 4 arrays: the 3 fluxes and the coarse array? Does the benefit of combining these allocations outweigh the cost of the wasted memory for coarse_s? Could coarse_s remain its own array?

That could be done. I'm not sure what the right trade-off is. I would say we accept the memory overhead for now, and we can split out the coarse buffer later if we desire.

> This is less relevant for this PR, but is allocating space for all the fluxes necessary? Could the flux divergence be computed in each dimension one by one so that only one flux array is needed in memory? If possible, this would allow larger problem sizes but might be too big a code change.

This is an interesting idea. I agree not for this PR, but something to think about.
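
To make the idea concrete, here is a rough sketch of computing the flux divergence one direction at a time so that only a single scratch flux array is ever resident. The interface (Field, add_flux_divergence, compute_flux, dx, scratch_flux) is hypothetical, not existing Parthenon API; scratch_flux is assumed to be at least as large as u in every direction.

#include <string>
#include <Kokkos_Core.hpp>

using Field = Kokkos::View<double ***, Kokkos::LayoutRight>;

// Accumulate -div(F) into dudt, reusing one scratch flux array for all three directions.
template <class FluxFunc>
void add_flux_divergence(const Field &u, const Field &dudt, const Field &scratch_flux,
                         const double dx[3], FluxFunc compute_flux) {
  const int n1 = u.extent_int(0), n2 = u.extent_int(1), n3 = u.extent_int(2);
  for (int d = 0; d < 3; ++d) {
    compute_flux(d, u, scratch_flux);  // user callback fills fluxes for direction d
    const double idx = 1.0 / dx[d];    // capture a scalar, not the host array dx
    const int ip = (d == 0), jp = (d == 1), kp = (d == 2);
    Kokkos::parallel_for(
        "flux_div_" + std::to_string(d),
        Kokkos::MDRangePolicy<Kokkos::Rank<3>>({1, 1, 1}, {n1 - 1, n2 - 1, n3 - 1}),
        KOKKOS_LAMBDA(const int i, const int j, const int k) {
          dudt(i, j, k) -=
              idx * (scratch_flux(i + ip, j + jp, k + kp) - scratch_flux(i, j, k));
        });
  }
  Kokkos::fence();
}

The trade is memory for serialization: each direction's flux must be fully computed before its divergence is accumulated, so the three directions can no longer be processed independently.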

@pgrete merged commit 0607afc into hackathon_integration on Apr 27, 2021