-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Restart bugfix #311
Restart bugfix #311
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fix. I don't love the function name GetNbs
as nb*
is a cryptic name. But since that's what they're called internally, I think it's fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still getting errors when running the tests, which tests is this supposed to fix?
27/36 Test #27: regression_test:advection_performance .............................***Failed 336.32 sec
Start 28: regression_mpi_test:advection_performance
28/36 Test #28: regression_mpi_test:advection_performance .........................***Failed 412.27 sec
Start 29: regression_test:restart
29/36 Test #29: regression_test:restart ........................................... Passed 69.50 sec
Start 30: regression_mpi_test:restart
30/36 Test #30: regression_mpi_test:restart .......................................***Failed 1.38 sec
src/mesh/mesh_refinement.hpp
Outdated
return std::vector<int>( | ||
{refine_flag_, neighbor_rflag_, deref_count_, deref_threshold_}); | ||
} | ||
void SetInternals(std::vector<int> a) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a should be const reference.
src/outputs/restart.hpp
Outdated
@@ -116,6 +116,7 @@ class RestartReader { | |||
std::vector<T> ReadDataset(const char *name, size_t *count = nullptr) { | |||
// Returns entire 1D array. | |||
// status, never checked. We should... | |||
std::vector<T> data; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! Oh nevermind, I guess this didn't actually really change anything...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this ifdef
behavior actually what we want? (and/or is this a leftover from Athena++?)
I'm asking as having the code fail seems like a preferable solution rather than returning an empty object (due to lack of implementation or similar).
It looks like the test will run but something is still throwing an error:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other than my optional improvements this is good. The error I was running into is located in the python regression testing framework so this has my green light.
@nmsriram is this ready for merge? |
src/mesh/mesh_refinement.hpp
Outdated
@@ -76,6 +76,16 @@ class MeshRefinement { | |||
// and/or in BoundaryValues::ProlongateBoundaries() (for SMR and AMR) | |||
int AddToRefinement(ParArrayND<Real> pvar_cc, ParArrayND<Real> pcoarse_cc); | |||
int AddToRefinement(FaceField *pvar_fc, FaceField *pcoarse_fc); | |||
std::vector<int> GetInternals() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that int32_t
is used above to read, I suggest to also be more specific here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do.
src/outputs/restart.hpp
Outdated
@@ -116,6 +116,7 @@ class RestartReader { | |||
std::vector<T> ReadDataset(const char *name, size_t *count = nullptr) { | |||
// Returns entire 1D array. | |||
// status, never checked. We should... | |||
std::vector<T> data; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this ifdef
behavior actually what we want? (and/or is this a leftover from Athena++?)
I'm asking as having the code fail seems like a preferable solution rather than returning an empty object (due to lack of implementation or similar).
Codecov Report
@@ Coverage Diff @@
## develop #311 +/- ##
========================================
Coverage 55.82% 55.82%
========================================
Files 107 107
Lines 11103 11128 +25
========================================
+ Hits 6198 6212 +14
- Misses 4905 4916 +11
Continue to review full report at Codecov.
|
@pgrete the reason to return an empty array is that we may change the physics on restart and I didn't want the request for those physics to kill the code. I could go either way on this, perhaps with a query function. |
@nmsriram there are a couple of comments left. It'd be great if they could be addressed (or replied to) before we go ahead and merge this. Also (unrelated to this PR): I just noticed that quite a bit of the restart functionality has been implemented using macros. |
N blocks instead of reading at offset for that particular rank. Added capability to restart with or without ghost zones.
…GPU data is initialized on restart.
2f76ee2
to
0e20618
Compare
@pgrete the reason they are macros is because I couldn't figure out how to do it different. About a week ago @AndrewGaspar commented essentially the same thing and showed a std::function way. I'm a bit hesitant to change this code at this point and will do it in a subsequent PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice work. this one was a doozy.
@@ -154,7 +154,7 @@ void FillVarView(const vpack_types::VarList<T> &vars, PackIndexMap *vmap, | |||
auto host_sp = Kokkos::create_mirror_view(Kokkos::HostSpace(), sparse_assoc); | |||
|
|||
int vindex = 0; | |||
int sparse_start; | |||
int sparse_start = -22000; // Initialize to an obviously wrong number |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should perhaps formalize this throughout the code and make a habit of initializing to signal NaNs. That's for another time, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we create an issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes good idea
@@ -38,6 +38,52 @@ | |||
|
|||
namespace parthenon { | |||
|
|||
// TODO(Yurlunger) Identify an idiomatic way of applying reflective |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// TODO(Yurlunger) Identify an idiomatic way of applying reflective | |
// TODO(JMM) Identify an idiomatic way of applying reflective |
@@ -192,7 +192,9 @@ ParthenonManager::ProcessPackagesDefault(std::unique_ptr<ParameterInput> &pin) { | |||
void ParthenonManager::RestartPackages(Mesh &rm, RestartReader &resfile) { | |||
// Restart packages with information for blocks in ids from the restart file | |||
// Assumption: blocks are contiguous in restart file, may have to revisit this. | |||
const IndexDomain interior = IndexDomain::interior; | |||
// const IndexDomain interior = IndexDomain::interior; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// const IndexDomain interior = IndexDomain::interior; |
@@ -129,6 +175,7 @@ void MeshRefinement::RestrictCellCenteredValues(const ParArrayND<Real> &fine, | |||
(fine(n, 0, j, i + 1) * vol01 + fine(n, 0, j + 1, i + 1) * vol11)) / | |||
tvol; | |||
}); | |||
applyBounds(pmb, coarse, cib, cjb); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we only have a function like this for the 2d case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch. I think this should be for all 3 cases, @nmsriram
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh brother. yes. of course. @JoshuaSBrown / @Yurlungur Should I put in another PR to fix this, or should I just depend on @Yurlungur to fix it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoever, gets to it first I suppose.
const IndexDomain theDomain = | ||
(output_params.include_ghost_zones ? IndexDomain::entire : IndexDomain::interior); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
auto nx1 = out_ib.e - out_ib.s + 1; // SS mb.block_size.nx1; | ||
auto nx2 = out_jb.e - out_jb.s + 1; // SS mb.block_size.nx2; | ||
auto nx3 = out_kb.e - out_kb.s + 1; // SS mb.block_size.nx3; | ||
int bsize[3] = {mb.block_size.nx1, mb.block_size.nx2, mb.block_size.nx3}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These comments are confusing to me, what does the SS mean, this kind of looks like you are saying nx1 is the same as mb.block_size.nx1. If so why are you using mb.block_size.nx1 to define bsize and not the more concise nx1 nx2 and nx3 variables?
Ok, I see, nx1 etc will change depending on theDomain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, SS are my initials and I use // SS or //SS where I think I may have screwed up and may need to go back.
@@ -180,6 +179,9 @@ class RestartReader { | |||
// perhaps belongs in a destructor? | |||
void Close(); | |||
|
|||
// Does file have ghost cells? | |||
int hasGhost; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why int instead of bool?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because it is read in from the HDF file and this requires fewer lines of code
Congrats! It works. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure why my approval didn't stick. I approve.
PR Summary
This merge fixes a parallel restart bug in the restart path. Parallel restarts with same (or different) number of processors now work for the advection example.
Also fixes a restart bug with reflective boundary conditions with an ugly hack in
mesh/mesh_refinement.cpp
to apply reflective boundary conditions. It is expected that @Yurlungur or @jonahm-LANL will fix this when doing the boundary condition refactor. As a result this should get merged in before the boundary condition fix.We can now restart with a number of processors different from the original run, and even restart serial from parallel (or vice versa) and still get identical answers.
I believe #61 is fixed with this PR.
Known issue:
PR Checklist