#2033: lb: rewrite and fix bugs in OfflineLB/LBDataRestartReader #2034

Merged
9 commits merged on Jan 24, 2023

lifflander
Collaborator

Fixes #2033

@lifflander lifflander linked an issue Dec 6, 2022 that may be closed by this pull request
@lifflander
Collaborator Author

Tests are coming.

@lifflander lifflander force-pushed the 2033-simplify-and-fix-offlinelb branch from 7042e87 to 301e21d Compare December 7, 2022 01:56
Collaborator

@nlslatt nlslatt left a comment

I wonder if we need to make this more general. Please see the comment below.

tests/unit/lb/test_offlinelb.cc (resolved)
@lifflander lifflander force-pushed the 2033-simplify-and-fix-offlinelb branch from 301e21d to a82b78c Compare January 12, 2023 23:38
@codecov

codecov bot commented Jan 12, 2023

Codecov Report

Merging #2034 (356a356) into develop (54b4ebd) will increase coverage by 0.11%.
The diff coverage is 95.23%.

@@             Coverage Diff             @@
##           develop    #2034      +/-   ##
===========================================
+ Coverage    84.75%   84.86%   +0.11%     
===========================================
  Files          722      723       +1     
  Lines        25600    25549      -51     
===========================================
- Hits         21696    21683      -13     
+ Misses        3904     3866      -38     
| Impacted Files | Coverage Δ |
| --- | --- |
| ...t/vrt/collection/balance/lb_data_restart_reader.cc | 76.59% <83.33%> (-1.98%) ⬇️ |
| ...vt/vrt/collection/balance/lb_data_restart_reader.h | 78.57% <90.00%> (+38.57%) ⬆️ |
| src/vt/vrt/collection/balance/lb_data_holder.cc | 89.26% <100.00%> (-0.08%) ⬇️ |
| .../vt/vrt/collection/balance/lb_invoke/lb_manager.cc | 80.86% <100.00%> (+0.59%) ⬆️ |
| src/vt/vrt/collection/balance/model/raw_data.cc | 100.00% <100.00%> (ø) |
| ...c/vt/vrt/collection/balance/offlinelb/offlinelb.cc | 100.00% <100.00%> (+100.00%) ⬆️ |
| tests/unit/lb/test_offlinelb.cc | 100.00% <100.00%> (ø) |
| src/vt/phase/phase_manager.cc | 91.89% <0.00%> (-0.68%) ⬇️ |
| src/vt/vrt/collection/balance/baselb/baselb_msgs.h | 100.00% <0.00%> (ø) |
| ...rc/vt/collective/reduce/operators/functors/or_op.h | 100.00% <0.00%> (ø) |

... and 8 more

@lifflander lifflander force-pushed the 2033-simplify-and-fix-offlinelb branch from 9a187c0 to 9bace2f Compare January 18, 2023 18:11
for (PhaseType phase = 0; phase < num_phases_; phase++) {
  auto iter = lbdh.node_data_.find(phase);
  if (iter == lbdh.node_data_.end()) {
    history_[phase] = history_[prev_known_phase];
Collaborator

This is an interesting solution to the problem. Could we end up using too much memory this way? My original thought was that the LB interval would likely not be 1, so only a subset of the phases would be needed anyway. I think it's also possible to mix OfflineLB with another load balancer (on different phases) by using an LB config file.

Collaborator

Perhaps we should only store the phases for which the LB configuration would use OfflineLB?

Collaborator Author

I agree that it uses more memory than necessary. It was just the easiest way to write the code. Instead, I can do a search on each lookup that finds the most recent previously known phase.
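
A minimal sketch of the per-lookup search mentioned above, for illustration only. It assumes the loaded phases are kept in an ordered `std::map` keyed by phase; the PR's actual `node_data_` container may differ. `PhaseType` is aliased locally as a stand-in, and `lookupLastKnown` is a hypothetical helper name, not part of the codebase.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>
#include <optional>

// Assumption: a local stand-in for the real phase type.
using PhaseType = std::uint64_t;

// Hypothetical helper: return the value stored for `phase` if present,
// otherwise the value for the closest earlier phase. Returns std::nullopt
// when no phase <= `phase` has been stored.
template <typename T>
std::optional<T> lookupLastKnown(
  std::map<PhaseType, T> const& sparse, PhaseType phase
) {
  // upper_bound yields the first key strictly greater than `phase`,
  // so the element just before it (if any) has the largest key <= `phase`.
  auto it = sparse.upper_bound(phase);
  if (it == sparse.begin()) {
    return std::nullopt;
  }
  return std::prev(it)->second;
}
```

With this approach only the phases actually present in the input are stored, and each lookup costs a logarithmic search instead of one copied entry per missing phase.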

Collaborator Author

OK, based on our conversation, for the sake of this PR we are going to assume that the phase specification is dense, i.e., fully specified for every phase up to num_phases.
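
A hedged sketch of how the density assumption could be validated before use. It is not taken from the PR: `isDense` is a hypothetical name, `PhaseType` is a local stand-in alias, and the container type is assumed to be an `std::unordered_map` keyed by phase.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Assumption: a local stand-in for the real phase type.
using PhaseType = std::uint64_t;

// Hypothetical check: true only if every phase in [0, num_phases)
// has an entry in `data`.
template <typename T>
bool isDense(
  std::unordered_map<PhaseType, T> const& data, PhaseType num_phases
) {
  for (PhaseType phase = 0; phase < num_phases; phase++) {
    if (data.find(phase) == data.end()) {
      return false;  // a gap: this phase was never specified
    }
  }
  return true;
}
```

A reader could run such a check once after loading and report an error on failure, rather than silently reusing data from an earlier phase.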

@lifflander lifflander force-pushed the 2033-simplify-and-fix-offlinelb branch from 9bace2f to 4905eda Compare January 23, 2023 21:59
@lifflander lifflander requested a review from nlslatt January 23, 2023 22:07
Collaborator

@nlslatt nlslatt left a comment

Looks good to me

@lifflander lifflander merged commit c0e0282 into develop Jan 24, 2023
Development

Successfully merging this pull request may close these issues.

Simplify and fix OfflineLB
2 participants