Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix disk weight ensemble infinite loop bug #4324

Conversation

hangc0276
Copy link
Contributor

Motivation

In the RackawareEnsemblePlacementPolicyImpl#selectRandomInternal method, if bookieNode in the ensemble list is not added to the excludeBookies set and the diskWeight feature is enabled, this method will fall into an infinite loop.

protected List<BookieNode> selectRandomInternal(List<BookieNode> bookiesToSelectFrom,
int numBookies,
Set<Node> excludeBookies,
Predicate<BookieNode> predicate,
Ensemble<BookieNode> ensemble)

The root cause is the newly selected bookies already exist in the ensemble list, leading the ensemble.addNode(bookie) method to return false and the while loop continues.

In the next round of the loop, bookiesSeenSoFar.size() will never equal to bookiesToSelectFrom.size() due to the wRSelection set doesn't exclude the bookies in the ensemble set.

if (bookiesSeenSoFar.size() == bookiesToSelectFrom.size()) {
// If we have gone through the whole available list of bookies,
// and yet haven't been able to satisfy the ensemble request, bail out.
// We don't want to loop infinitely.
break;
}
bookie = wRSelection.getNextRandom();
bookiesSeenSoFar.add(bookie);

Changes

Remove the following code to make sure the items in bookiesToSelectFrom are all included in wRSelection

(Describe: what changes you have made)

Master Issue: #


In order to uphold a high standard for quality for code contributions, Apache BookKeeper runs various precommit
checks for pull requests. A pull request can only be merged when it passes precommit checks.


Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

If this PR is a BookKeeper Proposal (BP):

  • Make sure the PR title is formatted like:
    <BP-#>: Description of bookkeeper proposal
    e.g. BP-1: 64 bits ledger is support
  • Attach the master issue link in the description of this PR.
  • Attach the google doc link if the BP is written in Google Doc.

Otherwise:

  • Make sure the PR title is formatted like:
    <Issue #>: Description of pull request
    e.g. Issue 123: Description ...
  • Make sure tests pass via mvn clean apache-rat:check install spotbugs:check.
  • Replace <Issue #> in the title with the actual Issue number.

Copy link
Member

@wenbingshen wenbingshen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!
Leave a minor comment:

Copy link
Member

@shoothzj shoothzj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think keeping input param excludeBookies and remove this if clause is a little werid and confusing. cc @wenbingshen @dlg99 @eolivelli

Copy link
Member

@horizonzy horizonzy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch.

//
}

conf.setDiskWeightBasedPlacementEnabled(false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would better reset StaticDNSResolver at the test end.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the setUp method, it will reset the StaticDNSResolver. So we do not need to reset it in the end of the test.

Copy link
Member

@wenbingshen wenbingshen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@shoothzj shoothzj merged commit a51f94d into apache:master May 11, 2024
21 checks passed
hangc0276 added a commit that referenced this pull request May 13, 2024
### Motivation

In the RackawareEnsemblePlacementPolicyImpl#selectRandomInternal method, if bookieNode in the ensemble list is not added to the excludeBookies set and the diskWeight feature is enabled, this method will fall into an infinite loop.
https://github.com/apache/bookkeeper/blob/3ed93a0342ccff8c30ea472d731a7b593b4d32b0/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/RackawareEnsemblePlacementPolicyImpl.java#L773-L777

The root cause is the newly selected bookies already exist in the ensemble list, leading the `ensemble.addNode(bookie)` method to return `false` and the while loop continues.
https://github.com/apache/bookkeeper/blob/3ed93a0342ccff8c30ea472d731a7b593b4d32b0/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/RackawareEnsemblePlacementPolicyImpl.java#L841

In the next round of the loop, `bookiesSeenSoFar.size()` will never equal to `bookiesToSelectFrom.size()` due to the wRSelection set doesn't exclude the bookies in the ensemble set.
https://github.com/apache/bookkeeper/blob/3ed93a0342ccff8c30ea472d731a7b593b4d32b0/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/RackawareEnsemblePlacementPolicyImpl.java#L815-L822

### Changes
Remove the following code to make sure the items in bookiesToSelectFrom are all included in `wRSelection`
https://github.com/apache/bookkeeper/blob/3ed93a0342ccff8c30ea472d731a7b593b4d32b0/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/RackawareEnsemblePlacementPolicyImpl.java#L795-L797

(cherry picked from commit a51f94d)
hangc0276 added a commit that referenced this pull request May 13, 2024
### Motivation

In the RackawareEnsemblePlacementPolicyImpl#selectRandomInternal method, if bookieNode in the ensemble list is not added to the excludeBookies set and the diskWeight feature is enabled, this method will fall into an infinite loop.
https://github.com/apache/bookkeeper/blob/3ed93a0342ccff8c30ea472d731a7b593b4d32b0/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/RackawareEnsemblePlacementPolicyImpl.java#L773-L777

The root cause is the newly selected bookies already exist in the ensemble list, leading the `ensemble.addNode(bookie)` method to return `false` and the while loop continues.
https://github.com/apache/bookkeeper/blob/3ed93a0342ccff8c30ea472d731a7b593b4d32b0/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/RackawareEnsemblePlacementPolicyImpl.java#L841

In the next round of the loop, `bookiesSeenSoFar.size()` will never equal to `bookiesToSelectFrom.size()` due to the wRSelection set doesn't exclude the bookies in the ensemble set.
https://github.com/apache/bookkeeper/blob/3ed93a0342ccff8c30ea472d731a7b593b4d32b0/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/RackawareEnsemblePlacementPolicyImpl.java#L815-L822

### Changes
Remove the following code to make sure the items in bookiesToSelectFrom are all included in `wRSelection`
https://github.com/apache/bookkeeper/blob/3ed93a0342ccff8c30ea472d731a7b593b4d32b0/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/RackawareEnsemblePlacementPolicyImpl.java#L795-L797

(cherry picked from commit a51f94d)
Ghatage pushed a commit to sijie/bookkeeper that referenced this pull request Jul 12, 2024
### Motivation

In the RackawareEnsemblePlacementPolicyImpl#selectRandomInternal method, if bookieNode in the ensemble list is not added to the excludeBookies set and the diskWeight feature is enabled, this method will fall into an infinite loop. 
https://github.com/apache/bookkeeper/blob/3ed93a0342ccff8c30ea472d731a7b593b4d32b0/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/RackawareEnsemblePlacementPolicyImpl.java#L773-L777

The root cause is the newly selected bookies already exist in the ensemble list, leading the `ensemble.addNode(bookie)` method to return `false` and the while loop continues.
https://github.com/apache/bookkeeper/blob/3ed93a0342ccff8c30ea472d731a7b593b4d32b0/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/RackawareEnsemblePlacementPolicyImpl.java#L841

In the next round of the loop, `bookiesSeenSoFar.size()` will never equal to `bookiesToSelectFrom.size()` due to the wRSelection set doesn't exclude the bookies in the ensemble set.
https://github.com/apache/bookkeeper/blob/3ed93a0342ccff8c30ea472d731a7b593b4d32b0/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/RackawareEnsemblePlacementPolicyImpl.java#L815-L822

### Changes
Remove the following code to make sure the items in bookiesToSelectFrom are all included in `wRSelection` 
https://github.com/apache/bookkeeper/blob/3ed93a0342ccff8c30ea472d731a7b593b4d32b0/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/RackawareEnsemblePlacementPolicyImpl.java#L795-L797
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants