-
Notifications
You must be signed in to change notification settings - Fork 908
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix disk weight ensemble infinite loop bug #4324
Fix disk weight ensemble infinite loop bug #4324
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch!
Leave a minor comment:
...-server/src/main/java/org/apache/bookkeeper/client/RackawareEnsemblePlacementPolicyImpl.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think keeping input param excludeBookies and remove this if clause is a little werid and confusing. cc @wenbingshen @dlg99 @eolivelli
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch.
// | ||
} | ||
|
||
conf.setDiskWeightBasedPlacementEnabled(false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We would better reset StaticDNSResolver at the test end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the setUp method, it will reset the StaticDNSResolver. So we do not need to reset it in the end of the test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
### Motivation In the RackawareEnsemblePlacementPolicyImpl#selectRandomInternal method, if bookieNode in the ensemble list is not added to the excludeBookies set and the diskWeight feature is enabled, this method will fall into an infinite loop. https://github.com/apache/bookkeeper/blob/3ed93a0342ccff8c30ea472d731a7b593b4d32b0/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/RackawareEnsemblePlacementPolicyImpl.java#L773-L777 The root cause is the newly selected bookies already exist in the ensemble list, leading the `ensemble.addNode(bookie)` method to return `false` and the while loop continues. https://github.com/apache/bookkeeper/blob/3ed93a0342ccff8c30ea472d731a7b593b4d32b0/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/RackawareEnsemblePlacementPolicyImpl.java#L841 In the next round of the loop, `bookiesSeenSoFar.size()` will never equal to `bookiesToSelectFrom.size()` due to the wRSelection set doesn't exclude the bookies in the ensemble set. https://github.com/apache/bookkeeper/blob/3ed93a0342ccff8c30ea472d731a7b593b4d32b0/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/RackawareEnsemblePlacementPolicyImpl.java#L815-L822 ### Changes Remove the following code to make sure the items in bookiesToSelectFrom are all included in `wRSelection` https://github.com/apache/bookkeeper/blob/3ed93a0342ccff8c30ea472d731a7b593b4d32b0/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/RackawareEnsemblePlacementPolicyImpl.java#L795-L797 (cherry picked from commit a51f94d)
### Motivation In the RackawareEnsemblePlacementPolicyImpl#selectRandomInternal method, if bookieNode in the ensemble list is not added to the excludeBookies set and the diskWeight feature is enabled, this method will fall into an infinite loop. https://github.com/apache/bookkeeper/blob/3ed93a0342ccff8c30ea472d731a7b593b4d32b0/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/RackawareEnsemblePlacementPolicyImpl.java#L773-L777 The root cause is the newly selected bookies already exist in the ensemble list, leading the `ensemble.addNode(bookie)` method to return `false` and the while loop continues. https://github.com/apache/bookkeeper/blob/3ed93a0342ccff8c30ea472d731a7b593b4d32b0/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/RackawareEnsemblePlacementPolicyImpl.java#L841 In the next round of the loop, `bookiesSeenSoFar.size()` will never equal to `bookiesToSelectFrom.size()` due to the wRSelection set doesn't exclude the bookies in the ensemble set. https://github.com/apache/bookkeeper/blob/3ed93a0342ccff8c30ea472d731a7b593b4d32b0/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/RackawareEnsemblePlacementPolicyImpl.java#L815-L822 ### Changes Remove the following code to make sure the items in bookiesToSelectFrom are all included in `wRSelection` https://github.com/apache/bookkeeper/blob/3ed93a0342ccff8c30ea472d731a7b593b4d32b0/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/RackawareEnsemblePlacementPolicyImpl.java#L795-L797 (cherry picked from commit a51f94d)
### Motivation In the RackawareEnsemblePlacementPolicyImpl#selectRandomInternal method, if bookieNode in the ensemble list is not added to the excludeBookies set and the diskWeight feature is enabled, this method will fall into an infinite loop. https://github.com/apache/bookkeeper/blob/3ed93a0342ccff8c30ea472d731a7b593b4d32b0/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/RackawareEnsemblePlacementPolicyImpl.java#L773-L777 The root cause is the newly selected bookies already exist in the ensemble list, leading the `ensemble.addNode(bookie)` method to return `false` and the while loop continues. https://github.com/apache/bookkeeper/blob/3ed93a0342ccff8c30ea472d731a7b593b4d32b0/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/RackawareEnsemblePlacementPolicyImpl.java#L841 In the next round of the loop, `bookiesSeenSoFar.size()` will never equal to `bookiesToSelectFrom.size()` due to the wRSelection set doesn't exclude the bookies in the ensemble set. https://github.com/apache/bookkeeper/blob/3ed93a0342ccff8c30ea472d731a7b593b4d32b0/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/RackawareEnsemblePlacementPolicyImpl.java#L815-L822 ### Changes Remove the following code to make sure the items in bookiesToSelectFrom are all included in `wRSelection` https://github.com/apache/bookkeeper/blob/3ed93a0342ccff8c30ea472d731a7b593b4d32b0/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/RackawareEnsemblePlacementPolicyImpl.java#L795-L797
Motivation
In the RackawareEnsemblePlacementPolicyImpl#selectRandomInternal method, if bookieNode in the ensemble list is not added to the excludeBookies set and the diskWeight feature is enabled, this method will fall into an infinite loop.
bookkeeper/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/RackawareEnsemblePlacementPolicyImpl.java
Lines 773 to 777 in 3ed93a0
The root cause is the newly selected bookies already exist in the ensemble list, leading the
ensemble.addNode(bookie)
method to returnfalse
and the while loop continues.bookkeeper/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/RackawareEnsemblePlacementPolicyImpl.java
Line 841 in 3ed93a0
In the next round of the loop,
bookiesSeenSoFar.size()
will never equal tobookiesToSelectFrom.size()
due to the wRSelection set doesn't exclude the bookies in the ensemble set.bookkeeper/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/RackawareEnsemblePlacementPolicyImpl.java
Lines 815 to 822 in 3ed93a0
Changes
Remove the following code to make sure the items in bookiesToSelectFrom are all included in
wRSelection
bookkeeper/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/RackawareEnsemblePlacementPolicyImpl.java
Lines 795 to 797 in 3ed93a0
(Describe: what changes you have made)
Master Issue: #