Fix heap update operation for least-loaded picker #61

Merged
merged 3 commits into main from jh/fix-least-loaded-heap
Apr 19, 2024

@jhump jhump commented Apr 19, 2024

@jchadwick-buf, @lrewega was trying out the least-loaded picker and it panicked! 😱

It turns out that the update operation, which reconciles the heap with a new set of connections provided from a resolve, was iterating over the slice from start to finish while also changing the length of the slice mid-iteration, whenever it decided it needed to pop an item that should no longer be present.

So the panic was an out-of-bounds slice index. But the iteration is incorrect for more than just that reason: removing an item from the heap that way also re-orders items, sifting them up and down to preserve heap invariants within the slice. So if we continue iterating through the slice after a removal, we might visit the same item more than once and fail to visit others, as their order may have changed underneath us.
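To make the failure mode concrete, here is a minimal sketch using a plain integer min-heap rather than the picker's real types. The names `intHeap`, `removeFixedBound`, and `removeLiveBound` are hypothetical and are not taken from the PR. The first function captures the length before the loop and indexes past the end once removals shrink the slice. The second re-checks the length on every iteration, which avoids the panic but still skips items, because `heap.Remove` moves the last element into the vacated slot.

```go
package main

import "container/heap"

// intHeap is a hypothetical min-heap of ints, used only for illustration.
type intHeap []int

func (h intHeap) Len() int           { return len(h) }
func (h intHeap) Less(i, j int) bool { return h[i] < h[j] }
func (h intHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *intHeap) Push(x any)        { *h = append(*h, x.(int)) }
func (h *intHeap) Pop() any {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[:n-1]
	return x
}

// removeFixedBound mimics the flawed pattern: the loop bound is captured
// once, but heap.Remove shrinks the slice, so a later index can be out of
// range. It reports whether the loop panicked.
func removeFixedBound(h *intHeap, keep map[int]bool) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	n := h.Len()
	for i := 0; i < n; i++ {
		if !keep[(*h)[i]] {
			heap.Remove(h, i)
		}
	}
	return false
}

// removeLiveBound re-checks the length on each iteration. It cannot index
// out of range, but it is still wrong: heap.Remove swaps the last element
// into slot i, and the loop then advances past that slot without
// examining the element now stored there.
func removeLiveBound(h *intHeap, keep map[int]bool) {
	for i := 0; i < h.Len(); i++ {
		if !keep[(*h)[i]] {
			heap.Remove(h, i)
		}
	}
}
```

With the heap `[1, 2, 3, 4]` and only `1` marked as kept, the first function panics, and the second leaves `4` behind even though it should have been removed.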

So now the logic does a single pass through the slice to remove unneeded items by simply overwriting them (and setting vacated slots to nil if necessary). At the end of that first pass, all that is left in the slice are the items that are also in the new set of connections, compacted to the beginning of the slice (everything after is set to nil). Then we append any new connections. And at the very end, we re-heapify to restore heap invariants after all of that.
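The approach described above can be sketched roughly as follows. This is not the code from the PR: `item`, `itemHeap`, and `update` are hypothetical stand-ins, connections are represented by string names instead of `conn.Conn`, and ordering uses load alone with no tie-break.

```go
package main

import "container/heap"

// item is a hypothetical stand-in for leastLoadedConnItem; the real type
// holds a conn.Conn rather than a name.
type item struct {
	name string
	load uint64
}

// itemHeap is a min-heap of items ordered by load.
type itemHeap []*item

func (h itemHeap) Len() int           { return len(h) }
func (h itemHeap) Less(i, j int) bool { return h[i].load < h[j].load }
func (h itemHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *itemHeap) Push(x any)        { *h = append(*h, x.(*item)) }
func (h *itemHeap) Pop() any {
	old := *h
	n := len(old)
	x := old[n-1]
	old[n-1] = nil
	*h = old[:n-1]
	return x
}

// update reconciles the heap with a new set of names: compact the items
// that are still wanted, nil out the tail, append new items, re-heapify.
func (h *itemHeap) update(names []string) {
	want := make(map[string]bool, len(names))
	for _, n := range names {
		want[n] = true
	}
	// First pass: overwrite unneeded items by sliding kept ones forward.
	// No heap operations run here, so the order cannot change under us.
	kept := 0
	for _, it := range *h {
		if want[it.name] {
			(*h)[kept] = it
			kept++
			delete(want, it.name) // whatever remains in want is new
		}
	}
	// Clear the vacated tail so dropped items can be garbage collected.
	for i := kept; i < len(*h); i++ {
		(*h)[i] = nil
	}
	*h = (*h)[:kept]
	// Append items for names that were not already present.
	for _, n := range names {
		if want[n] {
			*h = append(*h, &item{name: n})
			delete(want, n) // guards against duplicate names in the input
		}
	}
	// Restore heap invariants once, after all structural changes.
	heap.Init(h)
}
```

Calling `heap.Init` once at the end costs O(n), the same order as the passes that precede it, and it avoids reasoning about heap order while the slice is being edited.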

This adds a test that is hopefully pretty convincing that it all works correctly now. The test is a sequence of operations, including acquiring connections, releasing them, and updating the resolved set.

@@ -86,23 +86,20 @@ type leastLoadedConnHeap []*leastLoadedConnItem
 type leastLoadedConnItem struct {
 	conn     conn.Conn
 	load     uint64
-	tiebreak uint64
+	tieBreak uint64

We have nextTieBreak with a capital B elsewhere, so I capitalized it here for consistency that this represents two words.

entry := (*h)[0]
entry.load++
entry.tieBreak = nextTieBreak

@jhump jhump Apr 19, 2024


I think we should be setting this here, instead of outside of this function (as was done before), so that the Fix call below takes this value into consideration.
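A small sketch of why the ordering matters, under the assumption that the heap compares by load first and by tie-break value second. The comparison rule, the `entry` and `entryHeap` types, and the `acquire` helper are all illustrative and not copied from the PR. If the tie-break were assigned after `heap.Fix`, the fix would compare against the stale value and could leave the entry at the root when a sibling with equal load should come first.

```go
package main

import "container/heap"

// entry is a hypothetical stand-in for leastLoadedConnItem.
type entry struct {
	name     string
	load     uint64
	tieBreak uint64
}

// entryHeap orders by load, then by tieBreak (assumed ordering).
type entryHeap []*entry

func (h entryHeap) Len() int { return len(h) }
func (h entryHeap) Less(i, j int) bool {
	if h[i].load != h[j].load {
		return h[i].load < h[j].load
	}
	return h[i].tieBreak < h[j].tieBreak
}
func (h entryHeap) Swap(i, j int) { h[i], h[j] = h[j], h[i] }
func (h *entryHeap) Push(x any)   { *h = append(*h, x.(*entry)) }
func (h *entryHeap) Pop() any {
	old := *h
	n := len(old)
	x := old[n-1]
	old[n-1] = nil
	*h = old[:n-1]
	return x
}

// acquire takes the least-loaded entry, bumps its load, stamps the new
// tie-break value, and only then calls heap.Fix, so the sift compares
// the entry using its final load and tieBreak.
func (h *entryHeap) acquire(nextTieBreak uint64) *entry {
	e := (*h)[0]
	e.load++
	e.tieBreak = nextTieBreak
	heap.Fix(h, 0)
	return e
}
```

With two entries that start at equal load, successive calls with increasing tie-break values alternate between them. Assigning `tieBreak` after `heap.Fix` would instead let the second call leave the same entry at the root.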


@jchadwick-buf jchadwick-buf left a comment


Ouch, nasty bug. Very nice looking test though; it definitely seems convincing. To my eyes the explanation makes sense and there is nothing obviously wrong with the code, so LGTM. That said, it does make me wish we could get a bit more battle-testing on this. (Aside from production adoption, I'd really like to see if we could sneak in some load tests in CI.) I guess that's why we're still 0.x for now.

@jhump jhump merged commit 47d0968 into main Apr 19, 2024
5 checks passed
@jhump jhump deleted the jh/fix-least-loaded-heap branch April 19, 2024 13:46