Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: lookup context cancellation race condition #656

Merged
merged 1 commit into from
May 27, 2020
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 9 additions & 0 deletions query.go
Original file line number Diff line number Diff line change
Expand Up @@ -117,10 +117,12 @@ func (dht *IpfsDHT) runLookupWithFollowup(ctx context.Context, target string, qu
}

// wait for all queries to complete before returning, aborting ongoing queries if we've been externally stopped
followupsCompleted := 0
processFollowUp:
for i := 0; i < len(queryPeers); i++ {
select {
case <-doneCh:
followupsCompleted++
if stopFn() {
cancelFollowUp()
if i < len(queryPeers)-1 {
Expand All @@ -130,10 +132,17 @@ processFollowUp:
}
case <-ctx.Done():
lookupRes.completed = false
cancelFollowUp()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Do we actually need this? The context should be canceled anyways.
  • On a related note, can we reliably detect that the query failed? If the context is canceled, we may read everything off the done channel before we check this.
  • It looks like we're treating completed inconsistently. Why are we marking the query as incomplete if the user explicitly stops it early?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we actually need this? The context should be canceled anyways.

You're right we shouldn't need that.

It looks like we're treating completed inconsistently. Why are we marking the query as incomplete if the user explicitly stops it early?

Completed is really more like "have we learned enough about the DHT from this query that we no longer need to refresh this bucket". In some ways this is a little weird because if our query targets a low cpl peer we probably don't need to worry about this in the followup, but if the query targets a high cpl peer we probably want to connect to all of them so we have the opportunity to add them to our routing table.

On a related note, can we reliably detect that the query failed? If the context is canceled, we may read everything off the done channel before we check this.

Probably if we rework things a little, although if we're concerned we could just check ctx.Done again at the end. If the query fails for any reason other than context cancellation then we're not worried since our job for the followup was to attempt to run the query against each of the K nodes that weren't yet queried, not to succeed. If we cancel the context though then we want to not mark the bucket as queried.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, LGTM for now. I'd prefer to just drain this channel then check to see if the context was canceled after we finish draining the channel, but this fixes the bug.

break processFollowUp
}
}

if !lookupRes.completed {
for i := followupsCompleted; i < len(queryPeers); i++ {
<-doneCh
}
}

return lookupRes, nil
}

Expand Down