-
Notifications
You must be signed in to change notification settings - Fork 529
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Do not fail as consistency check for canceled requests #3837
Do not fail as consistency check for canceled requests #3837
Conversation
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good job!
@@ -718,6 +718,10 @@ func (q *blocksStoreQuerier) fetchSeriesFromStores( | |||
|
|||
stream, err := c.Series(gCtx, req) | |||
if err != nil { | |||
if errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is a stream, would it also make sense to handle the cancellations when receiving from the stream as well?
mimir/pkg/querier/blocks_store_queryable.go
Line 741 in ddf6d54
resp, err := stream.Recv() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be covered by the line above:
mimir/pkg/querier/blocks_store_queryable.go
Lines 735 to 739 in ddf6d54
// Ensure the context hasn't been canceled in the meanwhile (eg. an error occurred | |
// in another goroutine). | |
if gCtx.Err() != nil { | |
return gCtx.Err() | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thinking about the store-gateway streaming implementation - most of the time in the sotre-gateway is spent waiting to load series and chunks. And much less sending to the querier. To me this means that most of the time in the querier will be spent waiting for a store-gateway response. So the context cancellation is more likely to occur while the querier is waiting on a response from the store-gateway.
unless i'm missing something i can open a PR for this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the context cancellation is more likely to occur while the querier is waiting on a response from the store-gateway.
True, but you will still catch it right after the next message is received from gRPC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't understand. I thought srv.Recv()
will return a contect.Cancelled
error here and we will just swallow it
mimir/pkg/querier/blocks_store_queryable.go
Lines 741 to 748 in 6fca816
resp, err := stream.Recv() | |
if errors.Is(err, io.EOF) { | |
break | |
} | |
if err != nil { | |
level.Warn(spanLog).Log("msg", "failed to receive series", "remote", c.RemoteAddress(), "err", err) | |
return nil | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now I see what you mean and you're right. I misunderstood your previous message. We need to fix it and happy if you can work on it, as you proposed above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you're definitely right @dimitarvdimitrov, good catch.
What this PR does
Stops ignoring the
context.Canceled
error from store gateway interaction: this made the entire request fail as "consistency check" because that store gateway didn't have the block, isntead of actually failing ascontext.Canceled
which isn't a server's fault (usually).Which issue(s) this PR fixes or relates to
Fixes internal.
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]