-
Notifications
You must be signed in to change notification settings - Fork 529
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Query-frontend: don't retry queries which error inside PromQL #4643
Conversation
So it can be mocked for testing.
Cancellations would get caught at the top of the loop, but it's better to skip the error log line.
4b078d2
to
7b14ec6
Compare
The change to not log "context canceled" as an error in the querier actually broke query-frontend, so I have removed it from this PR and moved to #4648. |
util_log "github.com/grafana/mimir/pkg/util/log" | ||
) | ||
|
||
type intObservation interface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit, but should be:
type intObservation interface { | |
type intObserver interface { |
type retryMiddlewareMetrics struct { | ||
retriesCount prometheus.Histogram | ||
} | ||
|
||
func newRetryMiddlewareMetrics(registerer prometheus.Registerer) *retryMiddlewareMetrics { | ||
func newRetryMiddlewareMetrics(registerer prometheus.Registerer) intObservation { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see the value in this, I think we could perfectly use prometheus.Observer
inteface and call float64
casting in all (1?) places.
@colega I have changed as you suggested. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, please consider my comment.
Co-authored-by: Oleg Zaytsev <mail@olegzaytsev.com>
What this PR does
Break out of the retry loop if the error has been decoded from a Prometheus error, or is a cancellation.
Extra drive-by fix: don't log "context canceled" as an error in the querier. This is noticeable when runningTestQueryFrontendErrorMessageParity/query_time_range_exceeds_the_limit
, for instance.Which issue(s) this PR fixes or relates to
Fixes #4637
Checklist
CHANGELOG.md
updated