You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Recent PR #4281 has introduced new metrics for tracking errors from ruler. Idea was that these failure metrics will only be incremented for queries that would normally result in 500 errors.
However this doesn't quite work. There is a class of errors that querier would return 422 for, but ruler still treats as internal errors. Examples include a rule that ends with multiple matches for labels: many-to-one matching must be explicit (group_left/group_right) errors, or a rule like label_replace(metric, "foo", "$1", "service", "[") with invalid regex "[".
Why do these return 422, but ruler doesn't recognize them as user-errors? Ruler uses querier.TranslateToPromqlAPIError to translate errors to PromQL errors. This function was designed to be used by querier, for returning errors from storage.Queryable implementation that match errors expected by Prometheus API library.
However in case of above mentioned class of errors, they don't "go" through the storage layer, but are generated by PromQL engine itself. As such, these errors are internal, but Prometheus API layer can translate them to proper status codes. Unfortunately Cortex ruler doesn't use Prometheus API library to run the queries, and cannot currently do type assertions on these internal Prometheus errors.
(By Prometheus API library/layer I mean github.com/prometheus/prometheus/web/api/v1 package that implements HTTP API in Prometheus.)
The text was updated successfully, but these errors were encountered:
Recent PR #4281 has introduced new metrics for tracking errors from ruler. Idea was that these failure metrics will only be incremented for queries that would normally result in 500 errors.
However this doesn't quite work. There is a class of errors that querier would return 422 for, but ruler still treats as internal errors. Examples include a rule that ends with
multiple matches for labels: many-to-one matching must be explicit (group_left/group_right)
errors, or a rule likelabel_replace(metric, "foo", "$1", "service", "[")
with invalid regex"["
.Why do these return 422, but ruler doesn't recognize them as user-errors? Ruler uses
querier.TranslateToPromqlAPIError
to translate errors to PromQL errors. This function was designed to be used by querier, for returning errors fromstorage.Queryable
implementation that match errors expected by Prometheus API library.However in case of above mentioned class of errors, they don't "go" through the storage layer, but are generated by PromQL engine itself. As such, these errors are internal, but Prometheus API layer can translate them to proper status codes. Unfortunately Cortex ruler doesn't use Prometheus API library to run the queries, and cannot currently do type assertions on these internal Prometheus errors.
(By Prometheus API library/layer I mean
github.com/prometheus/prometheus/web/api/v1
package that implements HTTP API in Prometheus.)The text was updated successfully, but these errors were encountered: