Store: Cancelled/Aborted GRPC Requests Increment thanos_objstore_bucket_operation_failures_total #3149

ipstatic · 2020-09-10T22:41:43Z

Thanos, Prometheus and Golang version used:
Thanos: v0.13.0
Prometheus: 2.15.2
Golang: 1.14.1

Object Storage Provider:
GCS

What happened:
We noticed continued increases in the thanos_objstore_bucket_operation_failures_total metric with no corresponding errors in the log file. After looking at other metrics, it appears that the metric increases whenever a request from query times out or is cancelled or aborted. We also confirmed this from the GCS side: the rate of CANCELLED API calls rose around the times the bucket operation failures metric increased.

What you expected to happen:
thanos_objstore_bucket_operation_failures_total not to increase.

How to reproduce it (as minimally and precisely as possible):
Run a large query from querier that will hit its query timeout.

Full logs to relevant components:
I can include logs if desired but they just show normal block caching operations. Even in debug mode there is nothing about a failure.


GiedriusS · 2020-09-11T10:49:16Z

It seems to me like we need to add if errors.Cause(err) != context.Canceled { increaseOperationFailures() }. Probably it would be even better if we checked whether the gRPC request has been aborted. Help wanted!
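A minimal sketch of the check suggested above, using only the Go standard library. The `opMetrics` type and its `observe` method are hypothetical stand-ins for illustration, not the Thanos objstore code. The sketch assumes the request context is available where the failure is counted, and it uses `errors.Is` in place of `errors.Cause` so that wrapped errors are also matched.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// opMetrics is a hypothetical stand-in for a failure counter such as
// thanos_objstore_bucket_operation_failures_total.
type opMetrics struct {
	failures int
}

// observe counts err as a failure unless the request context has already
// ended or the error itself is a cancellation or deadline error.
func (m *opMetrics) observe(ctx context.Context, err error) {
	if err == nil {
		return
	}
	if ctx.Err() != nil ||
		errors.Is(err, context.Canceled) ||
		errors.Is(err, context.DeadlineExceeded) {
		return // the caller gave up; not a bucket failure
	}
	m.failures++
}

// countFailures returns the counter value after a cancelled request and
// then after a genuine bucket error.
func countFailures() (afterCancel, afterRealError int) {
	m := &opMetrics{}

	// Simulate a request whose context was cancelled by the caller.
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	m.observe(ctx, fmt.Errorf("get object: %w", ctx.Err()))
	afterCancel = m.failures

	// Simulate a genuine failure on a live context.
	m.observe(context.Background(), errors.New("bucket unavailable"))
	afterRealError = m.failures
	return afterCancel, afterRealError
}

func main() {
	a, b := countFailures()
	fmt.Println(a, b) // prints "0 1"
}
```

Checking `ctx.Err()` first covers the case where the underlying client returns an error that does not wrap `context.Canceled` even though the request was abandoned.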

ipstatic · 2020-09-11T13:36:36Z

That is what is puzzling me. Query shows the request as cancelled but store shows the request as aborted, not cancelled. Is there a timer that we could be hitting? Also, I would love to help but I don't know where in the code this is getting executed. Mind pointing me in a general direction?

GiedriusS added the "component: store" and "feature request/improvement" labels Sep 11, 2020

GiedriusS added the difficulty: easy label Sep 11, 2020

ipstatic mentioned this issue Sep 16, 2020 in #3179 (merged): Store: If request ctx has an error we do not increment opsFailures counter

bwplotka closed this as completed in #3179 Sep 21, 2020
