Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Honor start/end in the remote read request hints #8431

Merged
merged 5 commits into from
Jun 21, 2024
Merged

Conversation

pracucci
Copy link
Collaborator

@pracucci pracucci commented Jun 20, 2024

What this PR does

Currently, Mimir ignored remote read requests hints. When Prometheus issue a remote read request, the time range in hints may be different than the start/end time range. For example, if you run count_over_time(up[1h]) + count_over_time(up[2h]) (example query) in Prometheus you get 2 remote read requests with the following time ranges:

start: 1718859107525 (04:51:47) end: 1718869907525 (07:51:47) hints start: 1718862707525 (05:51:47) hints end: 1718869907525 (07:51:47)
start: 1718859107525 (04:51:47) end: 1718869907525 (07:51:47) hints start: 1718859107525 (04:51:47) hints end: 1718869907525 (07:51:47)

Apparently, the start/end is always the max one, and hints may specify a smaller one (apparently, it looks the opposite than a normal query execution where start/end should be the queried time range and hints the actual time range for which you want the raw data to process in PromQL).

Anyway, read hints are expected to be honored. Prometheus, which implements remote read endpoints too, honor the hints too.

In this PR I'm proposing to honoring read hints, but adding an extra protection for zero-valued hint values. The reason is that we don't know if in the while there are clients passing zero-valued hints (by mistake) and I would like to protect from it.

Which issue(s) this PR fixes or relates to

N/A

Checklist

  • Tests updated.
  • Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
  • about-versioning.md updated with experimental features.

@pracucci pracucci marked this pull request as ready for review June 20, 2024 14:32
@pracucci pracucci requested a review from a team as a code owner June 20, 2024 14:32
@krajorama krajorama self-requested a review June 21, 2024 05:43
CHANGELOG.md Outdated
@@ -70,6 +70,7 @@
* [ENHANCEMENT] Query-frontend: added `remote_read` to `op` supported label values for the `cortex_query_frontend_queries_total` metric. #8412
* [ENHANCEMENT] Query-frontend: log the overall length and start, end time offset from current time for remote read requests. The start and end times are calculated as the miminum and maximum times of the individual queries in the remote read request. #8404
* [ENHANCEMENT] Storage Provider: Added option `-<prefix>.s3.dualstack-enabled` that allows disabling S3 client from resolving AWS S3 endpoint into dual-stack IPv4/IPv6 endpoint. Defaults to true. #8405
* [ENHANCEMENT] Querier: honor the start/end time range specified in the read hints when executing a remote read request. #8431
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you considered it [CHANGE] ? Technically the result might be different in some cases. And I think we should call this out more in the release notes.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't, but it's a fair comment. Done in 6612dd9

Copy link
Contributor

@krajorama krajorama left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the only thing missing to make the story whole is to have e2e tests try it.
So enhance the interfaces RemoteRead and RemoteReadChunks be able to take an optional *prompb.ReadHints.

Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
… test

Signed-off-by: Marco Pracucci <marco@pracucci.com>
…tegration test

Signed-off-by: Marco Pracucci <marco@pracucci.com>
@pracucci
Copy link
Collaborator Author

I think the only thing missing to make the story whole is to have e2e tests try it. So enhance the interfaces RemoteRead and RemoteReadChunks be able to take an optional *prompb.ReadHints.

Integration tests were a bit tricky. The most complicated thing is that when you query chunks, the returned chunks are not cut to the queried time range but we return full chunks with samples within the queried time range. I've refactored TestQuerierStreamingRemoteRead and added a test case to query a subrange via hints.

Comment on lines +212 to +217
// Mimir doesn't cut returned chunks to the requested time range for performance reasons. This means
// that we get the entire chunks in output. TSDB targets to cut chunks once every 120 samples, but
// it's based on estimation and math is not super accurate. That's why we get these not-perfectly-aligned
// chunk ranges. However, they're consistent, so tests are stable.
expectedStartMs: now.Add(-1500 * time.Millisecond).UnixMilli(),
expectedEndMs: now.Add(10100 * time.Millisecond).UnixMilli(),
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to reviewers: this is due to how TSDB computeChunkEndTime() works. I've confirmed it doing the math manually.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That code might change, so an alternative would be to make the start and end time much bigger than the hint start/end to contain all pushed samples and verify that the returned chunks contain the requested data , but not all data.
Anyway, it's not changing that often so it can be a different PR as well.

// Start dependencies.
minio := e2edb.NewMinio(9000, blocksBucketName)
// This test writes samples sparse in time. We don't want compaction to trigger while testing.
"-blocks-storage.tsdb.block-ranges-period": "2h",
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to reviewers: the default in integration tests is 1m, but I don't want such a small block period range in these tests. Reason is that chunks are also cut to period ranges, making the assertion even more complicated.

Copy link
Contributor

@krajorama krajorama left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Signed-off-by: Marco Pracucci <marco@pracucci.com>
@pracucci
Copy link
Collaborator Author

pracucci commented Jun 21, 2024

I think the only thing missing to make the story whole is to have e2e tests try it. So enhance the interfaces RemoteRead and RemoteReadChunks be able to take an optional *prompb.ReadHints.

Integration tests were a bit tricky. The most complicated thing is that when you query chunks, the returned chunks are not cut to the queried time range but we return full chunks with samples within the queried time range. I've refactored TestQuerierStreamingRemoteRead and added a test case to query a subrange via hints.

I reverted back TestQuerierRemoteRead to the main branch version one in a10436d. I need more time to fix it, so I will do in a separate PR. In the meanwhile I merge this PR to unlock this change.

@pracucci pracucci enabled auto-merge (squash) June 21, 2024 12:08
@pracucci pracucci merged commit 6bcba2c into main Jun 21, 2024
29 checks passed
@pracucci pracucci deleted the add-readhints-support branch June 21, 2024 12:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants