Honor start/end in the remote read request hints #8431

pracucci · 2024-06-20T08:22:37Z

What this PR does

Currently, Mimir ignored remote read requests hints. When Prometheus issue a remote read request, the time range in hints may be different than the start/end time range. For example, if you run count_over_time(up[1h]) + count_over_time(up[2h]) (example query) in Prometheus you get 2 remote read requests with the following time ranges:

start: 1718859107525 (04:51:47) end: 1718869907525 (07:51:47) hints start: 1718862707525 (05:51:47) hints end: 1718869907525 (07:51:47)
start: 1718859107525 (04:51:47) end: 1718869907525 (07:51:47) hints start: 1718859107525 (04:51:47) hints end: 1718869907525 (07:51:47)

Apparently, the start/end is always the max one, and hints may specify a smaller one (apparently, it looks the opposite than a normal query execution where start/end should be the queried time range and hints the actual time range for which you want the raw data to process in PromQL).

Anyway, read hints are expected to be honored. Prometheus, which implements remote read endpoints too, honor the hints too.

In this PR I'm proposing to honoring read hints, but adding an extra protection for zero-valued hint values. The reason is that we don't know if in the while there are clients passing zero-valued hints (by mistake) and I would like to protect from it.

Which issue(s) this PR fixes or relates to

N/A

Checklist

Tests updated.
Documentation added.
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
about-versioning.md updated with experimental features.

krajorama · 2024-06-21T05:49:30Z

CHANGELOG.md

@@ -70,6 +70,7 @@
 * [ENHANCEMENT] Query-frontend: added `remote_read` to `op` supported label values for the `cortex_query_frontend_queries_total` metric. #8412
 * [ENHANCEMENT] Query-frontend: log the overall length and start, end time offset from current time for remote read requests. The start and end times are calculated as the miminum and maximum times of the individual queries in the remote read request. #8404
 * [ENHANCEMENT] Storage Provider: Added option `-<prefix>.s3.dualstack-enabled` that allows disabling S3 client from resolving AWS S3 endpoint into dual-stack IPv4/IPv6 endpoint. Defaults to true. #8405
+* [ENHANCEMENT] Querier: honor the start/end time range specified in the read hints when executing a remote read request. #8431


Have you considered it [CHANGE] ? Technically the result might be different in some cases. And I think we should call this out more in the release notes.

I didn't, but it's a fair comment. Done in 6612dd9

krajorama

I think the only thing missing to make the story whole is to have e2e tests try it.
So enhance the interfaces RemoteRead and RemoteReadChunks be able to take an optional *prompb.ReadHints.

Signed-off-by: Marco Pracucci <marco@pracucci.com>

… test Signed-off-by: Marco Pracucci <marco@pracucci.com>

…tegration test Signed-off-by: Marco Pracucci <marco@pracucci.com>

pracucci · 2024-06-21T10:11:47Z

I think the only thing missing to make the story whole is to have e2e tests try it. So enhance the interfaces RemoteRead and RemoteReadChunks be able to take an optional *prompb.ReadHints.

Integration tests were a bit tricky. The most complicated thing is that when you query chunks, the returned chunks are not cut to the queried time range but we return full chunks with samples within the queried time range. I've refactored TestQuerierStreamingRemoteRead and added a test case to query a subrange via hints.

pracucci · 2024-06-21T10:12:50Z

integration/querier_remote_read_test.go

+			// Mimir doesn't cut returned chunks to the requested time range for performance reasons. This means
+			// that we get the entire chunks in output. TSDB targets to cut chunks once every 120 samples, but
+			// it's based on estimation and math is not super accurate. That's why we get these not-perfectly-aligned
+			// chunk ranges. However, they're consistent, so tests are stable.
+			expectedStartMs: now.Add(-1500 * time.Millisecond).UnixMilli(),
+			expectedEndMs:   now.Add(10100 * time.Millisecond).UnixMilli(),


Note to reviewers: this is due to how TSDB computeChunkEndTime() works. I've confirmed it doing the math manually.

That code might change, so an alternative would be to make the start and end time much bigger than the hint start/end to contain all pushed samples and verify that the returned chunks contain the requested data , but not all data.
Anyway, it's not changing that often so it can be a different PR as well.

pracucci · 2024-06-21T10:14:59Z

integration/querier_remote_read_test.go

-			// Start dependencies.
-			minio := e2edb.NewMinio(9000, blocksBucketName)
+		// This test writes samples sparse in time. We don't want compaction to trigger while testing.
+		"-blocks-storage.tsdb.block-ranges-period": "2h",


Note to reviewers: the default in integration tests is 1m, but I don't want such a small block period range in these tests. Reason is that chunks are also cut to period ranges, making the assertion even more complicated.

krajorama

LGTM

Signed-off-by: Marco Pracucci <marco@pracucci.com>

pracucci · 2024-06-21T12:07:23Z

I think the only thing missing to make the story whole is to have e2e tests try it. So enhance the interfaces RemoteRead and RemoteReadChunks be able to take an optional *prompb.ReadHints.

Integration tests were a bit tricky. The most complicated thing is that when you query chunks, the returned chunks are not cut to the queried time range but we return full chunks with samples within the queried time range. I've refactored TestQuerierStreamingRemoteRead and added a test case to query a subrange via hints.

I reverted back TestQuerierRemoteRead to the main branch version one in a10436d. I need more time to fix it, so I will do in a separate PR. In the meanwhile I merge this PR to unlock this change.

pracucci force-pushed the add-readhints-support branch from b7c069a to e839bc9 Compare June 20, 2024 14:32

pracucci marked this pull request as ready for review June 20, 2024 14:32

pracucci requested a review from a team as a code owner June 20, 2024 14:32

krajorama self-requested a review June 21, 2024 05:43

krajorama reviewed Jun 21, 2024

View reviewed changes

pracucci added 2 commits June 21, 2024 09:08

Honor start/end in the remote read request hints

cbb132a

Signed-off-by: Marco Pracucci <marco@pracucci.com>

Update CHANGELOG

6612dd9

Signed-off-by: Marco Pracucci <marco@pracucci.com>

pracucci force-pushed the add-readhints-support branch from e839bc9 to 6612dd9 Compare June 21, 2024 07:09

pracucci added 2 commits June 21, 2024 10:07

Add remote read hints test cases to TestQuerierRemoteRead integration…

c527637

… test Signed-off-by: Marco Pracucci <marco@pracucci.com>

Add remote read hints test cases to TestQuerierStreamingRemoteRead in…

cd307b4

…tegration test Signed-off-by: Marco Pracucci <marco@pracucci.com>

pracucci commented Jun 21, 2024

View reviewed changes

krajorama approved these changes Jun 21, 2024

View reviewed changes

pracucci mentioned this pull request Jun 21, 2024

Fix timestamp reported in the "the sample has been rejected because its timestamp is too old" error #8462

Merged

4 tasks

Reverted TestQuerierRemoteRead to main branch

a10436d

Signed-off-by: Marco Pracucci <marco@pracucci.com>

pracucci enabled auto-merge (squash) June 21, 2024 12:08

pracucci merged commit 6bcba2c into main Jun 21, 2024
29 checks passed

pracucci deleted the add-readhints-support branch June 21, 2024 12:23

rishabhkumar92 mentioned this pull request Oct 29, 2024

Remote read API not performant in v2.13 #9764

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Honor start/end in the remote read request hints #8431

Honor start/end in the remote read request hints #8431

pracucci commented Jun 20, 2024 •

edited

Loading

krajorama Jun 21, 2024

pracucci Jun 21, 2024

krajorama left a comment

pracucci commented Jun 21, 2024

pracucci Jun 21, 2024

krajorama Jun 21, 2024

pracucci Jun 21, 2024

krajorama left a comment

pracucci commented Jun 21, 2024 •

edited

Loading

Honor start/end in the remote read request hints #8431

Honor start/end in the remote read request hints #8431

Conversation

pracucci commented Jun 20, 2024 • edited Loading

What this PR does

Which issue(s) this PR fixes or relates to

Checklist

krajorama Jun 21, 2024

Choose a reason for hiding this comment

pracucci Jun 21, 2024

Choose a reason for hiding this comment

krajorama left a comment

Choose a reason for hiding this comment

pracucci commented Jun 21, 2024

pracucci Jun 21, 2024

Choose a reason for hiding this comment

krajorama Jun 21, 2024

Choose a reason for hiding this comment

pracucci Jun 21, 2024

Choose a reason for hiding this comment

krajorama left a comment

Choose a reason for hiding this comment

pracucci commented Jun 21, 2024 • edited Loading

pracucci commented Jun 20, 2024 •

edited

Loading

pracucci commented Jun 21, 2024 •

edited

Loading