Don't load series multiple times when streaming chunks from store-gateways and only one batch is needed #8039
Conversation
Note to reviewers: I don't love how invasive this change is, and I'm open to ideas and feedback on how to better isolate it. Perhaps we could wrap `loadingSeriesChunkRefsSetIterator` in another iterator type that either caches the single set or recreates the iterator when we start streaming chunks?
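For illustration, here is a minimal sketch of that wrapping idea, assuming a simplified iterator interface; all type and function names here are made up for the sketch and are not Mimir's actual types. The wrapper remembers the sets produced during the labels pass and, when chunk streaming starts, either replays the single cached set or recreates the underlying loading iterator. (It glosses over the fact that, in the real code, the first pass would also need to load chunk refs for the cached set to be reusable.)

```go
package sketch

// seriesSet stands in for a seriesChunkRefsSet.
type seriesSet struct{ series []string }

// setIterator is a simplified stand-in for the real set iterator interface.
type setIterator interface {
	Next() bool
	At() seriesSet
}

// cachingOrRecreatingIterator wraps a loading iterator built by newUnderlying.
type cachingOrRecreatingIterator struct {
	newUnderlying func() setIterator
	underlying    setIterator
	cached        []seriesSet
	replaying     bool
	replayIdx     int
}

func newCachingOrRecreatingIterator(factory func() setIterator) *cachingOrRecreatingIterator {
	return &cachingOrRecreatingIterator{newUnderlying: factory, underlying: factory()}
}

func (it *cachingOrRecreatingIterator) Next() bool {
	if it.replaying {
		it.replayIdx++
		return it.replayIdx < len(it.cached)
	}
	if !it.underlying.Next() {
		return false
	}
	if len(it.cached) < 2 { // stop caching once a single batch is ruled out
		it.cached = append(it.cached, it.underlying.At())
	}
	return true
}

func (it *cachingOrRecreatingIterator) At() seriesSet {
	if it.replaying {
		return it.cached[it.replayIdx]
	}
	return it.underlying.At()
}

// StartChunkStreaming is called between the labels pass and the chunks pass:
// replay the single cached set if there was exactly one, otherwise drop the
// cache and recreate the underlying iterator so the series are reloaded.
func (it *cachingOrRecreatingIterator) StartChunkStreaming() {
	if len(it.cached) == 1 {
		it.replaying = true
		it.replayIdx = -1
		return
	}
	it.cached = nil
	it.underlying = it.newUnderlying()
}
```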
I tend to agree. The logic for what is being reset (in `streamingChunksSetForBlocks`) and how the strategy and "releasability" change is selected (in `loadingSeriesChunkRefsSetIterator.Reset()`) are very far apart in the call chain.

Maybe we can have something like an iterator factory: instead of letting `getSeriesIteratorFromBlocks` call `openBlockSeriesChunkRefsSetsIterator` directly, we can pass a factory to `getSeriesIteratorFromBlocks`.

- The first time `getSeriesIteratorFromBlocks` invokes the factory, the factory creates empty iterators.
- The factory can decide whether there will be only a single batch for the block or multiple (which, due to sharding and label filtering, won't be accurate, but it might work most of the time).
- If there is only a single batch, the factory returns an iterator which returns an unreleasable batch and caches the result.
- On the second invocation from `getSeriesIteratorFromBlocks`, the factory reuses the cached result and returns a releasable batch.
- Done?

This way the factory and the caching iterator it returns become the main place deciding short-term caching, releasability, and the `seriesIteratorStrategy` (see the sketch below).
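To make the suggestion concrete, here is a hedged sketch of such a factory; all names and the single-batch estimate are illustrative assumptions for this sketch, not Mimir's real API. The factory caches the batch on the first (labels) invocation and marks it unreleasable, then hands it back releasable on the second (chunks) invocation. A real implementation would also need to handle the estimate being wrong, e.g. by falling back to reloading.

```go
package sketch

// batch stands in for one seriesChunkRefsSet.
type batch struct {
	series     []string
	releasable bool
}

type batchIterator interface {
	Next() bool
	At() batch
}

// sliceIterator is a trivial iterator over pre-built batches.
type sliceIterator struct {
	batches []batch
	idx     int
}

func (it *sliceIterator) Next() bool { it.idx++; return it.idx <= len(it.batches) }
func (it *sliceIterator) At() batch  { return it.batches[it.idx-1] }

// iteratorFactory is what getSeriesIteratorFromBlocks would receive instead of
// calling openBlockSeriesChunkRefsSetsIterator directly.
type iteratorFactory struct {
	open           func() batchIterator // builds the real loading iterator
	expectedSeries int
	batchSize      int
	cached         *batch
	invocations    int
}

func (f *iteratorFactory) newIterator() batchIterator {
	f.invocations++
	// Rough estimate: sharding and label filtering can make this inexact, so a
	// real implementation must cope with the guess being wrong.
	singleBatch := f.expectedSeries <= f.batchSize

	if f.invocations == 1 {
		it := f.open()
		if !singleBatch {
			return it // multiple batches expected: no caching
		}
		// First (labels) pass with a single batch: cache it and mark it
		// unreleasable so the chunks pass can still use it.
		if it.Next() {
			b := it.At()
			b.releasable = false
			f.cached = &b
		}
		if f.cached == nil {
			return &sliceIterator{}
		}
		return &sliceIterator{batches: []batch{*f.cached}}
	}

	// Second (chunks) pass: reuse the cached batch, now releasable, or rebuild
	// the iterator if nothing was cached.
	if f.cached == nil {
		return f.open()
	}
	b := *f.cached
	b.releasable = true
	return &sliceIterator{batches: []batch{b}}
}
```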
Nice one! I also like this approach better. It looks good for the most part; the only major thing is that the postings are no longer copied before reusing them. Can you double-check whether that's causing bugs?
…ng function handles around, remove more unnecessary comments
I have tested this in a test cluster at Grafana Labs and it seems to work fine, so I think this is ready for a final review before merging 🎉
LGTM, awesome work @charleskorn and @zenador!
…eways and only one batch is needed (grafana#8039)
Co-authored-by: Jeanette Tan <jeanette.tan@grafana.com>
…6646)
* Enable streaming chunks from store-gateways to queriers by default. (Conflicts: cmd/mimir/help-all.txt.tmpl, docs/sources/mimir/configure/about-versioning.md, pkg/querier/querier.go)
* Add changelog entry. (Conflicts: CHANGELOG.md)
* Fix failing TestQuerierWithBlocksStorageOnMissingBlocksFromStorage test with store-gateway chunks streaming enabled.
* Partially fix cache-related assertions in TestQuerierWithBlocksStorageRunningInSingleBinaryMode.
* Address PR feedback, and add more details to failing assertions. (Conflicts: integration/querier_test.go)
* Update tests to reflect behaviour of chunks streaming with #8039 in place.
What this PR does
Acknowledgement: this has been a joint effort between myself and @zenador
This PR improves the performance of store-gateways when streaming chunks to queriers. Specifically, it improves the performance of the case where a `Select()` call selects no more than `-blocks-storage.bucket-store.batch-series-size` series (default value: 5000).

Currently this case behaves as follows (assuming the index-cache is empty initially):

- When the `Select()` call arrives, the store-gateway first needs to send all selected series' labels to the querier. This is done in `BucketStore.streamingSeriesForBlocks()`, which creates an iterator for the `seriesChunkRefsSet`s for the query. Each `seriesChunkRefsSet` contains at most `-blocks-storage.bucket-store.batch-series-size` series.
- The series in each `seriesChunkRefsSet` are loaded and cached as `SeriesForRef` items in the index-cache. These cache entries are written asynchronously.
- The store-gateway then sends the selected series' chunks in `BucketStore.streamingChunksSetForBlocks()`, which creates the same iterator again (albeit this time asking for chunk refs to be loaded as well).

This last part is the opportunity for improvement: it's very likely that the asynchronous cache write has not yet completed at this point, so the store-gateway will end up loading the series again. However, in the case where the query only needs one batch, recreating the batch is unnecessary: we can just keep the batch in memory and reuse it for both sending labels and sending chunks. (We can't do this if multiple batches are required: that would require keeping all batches in memory, which we are trying to avoid.)
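For readers unfamiliar with the code, here is a small illustrative sketch (hypothetical names, not Mimir's real API) of the two phases described above and where the single-batch reuse applies.

```go
package sketch

// batch stands in for one seriesChunkRefsSet of at most
// -blocks-storage.bucket-store.batch-series-size series.
type batch struct{ series []string }

// sendSeriesLabels is the labels phase: send each batch's labels to the
// querier, and hand back the batch if there was exactly one, so the chunks
// phase can reuse it.
func sendSeriesLabels(batches []batch) (single *batch) {
	for _, b := range batches {
		_ = b // send this batch's series labels to the querier (omitted)
	}
	if len(batches) == 1 {
		return &batches[0]
	}
	return nil
}

// sendChunks is the chunks phase: reuse the single cached batch if there is
// one; otherwise recreate the iterator and reload the series (the previous
// behaviour), so that multiple batches are never all held in memory.
func sendChunks(cached *batch, reload func() []batch) {
	var batches []batch
	if cached != nil {
		batches = []batch{*cached} // single batch: no reload needed
	} else {
		batches = reload()
	}
	for _, b := range batches {
		_ = b // load and send this batch's chunks (omitted)
	}
}
```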
Which issue(s) this PR fixes or relates to
Fixes the issue discussed in #6646 (comment)
Checklist
- `CHANGELOG.md` updated - the order of entries should be `[CHANGE]`, `[FEATURE]`, `[ENHANCEMENT]`, `[BUGFIX]`.
- `about-versioning.md` updated with experimental features.