-
Notifications
You must be signed in to change notification settings - Fork 529
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
store-gateway: fix loading of series with chunks in multiple segment files #5875
store-gateway: fix loading of series with chunks in multiple segment files #5875
Conversation
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very subtle bug! Nice work, LGTM!
@@ -1012,7 +1012,7 @@ func (s *loadingSeriesChunkRefsSetIterator) symbolizedSet(ctx context.Context, p | |||
} | |||
case !s.strategy.isNoChunkRefs(): | |||
clampLastChunkLength(symbolizedSet.series, metas) | |||
series.chunksRanges = metasToRanges([][]chunks.Meta{metas}, s.blockID, s.minTime, s.maxTime) | |||
series.chunksRanges = metasToRanges(partitionChunks(metas, 1, 1), s.blockID, s.minTime, s.maxTime) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[for another PR] We can simplify partitionChunks()
to not take targetNumRanges, minChunksPerRange
at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was planning to remove partitionChunks and chunks ranges altogether in a future PR. I wanted to do that this week, but this bug took too much time to find and fix, so it will probably roll into next week. Should I still simplify partitionChunks before that its removal?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should I still simplify partitionChunks before that its removal?
No :)
…files (#5875) * Add test for series with chunks in multiple segment files Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com> * Partition chunks from multiple segment files into multiple ranges Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com> * Remove redundant upload Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com> * Remove test tool Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com> --------- Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com> (cherry picked from commit 85be770)
…files (#5875) * Add test for series with chunks in multiple segment files Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com> * Partition chunks from multiple segment files into multiple ranges Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com> * Remove redundant upload Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com> * Remove test tool Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com> --------- Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com> (cherry picked from commit 85be770)
…ltiple segment files (#5891) * store-gateway: fix loading of series with chunks in multiple segment files (#5875) * Add test for series with chunks in multiple segment files Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com> * Partition chunks from multiple segment files into multiple ranges Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com> * Remove redundant upload Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com> * Remove test tool Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com> --------- Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com> (cherry picked from commit 85be770) * Update changelog entry Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com> * Update changelog entry Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com> * Update CHANGELOG.md Co-authored-by: Oleg Zaytsev <mail@olegzaytsev.com> --------- Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com> Co-authored-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com> Co-authored-by: Oleg Zaytsev <mail@olegzaytsev.com>
The bug
Fixes a bug introduced in #5816 (part of r253 and 2.10.0-rc.0) where series with chunks in multiple segment files wouldn't be fetched properly. The cause of the bug was that after 5816 all chunks were in the same chunks range. Since the ID of the segment file is kept once per chunks range, this lead to having chunks from multiple segment files in the same chunks range. The chunks from the second segment file would b e fetched from the first segment file, leading to invalid offsets and overlapping chunks.
The fix
The fix was trivial - reintroduce the usage of
partitionChunks
which used to partition chunks into multiple ranges based on their segment files. This was the base before 5816 as well.The side effects in tests
We didn't catch this bug because we don't test large enough blocks that have multiple segment files and series that have multiple chunks. To introduce a test that checks these I had to introduce a few changes to the test setup and the assertions.
Previously the tests in series_refs_test.go would each assert on the chunks' MinTime, MaxTime and their reference. The correct values were found when first writing the tests. This limited how many chunks we can assert against (it's infeasible to hardcode the refs of 100K chunks); see the test cases commented with "There is still no easy way to assert on the refs of 100K chunks, so we skip them."
Now we run prometheus block query code to get the chunk MinTime, MaxTime, and references instead of hardcoding them in the tests. This allows to run with many more chunks and segment files and still do exact assertions. The tradeoff is that we now don't have that strict assertions on the chunk length estimation. I think this is ok, since the estimation doesn't play a huge role in chunks fetching and bugs in it aren't critical.
The changes in the tests will also make the future removals of fine-grained chunks caching easier.
Which issue(s) this PR fixes or relates to
Fixes #
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]