BUGFIX: TSDB: panic in query during truncation with OOO head #14831

Merged: 5 commits merged into prometheus:main from fix-panic-in-ooo-query on Sep 5, 2024

Conversation

krajorama
Member

@krajorama krajorama commented Sep 5, 2024

Adds a regression test for #14822. The test does not segfault on code from before #14354.

The segfault was caused by a race condition between query start and compaction.
When compaction starts, an in-order query may overlap with the TSDB head but also fall into the time range of the head that is being truncated. In that case, the head querier headQuerier is set to nil in db.go in two places, here

headQuerier = nil

and
headQuerier = nil

That pointer is not used for selecting samples, but it is still referenced in Close(), which causes the segfault.
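
The failure mode can be reproduced in isolation. The following is a minimal sketch, not Prometheus code: querier, memQuerier, closeUnchecked and closeChecked are hypothetical names chosen for illustration. It shows that calling Close() on a nil interface value panics with a nil pointer dereference, and that a nil check avoids the panic.

```go
package main

import "fmt"

// querier is a hypothetical stand-in for storage.Querier.
// Only Close matters for this illustration.
type querier interface {
	Close() error
}

// memQuerier is a hypothetical concrete querier that records whether it was closed.
type memQuerier struct{ closed bool }

func (m *memQuerier) Close() error {
	m.closed = true
	return nil
}

// closeUnchecked calls Close without a nil check. A runtime panic is
// recovered and returned as an error so the caller can observe it.
func closeUnchecked(q querier) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return q.Close()
}

// closeChecked treats a nil querier as "nothing to close".
func closeChecked(q querier) error {
	if q == nil {
		return nil
	}
	return q.Close()
}

func main() {
	var q querier // nil, as when the head was truncated

	fmt.Println("unchecked returned an error:", closeUnchecked(q) != nil)
	fmt.Println("checked returned:", closeChecked(q))
}
```

The recover in closeUnchecked exists only to make the panic observable in a small program; in the scenario described here the panic was unrecovered and crashed the process.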

The fix essentially restores the original function where we did not rely on the headQuerier in creating the OOO head querier:

rh := NewOOORangeHead(db.head, mint, maxt, db.lastGarbageCollectedMmapRef)

Regression test for prometheus#14822

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
krajorama added a commit that referenced this pull request Sep 5, 2024
Ref: #14831

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
krajorama and others added 2 commits September 5, 2024 11:24
Attempted fix

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
@krajorama krajorama force-pushed the fix-panic-in-ooo-query branch 2 times, most recently from 356e519 to 8a39690 Compare September 5, 2024 10:57
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
@krajorama krajorama marked this pull request as ready for review September 5, 2024 11:11
@@ -513,7 +513,7 @@ type HeadAndOOOQuerier struct {
 	head    *Head
 	index   IndexReader
 	chunkr  ChunkReader
-	querier storage.Querier
+	querier storage.Querier // This might be nil if head was truncated.
Member


Generally I advise stating what the thing means, not when you expect it to apply.
So: "If nil, do not read from in-order head"?

Member Author


clarified
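
For illustration, the suggested field semantics ("If nil, do not read from in-order head") could look like the sketch below. This is not the actual HeadAndOOOQuerier: sampleSource, sliceSource, headAndOOO and their methods are hypothetical, and timestamps stand in for real series data.

```go
package main

import (
	"fmt"
	"sort"
)

// sampleSource is a hypothetical, simplified stand-in for a querier.
type sampleSource interface {
	Timestamps() []int64
	Close() error
}

// sliceSource is a hypothetical source backed by a slice of timestamps.
type sliceSource struct {
	ts     []int64
	closed bool
}

func (s *sliceSource) Timestamps() []int64 { return s.ts }

func (s *sliceSource) Close() error {
	s.closed = true
	return nil
}

// headAndOOO combines an optional in-order source with an out-of-order one.
type headAndOOO struct {
	inOrder sampleSource // If nil, do not read from in-order head.
	ooo     sampleSource
}

// Timestamps merges both sources and skips the in-order one when it is nil.
func (h *headAndOOO) Timestamps() []int64 {
	out := append([]int64(nil), h.ooo.Timestamps()...)
	if h.inOrder != nil {
		out = append(out, h.inOrder.Timestamps()...)
	}
	sort.Slice(out, func(i, j int) bool { return out[i] < out[j] })
	return out
}

// Close closes every source that is present and returns the first error.
func (h *headAndOOO) Close() error {
	var first error
	if h.inOrder != nil {
		first = h.inOrder.Close()
	}
	if err := h.ooo.Close(); err != nil && first == nil {
		first = err
	}
	return first
}

func main() {
	ooo := &sliceSource{ts: []int64{30, 10}}

	// inOrder is left unset, as if the in-order head had been truncated.
	q := &headAndOOO{ooo: ooo}

	fmt.Println(q.Timestamps())
	fmt.Println(q.Close(), ooo.closed)
}
```

Stating the meaning on the field keeps every reader of the struct honest: each method that touches inOrder has one obvious rule to follow, regardless of why the field ended up nil.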

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
@krajorama
Member Author

cc PTAL @colega

Member

@jesusvazquez jesusvazquez left a comment


LGTM! I can see how you are preventing new panics and how data is still coming from the blocks in the test. Nice work!

@bboreham bboreham merged commit 536d9f9 into prometheus:main Sep 5, 2024
26 checks passed
krajorama added a commit to krajorama/prometheus that referenced this pull request Sep 9, 2024
Followup to prometheus#14831

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
3 participants