Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve compactor job estimation by tracking external labels in bucket index #7745

Merged
merged 25 commits into from
Apr 3, 2024

Conversation

seizethedave
Copy link
Contributor

@seizethedave seizethedave commented Mar 28, 2024

What this PR does

This PR adds a Labels field to the bucket index's Block struct. Compactor now maintains this field in the bucket index when it writes a block. This allows job estimation metrics (cortex_bucket_index_estimated_compaction_jobs) and tools/compaction-planner to better reflect outstanding compaction jobs because the compactor is only going to compact blocks that have the same external labels.

As we're now copying external block labels into the bucket index, this PR also makes the estimation match compactor's job planning by ignoring the deprecated __org_id__ and __ingester_id__ labels.

While the bucket index may have information about many blocks, each block today will only have <= 2 external labels, so concerns about blowing up the bucket index are minimal for now. But a close eye will need to be kept if any new labels are to be added later.

This estimation will remain inaccurate for blocks with external labels that aren't yet present in the bucket index. I believe this will stick around until those blocks are recompacted or retentioned out.

Which issue(s) this PR fixes or relates to

Fixes #7318

Checklist

  • Tests updated.
  • Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
  • about-versioning.md updated with experimental features.

@seizethedave seizethedave marked this pull request as ready for review March 28, 2024 22:33
@seizethedave seizethedave requested a review from a team as a code owner March 28, 2024 22:33
@seizethedave
Copy link
Contributor Author

I welcome a review from anyone, but I will be waiting for @pstibrany to approve.

@seizethedave seizethedave marked this pull request as draft March 29, 2024 16:33
@seizethedave seizethedave removed the request for review from pstibrany March 29, 2024 16:34
Copy link
Member

@pstibrany pstibrany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your work on this PR. Changes look good to me, I only left some minor comments.

This estimation will remain inaccurate for blocks with external labels that aren't yet present in the bucket index. I believe this will stick around until those blocks are recompacted or retentioned out.

Correct. Bucket index updater code will not update blocks that are already in the index, but don't have their labels stored. For that we would need to increase bucket version, then bucket index updater would refetch all meta.json files and fix labels for all blocks. That would be a one-time operation for every tenant.

Given the low impact of the problem we're fixing by this PR, we may not want to do that in this PR, and live with wrong estimated jobs count until all blocks are naturally fixed.

pkg/compactor/blocks_cleaner_test.go Outdated Show resolved Hide resolved
pkg/compactor/compactor.go Outdated Show resolved Hide resolved
pkg/compactor/compactor.go Outdated Show resolved Hide resolved
@seizethedave seizethedave marked this pull request as ready for review April 2, 2024 17:25
@seizethedave
Copy link
Contributor Author

Given the low impact of the problem we're fixing by this PR, we may not want to do that in this PR, and live with wrong estimated jobs count until all blocks are naturally fixed.

That sounds rational to me- especially as I understand the current counting problem isn't really causing mass confusion.

Copy link
Member

@pstibrany pstibrany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@pstibrany pstibrany merged commit 9215d7a into main Apr 3, 2024
29 checks passed
@pstibrany pstibrany deleted the davidgrant/compactor-est branch April 3, 2024 08:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Incorrect value of estimated compaction jobs when out-of-order blocks have special label.
2 participants