Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

store-gateway: avoid unnecessarily writing lazy-loaded blocks snapshot #10740

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

dimitarvdimitrov
Copy link
Contributor

@dimitarvdimitrov dimitarvdimitrov commented Feb 25, 2025

Summary

This PR introduces two key optimizations to the indexheader.Snapshotter

  1. Checksum Mechanism: Added a checksum-based optimization to avoid persisting unchanged JSON data, reducing unnecessary IOPS. We're using sha256 because the linters think sha1 is not secure

  2. Simplified Snapshot Format: Changed the snapshot format from map[ulid.ULID]int64 to map[ulid.ULID]struct{}, eliminating unused timestamp data.

Motivation

In clusters with many tenants, the Snapshotter component can generate significant disk I/O by repeatedly writing identical JSON data to disk. This is particularly problematic on systems with low-performance disks. Additionally, the previous implementation stored timestamps that were never actually used in practice.

Compatibility Considerations

Backward Compatibility

This approach maintains backward compatibility. RestoreLoadedBlocks() still reads the old format with timestamps and converts it to the new format. The JSON structure remains the same. This ensures that existing snapshots continue to work without requiring any migration.

Forward Compatibility

Forward compatibility is maintained because we continue to write the same JSON structure, just with simplified content. The indexHeaderLastUsedTime field is still present in the JSON, ensuring that older code can still parse the structure. Since older code only cares about the keys (block IDs) and not the values (timestamps), the change in value type doesn't affect functionality.

Checklist

  • Tests updated.
  • Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
  • about-versioning.md updated with experimental features.

### Summary

This PR introduces two key optimizations to the `indexheader.Snapshotter` component:SHA-1 Checksum Mechanism: Added a checksum-based optimization to avoid persisting unchanged JSON data, reducing unnecessary IOPS.

Simplified Snapshot Format: Changed the snapshot format from `map[ulid.ULID]int64` to `map[ulid.ULID]struct{}`, eliminating unused timestamp data.

### Motivation

In clusters with many tenants, the `Snapshotter` component can generate significant disk I/O by repeatedly writing identical JSON data to disk. This is particularly problematic on systems with low-performance disks. Additionally, the previous implementation stored timestamps that were never actually used in practice.

### Compatibility Considerations

#### Backward Compatibility

This approach maintains backward compatibility. `RestoreLoadedBlocks()`  still reads the old format with timestamps and converts it to the new format. The JSON structure remains the same. This ensures that existing snapshots continue to work without requiring any migration.

### Forward Compatibility

Forward compatibility is maintained because we continue to write the same JSON structure, just with simplified content. The `indexHeaderLastUsedTime` field is still present in the JSON, ensuring that older code can still parse the structure. Since older code only cares about the keys (block IDs) and not the values (timestamps), the change in value type doesn't affect functionality.

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
@dimitarvdimitrov dimitarvdimitrov requested a review from a team as a code owner February 25, 2025 17:50
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
Copy link
Contributor

@charleskorn charleskorn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes generally LGTM.

Do we need to preserve backwards compatibility? Seems like we could reduce some complexity by making a backwards incompatible change to store only the list of loaded blocks. The only downside to this that I see is that store-gateways will lazy-load all blocks when they first start after upgrading to this, but that seems acceptable to me if it gives us long-term simplicity.

@@ -152,16 +152,15 @@ func (p *ReaderPool) onLazyReaderClosed(r *LazyBinaryReader) {
delete(p.lazyReaders, r)
}

// LoadedBlocks returns a new map of lazy-loaded block IDs and the last time they were used in milliseconds.
func (p *ReaderPool) LoadedBlocks() map[ulid.ULID]int64 {
func (p *ReaderPool) LoadedBlocks() map[ulid.ULID]struct{} {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we just return a slice here? The only place that calls this is Snapshotter.PersistLoadedBlocks, and it doesn't need a map.

Copy link
Contributor

@narqo narqo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔥 Works for me.

It's the change a mimir user should see the effect of, right? Do you want to mention this change in the changelog, maybe?

@narqo
Copy link
Contributor

narqo commented Feb 26, 2025

Do we need to preserve backwards compatibility?

I suspect, the backwards (and forwards compatibility) here is important, because, otherwise, this will hurt the rollout. I.e. in a busy cell, where queries actively touch older blocks, not loading the previous snapshots will cause the delays of the runtime queries. The same is for a hypothetical scenario, where we needed to roll back the change.

@dimitarvdimitrov
Copy link
Contributor Author

Do we need to preserve backwards compatibility?

my thinking was the same as @narqo. I see two approaches if we don't keep compatibility

  1. Eager load all blocks. We've seen rollouts of a single zone take multiple hours if lazy-loading is disabled.
  2. Don't eager load any blocks. This will cause a backup of queries during/after the rollout, which I also don't like because it will cause errors, page people, miss rule evals, etc.

Maybe a middle-ground is to support the previous format to read from, but not allow for rollbacks. That is - we read the old format, but from now on only write the new format. That way in a future release we can delete the support for the old format and drop this complexity.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants