Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(pageserver): add fast path for sparse keyspace read #9631

Merged
merged 2 commits into from
Nov 6, 2024

Conversation

skyzh
Copy link
Member

@skyzh skyzh commented Nov 4, 2024

Problem

In #9441, the tenant has a lot of aux keys spread in multiple aux files. The perf tool shows that a significant amount of time is spent on remove_overlapping_keys. For sparse keyspaces, we don't need to report missing key errors anyways, and it's very likely that we will need to read all layers intersecting with the key range. Therefore, this patch adds a new fast path for sparse keyspace reads that we do not track unmapped_keyspace in a fine-grained way. We only modify it when we find an image layer.

In debug mode, it was ~5min to read the aux files for a dump of the tenant, and now it's only 8s, that's a 60x speedup.

Summary of changes

  • Do not add sparse keys into keys_done so that remove_overlapping does nothing.
  • Allow ValueReconstructSituation::Complete to be updated again in ValuesReconstructState::update_key for sparse keyspaces.

Checklist before requesting a review

  • I have performed a self-review of my code.
  • If it is a core feature, I have added thorough tests.
  • Do we need to implement analytics? if so did you add the relevant metrics to the dashboard?
  • If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.

Checklist before merging

  • Do not forget to reformat commit message to not include the above checklist

@skyzh
Copy link
Member Author

skyzh commented Nov 4, 2024

would like to know if @VladLazar has any opinion about, thanks :)

Copy link

github-actions bot commented Nov 4, 2024

5328 tests run: 5106 passed, 0 failed, 222 skipped (full report)


Flaky tests (2)

Postgres 17

Code coverage* (full report)

  • functions: 31.6% (7834 of 24754 functions)
  • lines: 49.2% (61790 of 125491 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
5d7cb28 at 2024-11-06T16:22:44.580Z :recycle:

@skyzh skyzh force-pushed the skyzh/read-aux-fast branch from 9ae5e94 to 401063a Compare November 5, 2024 21:15
@skyzh skyzh marked this pull request as ready for review November 5, 2024 21:16
@skyzh skyzh requested a review from a team as a code owner November 5, 2024 21:16
@skyzh skyzh requested a review from erikgrinaker November 5, 2024 21:16
Signed-off-by: Alex Chi Z <chi@neon.tech>
@skyzh skyzh force-pushed the skyzh/read-aux-fast branch from 401063a to f68dc0a Compare November 5, 2024 21:16
Copy link
Contributor

@erikgrinaker erikgrinaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice speedup! I'll let @VladLazar take a look as well.

This special case is worth a mention in the method comment, and any other relevant methods that call into it.

Given the performance boost, this might indicate further optimization opportunities in remove_overlapping_with() in the non-sparse case. Did you explore any other optimizations?

Copy link
Contributor

@VladLazar VladLazar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I agree with Erik that we should have a comment somewhere that lists all the special handling we do for AUX keys on the read path.

pageserver/src/tenant/storage_layer.rs Outdated Show resolved Hide resolved
@VladLazar
Copy link
Contributor

Given the performance boost, this might indicate further optimization opportunities in remove_overlapping_with() in the non-sparse case. Did you explore any other optimizations?

The issue here is that the aux keyspace is huge. On non-aux queries, the keyspaces in question are much smaller (max 16 keys per vectored read), so I doubt too much time is being spent in remove_overlapping_with() there.

Signed-off-by: Alex Chi Z <chi@neon.tech>
@skyzh
Copy link
Member Author

skyzh commented Nov 6, 2024

I'll also update the RFC to include this change

@skyzh skyzh enabled auto-merge (squash) November 6, 2024 15:17
@skyzh skyzh merged commit 1d3559d into main Nov 6, 2024
80 checks passed
@skyzh skyzh deleted the skyzh/read-aux-fast branch November 6, 2024 18:17
skyzh added a commit that referenced this pull request Nov 7, 2024
ref #9441

The metrics from LR publisher testing project: ~300KB aux key deltas per
256MB files. Therefore, I think we can do compaction more aggressively
as these deltas are small and compaction can reduce layer download
latency. We also have a read path perf fix
#9631 but I'd still combine the
read path fix with the reduce of the compaction threshold.

## Summary of changes

* reduce metadata compaction threshold
* use num of L1 delta layers as an indicator for metadata compaction
* dump more logs

Signed-off-by: Alex Chi Z <chi@neon.tech>
VladLazar added a commit that referenced this pull request Nov 11, 2024
skyzh added a commit that referenced this pull request Nov 11, 2024
Reflects #9631 in the RFC.

Signed-off-by: Alex Chi Z <chi@neon.tech>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants