pageserver: only apply `ClearVmBits` on relevant shards (#9895)
# Problem

VM (visibility map) pages are stored and managed like any regular relation page, in the VM fork of the main relation. They are also sharded like other pages. Regular WAL writes to the VM pages (typically performed by vacuum) are routed to the correct shard as usual.

However, VM pages are also updated via `ClearVmBits` metadata records, which are emitted when main relation pages are updated. These metadata records were sent to all shards, like other metadata records. This had the following effects:

* On shards responsible for VM pages, the `ClearVmBits` records were applied as expected.
* On shard 0, which knows about the VM relation and its size but doesn't necessarily have any VM pages, the `ClearVmBits` writes may have been applied without the explicit WAL writes to those VM pages also having been applied.
* If VM pages are spread across multiple shards (unlikely with a 256MB stripe size), all shards may have applied `ClearVmBits` whenever the pages fell within their local view of the relation size, even for pages they do not own.
* On other shards, each record caused a relation size cache miss and a DbDir and RelDir lookup before the `ClearVmBits` was dropped. With many relations, this could cause significant CPU overhead.

This is not believed to be a correctness problem, but this will be verified in #9914.

Resolves #9855.

# Changes

Route `ClearVmBits` metadata records only to the shards responsible for the VM pages.

Verification of the current VM handling and cleanup of incomplete VM pages on shard 0 (and potentially elsewhere) is left as follow-up work.
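The routing change described under Changes can be illustrated with a small Rust model. This is a minimal sketch under stated assumptions, not the pageserver's code: `ShardLayout`, the trimmed-down `ClearVmBits` struct, `target_shards`, and especially the hash inside `shard_for` are hypothetical stand-ins for the real key-to-shard mapping. The VM constants assume Postgres' default 8 KiB block size and its 2-bits-per-heap-block layout, under which one VM page covers 32,672 heap blocks (Postgres' `HEAPBLK_TO_MAPBLOCK`).

```rust
// Simplified model: route a ClearVmBits record only to the shard(s) owning the
// affected visibility-map (VM) pages. ShardLayout, ClearVmBits and the hash in
// `shard_for` are HYPOTHETICAL stand-ins, not the pageserver's real types.

/// Postgres block size in bytes (default build).
const BLCKSZ: u32 = 8192;
/// Bytes of each VM page available for bits: BLCKSZ minus the 24-byte page header.
const VM_MAPSIZE: u32 = BLCKSZ - 24;
/// Each heap block uses 2 VM bits (all-visible, all-frozen): 4 heap blocks per byte.
const VM_HEAPBLOCKS_PER_PAGE: u32 = VM_MAPSIZE * 4;

/// VM page holding the bits for `heap_blk` (mirrors Postgres' HEAPBLK_TO_MAPBLOCK).
fn heap_blk_to_vm_blk(heap_blk: u32) -> u32 {
    heap_blk / VM_HEAPBLOCKS_PER_PAGE
}

/// Hypothetical sharding parameters: blocks are grouped into stripes of
/// `stripe_size` pages, and each (relation, stripe) pair maps to one shard.
#[derive(Clone, Copy)]
struct ShardLayout {
    count: u8,
    stripe_size: u32, // in pages; 32768 pages of 8 KiB = 256 MiB
}

impl ShardLayout {
    /// HYPOTHETICAL placement: mixes the relation id with the stripe index.
    /// The real pageserver uses its own hash; only the striping idea matters here.
    fn shard_for(&self, rel: u32, blkno: u32) -> u8 {
        if self.count < 2 {
            return 0; // unsharded tenant: everything lives on shard 0
        }
        let stripe = u64::from(blkno / self.stripe_size);
        let h = u64::from(rel).wrapping_mul(0x9E37_79B9_7F4A_7C15)
            ^ stripe.wrapping_mul(0xC2B2_AE3D_27D4_EB4F);
        (h % u64::from(self.count)) as u8
    }
}

/// Hypothetical, trimmed-down ClearVmBits record: the VM bits of up to two
/// heap blocks of relation `rel` are cleared.
struct ClearVmBits {
    rel: u32,
    new_heap_blkno: Option<u32>,
    old_heap_blkno: Option<u32>,
}

/// Shards that should receive the record: only the owners of the affected VM
/// pages, instead of broadcasting it to every shard.
fn target_shards(layout: ShardLayout, rec: &ClearVmBits) -> Vec<u8> {
    let mut shards: Vec<u8> = [rec.new_heap_blkno, rec.old_heap_blkno]
        .iter()
        .flatten() // skip absent heap blocks
        .map(|&heap_blk| layout.shard_for(rec.rel, heap_blk_to_vm_blk(heap_blk)))
        .collect();
    shards.sort_unstable();
    shards.dedup(); // both heap blocks often share one VM page, hence one shard
    shards
}
```

In this model one VM page covers 32,672 heap blocks (about 255 MiB of heap), so a 256 MiB stripe of 32,768 VM pages covers roughly 8 TiB of heap. A relation's VM pages therefore only span more than one stripe, and possibly more than one shard, for very large relations, which is why a record usually has a single target shard.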
da1daa2
7053 tests run: 6719 passed, 1 failed, 333 skipped (full report)
Failures on Postgres 16:

* `test_sharded_ingest[github-actions-selfhosted-vanilla-1]`: release-x86-64

Flaky tests (3):

* Postgres 17
  * `test_subscriber_synchronous_commit`: release-x86-64
  * `test_timeline_archive[4]`: debug-x86-64
* Postgres 14
  * `test_pull_timeline[True]`: release-arm64

Code coverage, collected from Rust tests only (full report):

* functions: 30.6% (7982 of 26064 functions)
* lines: 48.6% (63390 of 130514 lines)
da1daa2 at 2024-11-27T21:28:33.661Z