Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Epic: sharded pageserver ingest #9329

Open
5 of 13 tasks
VladLazar opened this issue Oct 9, 2024 · 2 comments
Open
5 of 13 tasks

Epic: sharded pageserver ingest #9329

VladLazar opened this issue Oct 9, 2024 · 2 comments
Assignees
Labels
c/storage Component: storage t/Epic Issue type: Epic

Comments

@VladLazar
Copy link
Contributor

VladLazar commented Oct 9, 2024

Background

We currently we send all WAL to all pageserver shards, and each shard filters out the data that it needs,
in this epic we add a mechanism to filter the WAL on the safekeeper, so that each shard receives
only the data it needs.

RFC in #8754.

Sub-issues

Preview Give feedback
  1. a/observability c/storage/safekeeper
    erikgrinaker
  2. a/scalability c/storage/pageserver
    VladLazar
  3. a/scalability c/storage/safekeeper
    VladLazar
  4. a/scalability c/storage/safekeeper
    VladLazar
  5. a/scalability c/storage/safekeeper
  6. a/benchmark c/storage/safekeeper
    erikgrinaker
  7. a/performance a/scalability c/storage/safekeeper
  8. a/benchmark c/storage
    erikgrinaker

Nice to Haves

Preview Give feedback
  1. c/storage/pageserver c/storage/safekeeper t/Epic
  2. 1 of 3
    c/storage c/storage/pageserver t/bug triaged

Review follow-ups

Preview Give feedback
  1. a/tech_debt c/storage c/storage/pageserver
    VladLazar
@VladLazar VladLazar added c/storage Component: storage t/Epic Issue type: Epic labels Oct 9, 2024
@jcsp
Copy link
Collaborator

jcsp commented Oct 10, 2024

Can we reconcile this with #6345? (I don't mind replacing that issue, just want to avoid duplication)

VladLazar added a commit that referenced this issue Oct 24, 2024
…#9472)

## Problem

WAL ingest couples decoding of special records with their handling
(updates to the storage engine mostly).
This is a roadblock for our plan to move WAL filtering (and implicitly
decoding) to safekeepers since they cannot
do writes to the storage engine. 

## Summary of changes

This PR decouples the decoding of the special WAL records from their
application. The changes are done in place
and I've done my best to refrain from refactorings and attempted to
preserve the original code as much as possible.

Related: #9335
Epic: #9329
VladLazar added a commit that referenced this issue Oct 29, 2024
## Problem

We wish to have high level WAL decoding logic in `wal_decoder::decoder`
module.

## Summary of Changes

For this we need the `Value` and `NeonWalRecord` types accessible there, so:
1. Move `Value` and `NeonWalRecord` to `pageserver::value` and
`pageserver::record` respectively.
2. Get rid of `pageserver::repository` (follow up from (1))
3. Move PG specific WAL record types to `postgres_ffi::walrecord`. In
theory they could live in `wal_decoder`, but it would create a circular
dependency between `wal_decoder` and `postgres_ffi`. Long term it makes
sense for those types to be PG version specific, so that will work out nicely.
4. Move higher level WAL record types (to be ingested by pageserver)
into `wal_decoder::models`

Related: #9335
Epic: #9329
VladLazar added a commit that referenced this issue Oct 31, 2024
…9524)

## Problem

Decoding and ingestion are still coupled in `pageserver::WalIngest`.

## Summary of changes

A new type is added to `wal_decoder::models`, InterpretedWalRecord. This
type contains everything that the pageserver requires in order to ingest
a WAL record. The highlights are the `metadata_record` which is an
optional special record type to be handled and `blocks` which stores
key, value pairs to be persisted to storage.

This type is produced by
`wal_decoder::models::InterpretedWalRecord::from_bytes` from a raw PG
wal record.

The rest of this commit separates decoding and interpretation of the PG
WAL record from its application in `WalIngest::ingest_record`.

Related: #9335
Epic: #9329
@erikgrinaker erikgrinaker removed their assignment Nov 8, 2024
github-merge-queue bot pushed a commit that referenced this issue Nov 25, 2024
…#9746)

## Problem

For any given tenant shard, pageservers receive all of the tenant's WAL
from the safekeeper.
This soft-blocks us from using larger shard counts due to bandwidth
concerns and CPU overhead of filtering
out the records.

## Summary of changes

This PR lifts the decoding and interpretation of WAL from the pageserver
into the safekeeper.

A customised PG replication protocol is used where instead of sending
raw WAL, the safekeeper sends
filtered, interpreted records. The receiver drives the protocol
selection, so, on the pageserver side, usage
of the new protocol is gated by a new pageserver config:
`wal_receiver_protocol`.

 More granularly the changes are:
1. Optionally inject the protocol and shard identity into the arguments
used for starting replication
2. On the safekeeper side, implement a new wal sending primitive which
decodes and interprets records
 before sending them over
3. On the pageserver side, implement the ingestion of this new
replication message type. It's very similar
 to what we already have for raw wal (minus decoding and interpreting).
 
 ## Notes
 
* This PR currently uses my [branch of
rust-postgres](https://github.com/neondatabase/rust-postgres/tree/vlad/interpreted-wal-record-replication-support)
which includes the deserialization logic for the new replication message
type. PR for that is open
[here](neondatabase/rust-postgres#32).
* This PR contains changes for both pageservers and safekeepers. It's
safe to merge because the new protocol is disabled by default on the
pageserver side. We can gradually start enabling it in subsequent
releases.
* CI tests are running on #9747
 
 ## Links
 
 Related: #9336
 Epic: #9329
github-merge-queue bot pushed a commit that referenced this issue Nov 27, 2024
…#9821)

## Problem

#9746 lifted decoding and
interpretation of WAL to the safekeeper.
This reduced the ingested amount on the pageservers by around 10x for a
tenant with 8 shards, but doubled
the ingested amount for single sharded tenants.

Also, #9746 uses bincode which
doesn't support schema evolution.
Technically the schema can be evolved, but it's very cumbersome.

## Summary of changes

This patch set addresses both problems by adding protobuf support for
the interpreted wal records and adding compression support. Compressed
protobuf reduced the ingested amount by 100x on the 32 shards
`test_sharded_ingest` case (compared to non-interpreted proto). For the
1 shard case the reduction is 5x.

Sister change to `rust-postgres` is
[here](neondatabase/rust-postgres#33).

## Links

Related: #9336
Epic: #9329
github-merge-queue bot pushed a commit that referenced this issue Nov 27, 2024
## Problem

Can't change protocol at tenant granularity.

## Summary of changes

Add tenant config level override for wal receiver protocol.

## Links

Related: #9336
Epic: #9329
@VladLazar
Copy link
Contributor Author

VladLazar commented Dec 20, 2024

Short pre winter holiday update.

Once we're back:

  • Continue deployment of sharded ingest to all other regions
  • Tests and tweaks to the fan-out wal reader (e.g. use futures when not sharded to minimise task spawning)
  • Separate roll-out of fan-out wal reader

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c/storage Component: storage t/Epic Issue type: Epic
Projects
None yet
Development

No branches or pull requests

3 participants