Improve safekeeper to pageserver protocol #5543

petuhovskiy · 2023-10-12T19:02:23Z

Motivation

Currently we download WAL the standard Postgres way, using the START_REPLICATION command described in the Postgres docs: https://www.postgresql.org/docs/current/protocol-replication.html
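For reference, the Postgres docs give the physical form of the command as `START_REPLICATION [ SLOT slot_name ] [ PHYSICAL ] XXX/XXX [ TIMELINE tli ]`. The sketch below only builds that command string and the `X/X` LSN notation. The helper names are hypothetical, and it does not show how the pageserver tells the safekeeper which tenant/timeline it wants. Note that `TIMELINE` here is the Postgres timeline ID (a small integer), not a Neon timeline ID. Each such stream occupies its own connection.

```python
from typing import Optional


def format_lsn(lsn: int) -> str:
    """Render a 64-bit LSN the way Postgres does: two 32-bit hex halves, "%X/%X"."""
    if not 0 <= lsn < 1 << 64:
        raise ValueError("LSN must be an unsigned 64-bit integer")
    return f"{lsn >> 32:X}/{lsn & 0xFFFFFFFF:X}"


def parse_lsn(text: str) -> int:
    """Inverse of format_lsn: "16/B374D848" -> integer LSN."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def start_replication(start_lsn: int, pg_timeline: Optional[int] = None) -> str:
    """Build a physical START_REPLICATION command without a replication slot.

    pg_timeline is the Postgres timeline ID, not a Neon timeline ID.
    """
    cmd = f"START_REPLICATION PHYSICAL {format_lsn(start_lsn)}"
    if pg_timeline is not None:
        cmd += f" TIMELINE {pg_timeline}"
    return cmd


print(start_replication(parse_lsn("0/16B3748"), pg_timeline=1))
# START_REPLICATION PHYSICAL 0/16B3748 TIMELINE 1
```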

The main problem with it is scalability: each replication stream (one per timeline) uses a separate TCP connection. TCP connections are not cheap to establish and maintain, and the number of concurrent pageserver<->safekeeper (ps<->sk) TCP connections is limited by the number of available local ports.
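A rough illustration of the port limit. A TCP connection is identified by (source IP, source port, destination IP, destination port), so from one pageserver IP to one safekeeper IP:port only the local port can vary. The sketch assumes the Linux default `net.ipv4.ip_local_port_range` of 32768 to 60999 (hosts can tune this). It compares two simplified models, both assumptions for illustration: one connection per timeline versus one shared connection. It ignores other port consumers and reconnect churn during restarts.

```python
# Linux default for net.ipv4.ip_local_port_range (tunable per host).
LINUX_EPHEMERAL_RANGE = (32768, 60999)


def local_port_budget(port_range=LINUX_EPHEMERAL_RANGE) -> int:
    """Max concurrent connections from one source IP to one (dst IP, dst port).

    With src IP, dst IP and dst port fixed, only the local port can vary.
    """
    lo, hi = port_range
    return hi - lo + 1


def connections_to_one_safekeeper(active_timelines: int, multiplexed: bool) -> int:
    """Simplified models (assumptions): one TCP connection per timeline whose
    WAL is streamed from this safekeeper, versus one shared connection."""
    return 1 if multiplexed else active_timelines


budget = local_port_budget()
per_timeline = connections_to_one_safekeeper(10_000, multiplexed=False)
print(budget, per_timeline, f"{per_timeline / budget:.0%} of the port budget")
# 28232 10000 35% of the port budget
```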

A less important problem is protocol extensibility. It could also be improved, but can be left out of the first iteration since it is not a bottleneck.

We have already seen network overload issues, which manifested as DNS resolution failures. They were reproduced around pageserver/safekeeper restarts in the presence of 10k+ active timelines.

DoD

TCP connections between pageservers and safekeepers should be multiplexed, so that there are O(1) real network connections between any given pageserver and safekeeper.
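On the pageserver side, the DoD roughly amounts to reusing one connection per safekeeper address for all timelines. This is how gRPC channels are typically shared across many RPCs. The sketch below is a minimal, hypothetical pool: `connect` stands in for whatever opens the real connection, and the safekeeper names are placeholders. It is not thread-safe and does no reconnect handling.

```python
from typing import Callable, Dict, Generic, List, TypeVar

Conn = TypeVar("Conn")


class SharedConnectionPool(Generic[Conn]):
    """One shared connection per safekeeper address; all timelines reuse it.

    Not thread-safe: a real pool would need a lock and reconnect handling.
    """

    def __init__(self, connect: Callable[[str], Conn]) -> None:
        self._connect = connect
        self._conns: Dict[str, Conn] = {}

    def get(self, safekeeper_addr: str) -> Conn:
        conn = self._conns.get(safekeeper_addr)
        if conn is None:
            conn = self._connect(safekeeper_addr)
            self._conns[safekeeper_addr] = conn
        return conn

    def __len__(self) -> int:
        return len(self._conns)


opened: List[str] = []


def fake_connect(addr: str) -> str:
    opened.append(addr)  # record each real connection attempt
    return f"conn-to-{addr}"


# 10,000 timelines spread over 3 (hypothetical) safekeepers.
pool = SharedConnectionPool(fake_connect)
safekeepers = ["sk-0", "sk-1", "sk-2"]
for timeline_no in range(10_000):
    pool.get(safekeepers[timeline_no % len(safekeepers)])
print(len(pool), len(opened))  # 3 3
```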

Implementation ideas

https://neondb.slack.com/archives/C039YKBRZB4/p1683125758173469?thread_ts=1683097406.177109&cid=C039YKBRZB4

Use gRPC. It runs over HTTP/2, which multiplexes many streams within a single TCP connection. It also makes the protocol easy to extend.
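HTTP/2 multiplexes by splitting every stream into frames. Each frame has a 9-byte header: a 24-bit length, 8-bit type, 8-bit flags, and a reserved bit plus a 31-bit stream identifier (RFC 9113, section 4.1). The sketch below encodes and decodes that header. It then demultiplexes interleaved DATA frames from two client-initiated (odd-numbered) streams carried on one byte string, for example two timelines sharing one connection. It deliberately omits the connection preface, SETTINGS, HEADERS and flow control, which a real HTTP/2 or gRPC stack handles. The payloads are made-up placeholders.

```python
import struct
from typing import Dict, NamedTuple

FRAME_HEADER_LEN = 9
DATA = 0x0  # DATA frame type


class FrameHeader(NamedTuple):
    length: int     # 24-bit payload length
    type: int       # 8-bit frame type
    flags: int      # 8-bit flags
    stream_id: int  # 31-bit stream identifier


def encode_header(h: FrameHeader) -> bytes:
    if not 0 <= h.length < 1 << 24:
        raise ValueError("length must fit in 24 bits")
    if not 0 <= h.stream_id < 1 << 31:
        raise ValueError("stream id must fit in 31 bits")
    # The 24-bit length is packed as one high byte plus two low bytes.
    return struct.pack(">BHBBI", h.length >> 16, h.length & 0xFFFF, h.type, h.flags, h.stream_id)


def decode_header(buf: bytes) -> FrameHeader:
    hi, lo, ftype, flags, sid = struct.unpack(">BHBBI", buf[:FRAME_HEADER_LEN])
    # The top bit of the stream id is reserved and ignored on receipt.
    return FrameHeader((hi << 16) | lo, ftype, flags, sid & 0x7FFFFFFF)


def frame(stream_id: int, payload: bytes) -> bytes:
    return encode_header(FrameHeader(len(payload), DATA, 0, stream_id)) + payload


def demux(wire: bytes) -> Dict[int, bytes]:
    """Reassemble each stream's payload from interleaved frames on one connection."""
    streams: Dict[int, bytes] = {}
    pos = 0
    while pos < len(wire):
        h = decode_header(wire[pos:pos + FRAME_HEADER_LEN])
        pos += FRAME_HEADER_LEN
        if pos + h.length > len(wire):
            raise ValueError("truncated frame")
        streams[h.stream_id] = streams.get(h.stream_id, b"") + wire[pos:pos + h.length]
        pos += h.length
    return streams


wire = frame(1, b"wal-a1") + frame(3, b"wal-b1") + frame(1, b"wal-a2")
print(demux(wire))  # {1: b'wal-a1wal-a2', 3: b'wal-b1'}
```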

I also think that if we have a gRPC connection between every safekeeper and pageserver, we can use it to bypass the broker when delivering timeline updates from safekeepers to pageservers.
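A hypothetical sketch of that idea. Each safekeeper pushes a small per-timeline update over the connection it already has, and the pageserver keeps the latest value per safekeeper. The message shape (`TimelineUpdate` with a `commit_lsn`) and the "pick the most advanced safekeeper" rule are simplifying assumptions for illustration. They are not the broker's actual message format or the pageserver's real selection logic.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class TimelineUpdate:
    """Hypothetical message a safekeeper pushes over its existing connection."""
    safekeeper: str
    tenant_id: str
    timeline_id: str
    commit_lsn: int


@dataclass
class TimelineUpdateTracker:
    """Pageserver-side view: latest commit_lsn per timeline, per safekeeper."""
    latest: Dict[Tuple[str, str], Dict[str, int]] = field(default_factory=dict)

    def on_update(self, u: TimelineUpdate) -> None:
        per_sk = self.latest.setdefault((u.tenant_id, u.timeline_id), {})
        # Ignore stale or reordered updates: commit_lsn only moves forward.
        if u.commit_lsn > per_sk.get(u.safekeeper, -1):
            per_sk[u.safekeeper] = u.commit_lsn

    def most_advanced(self, tenant_id: str, timeline_id: str) -> Optional[str]:
        """Safekeeper with the highest known commit_lsn (simplified policy)."""
        per_sk = self.latest.get((tenant_id, timeline_id), {})
        return max(per_sk, key=per_sk.__getitem__) if per_sk else None


tracker = TimelineUpdateTracker()
for u in [
    TimelineUpdate("sk-0", "t1", "tl1", 100),
    TimelineUpdate("sk-1", "t1", "tl1", 150),
    TimelineUpdate("sk-1", "t1", "tl1", 90),  # stale, ignored
]:
    tracker.on_update(u)
print(tracker.most_advanced("t1", "tl1"))  # sk-1
```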


problame · 2023-10-13T08:58:49Z

One advantage of one TCP connection per tenant, though, is that we get:

- observability through standard tools
- fairness between tenant connections through TCP

I haven't kept up with how good gRPC is in either of those areas.

Perhaps we could turn this into a short RFC and discuss it there?

petuhovskiy · 2023-10-13T09:38:57Z

An RFC with implementation details makes sense; I can create it later if and when I start working on this task. For now, a short conversation here should be fine.

> observability through standard tools

What standard tools do you have in mind? netstat can show the local port and the destination safekeeper address, and we can count the number of active sockets, but it's hard to associate a specific tenant/timeline with a socket.

> fairness between tenant connections through TCP

HTTP/2 streams are fair by default, and we can also specify weights if needed. I wouldn't expect any difference in fairness compared to the existing setup.
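As a model of what "fair by default, weights if needed" means: in the RFC 7540 priority scheme, sibling streams get a default weight of 16 and share resources in proportion to their weights (1 to 256). The sketch below just computes those proportional shares for hypothetical timeline streams. RFC 9113 has since deprecated that priority signaling, and whether a particular gRPC/HTTP/2 implementation honors weights is implementation-specific. So treat this as the intended behaviour, not a guarantee.

```python
from typing import Dict

DEFAULT_WEIGHT = 16  # RFC 7540 default stream weight


def share_bandwidth(capacity: float, weights: Dict[str, int]) -> Dict[str, float]:
    """Split capacity among sibling streams in proportion to their weights (1..256)."""
    for stream, w in weights.items():
        if not 1 <= w <= 256:
            raise ValueError(f"weight for {stream} must be in 1..256, got {w}")
    total = sum(weights.values())
    return {stream: capacity * w / total for stream, w in weights.items()}


print(share_bandwidth(100.0, {"tl-a": DEFAULT_WEIGHT, "tl-b": DEFAULT_WEIGHT}))
# {'tl-a': 50.0, 'tl-b': 50.0}
print(share_bandwidth(100.0, {"tl-a": 48, "tl-b": DEFAULT_WEIGHT}))
# {'tl-a': 75.0, 'tl-b': 25.0}
```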

petuhovskiy added the labels c/storage/safekeeper (Component: storage: safekeeper), t/Epic (Issue type: Epic), a/scalability (Area: related to scalability) and a/performance (Area: relates to performance of the system) on Oct 12, 2023

VladLazar mentioned this issue in "Epic: sharded pageserver ingest #9329" (open) on Oct 11, 2024

VladLazar removed the t/Epic (Issue type: Epic) label on Oct 11, 2024

VladLazar changed the title from "Epic: Improve pageserver<-safekeepers WAL retrieval" to "Improve safekeeper to pageserver protocol" on Oct 11, 2024
