
test_runner/performance: add sharded ingest benchmark #9591

Merged
merged 6 commits into main from erik/sharded-ingest-benchmark on Nov 2, 2024

Conversation

erikgrinaker
Contributor

Problem

We need a benchmark to quantify the improvement of sharded ingestion.

Resolves #9569.

Summary of changes

Adds a Python benchmark for sharded ingestion. It ingests 7 GB of WAL (100M rows) into a Safekeeper, which fans the WAL out to 10 shards running on 10 different pageservers. The ingest volume and duration are recorded.
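As a back-of-the-envelope sanity check on those parameters (an illustrative calculation, not from the PR itself), 7 GB of WAL across 100M rows works out to roughly 75 bytes of WAL per row:

```python
# Bytes of WAL generated per inserted row in the benchmark.
# Assumption: 7 GB means 7 * 2**30 bytes.
rows = 100_000_000
wal_bytes = 7 * 2**30
wal_per_row = wal_bytes / rows
print(f"~{wal_per_row:.0f} bytes of WAL per row")  # ~75 bytes
```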

Checklist before requesting a review

  • I have performed a self-review of my code.
  • If it is a core feature, I have added thorough tests.
  • Do we need to implement analytics? If so, did you add the relevant metrics to the dashboard?
  • If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.

Checklist before merging

  • Do not forget to reformat commit message to not include the above checklist

@erikgrinaker
Contributor Author

erikgrinaker commented Oct 31, 2024

@VladLazar @jcsp Initial local results without sharded ingest:

MacBook Pro M3, 11 cores, 7 GB WAL (100M rows)

1 shard: 41.382s
2 shards: 40.805s
4 shards: 43.169s
10 shards: 61.436s
32 shards: 148.067s

Ingestion is IO bound with 1 shard (i.e. the Safekeeper is the bottleneck), and pageserver ingestion is single-threaded, so it can use at most 1 CPU core. As long as pageservers have CPU headroom, the additional work due to sharding does not affect throughput: ingestion remains IO bound (i.e. Safekeeper-bound). Only when we saturate the CPU at ~10 shards does throughput drop, falling ~linearly up to 32 shards, as expected.
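Converting those wall-clock times into throughput makes the shape clearer (a quick derivation from the numbers above, assuming 7 GB = 7 × 1024 MB):

```python
# Ingest throughput implied by the local results above
# (each run ingests the same 7 GB of WAL).
runs = {1: 41.382, 2: 40.805, 4: 43.169, 10: 61.436, 32: 148.067}
for shards, secs in runs.items():
    mb_s = 7 * 1024 / secs
    print(f"{shards:>2} shards: {mb_s:6.1f} MB/s")
```

Throughput is flat at ~170 MB/s through 4 shards, then degrades once pageserver CPU saturates.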

This implies that we won't see ingestion throughput improvements unless pageservers are hitting CPU exhaustion. We will see a reduction in CPU usage, though (and thus potential cost savings). However, we may also see throughput increases if we saturate the Safekeepers' bandwidth capacity (bandwidth use currently scales linearly with the shard count). Just to set expectations for throughput improvements, @jcsp.

@erikgrinaker
Contributor Author

erikgrinaker commented Oct 31, 2024

Btw, writing this as an in-memory Rust benchmark wasn't practical due to the tight code coupling, so I opted for a Python benchmark. A Pageserver WAL receiver pulls in most of the Pageserver infrastructure via the timeline/tenant, and all of this would have had to be made public for construction from benchmark binaries. We should add appropriate trait abstraction boundaries to allow constructing components in isolation for tests/benchmarks, but that's more work than it's worth right now.


github-actions bot commented Oct 31, 2024

5328 tests run: 5106 passed, 0 failed, 222 skipped (full report)


Flaky tests (1)

Postgres 17

Code coverage* (full report)

  • functions: 31.5% (7771 of 24690 functions)
  • lines: 48.9% (61030 of 124696 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results: 7820750 at 2024-11-02T16:35:33.227Z :recycle:

erikgrinaker added a commit that referenced this pull request Nov 1, 2024
## Problem

`tenant_get_shards()` does not work with a sharded tenant on 1
pageserver, as it assumes an unsharded tenant in this case. This special
case appears to have been added to handle e.g. `test_emergency_mode`,
where the storage controller is stopped. This breaks e.g. the sharded
ingest benchmark in #9591 when run with a single shard.

## Summary of changes

Correctly look up shards even with a single pageserver, but add a
special case that assumes an unsharded tenant if the storage controller
is stopped and the caller provides an explicit pageserver, in order to
accommodate `test_emergency_mode`.
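A hedged sketch of the lookup behavior that commit describes (illustrative Python only; the function name, signature, and types are not the actual `tenant_get_shards()` API in the test fixtures):

```python
# Sketch of the fixed lookup logic: resolve real shards even when the
# whole tenant lives on a single pageserver, and only fall back to the
# "unsharded" assumption when the storage controller is stopped and the
# caller named a pageserver explicitly (the test_emergency_mode case).
# All names here are illustrative, not the real fixture API.
def tenant_get_shards_sketch(shard_map, controller_up, explicit_pageserver=None):
    """shard_map: {shard_id: pageserver_id}, as the controller reports it."""
    if not controller_up:
        if explicit_pageserver is None:
            raise RuntimeError("storage controller stopped and no pageserver given")
        return [(None, explicit_pageserver)]  # assume an unsharded tenant
    # Normal path: return the real shard layout, even with one pageserver.
    return sorted(shard_map.items())

# A sharded tenant on one pageserver now resolves all shards correctly.
print(tenant_get_shards_sketch({"0002": "ps1", "0102": "ps1"}, controller_up=True))
# Emergency mode: controller down, caller names the pageserver explicitly.
print(tenant_get_shards_sketch({}, controller_up=False, explicit_pageserver="ps1"))
```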
@erikgrinaker erikgrinaker enabled auto-merge (squash) November 2, 2024 15:43
@erikgrinaker erikgrinaker merged commit 0058eb0 into main Nov 2, 2024
80 checks passed
@erikgrinaker erikgrinaker deleted the erik/sharded-ingest-benchmark branch November 2, 2024 16:42
@erikgrinaker
Contributor Author

> Ingestion is IO bound with 1 shard (i.e. Safekeeper is bottleneck), and pageserver ingestion is single-threaded so can use at most 1 CPU core. As long as pageservers have CPU headroom, the additional work due to sharding does not affect throughput and remains IO bound (i.e. Safekeeper-bound).

This isn't entirely accurate. With 1 shard, the Safekeeper runs at 70% CPU while the Pageserver runs at 170% CPU -- no significant IO-wait. I think this indicates that we may get a 2x speedup with 2 shards, but it'll flatten out there (unless CPU is saturated). We'll find out once sharded ingest is implemented.

Successfully merging this pull request may close the following issue: safekeeper: add WAL ingestion benchmarks with pageserver fan-out (#9569).