Epic: getpage@lsn benchmark #5771
problame
added
c/storage/pageserver
Component: storage: pageserver
t/Epic
Issue type: Epic
labels
Nov 2, 2023
This was referenced Nov 8, 2023
problame
added a commit
that referenced
this issue
Nov 22, 2023
(part of the getpage benchmarking epic #5771) The plan is to make the benchmarking tool log on stderr and emit results as JSON on stdout. That way, the test suite can simply capture stdout and json.loads() it, while interactive users of the benchmarking tool have a reasonable experience as well. Existing logging users continue to print to stdout, so this change should be a no-op functionally and performance-wise.
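The stdout/stderr split described above could be consumed from a Python test roughly as follows. This is a minimal sketch, not the actual test-suite code: `run_and_parse` is a hypothetical helper, and the stand-in command with its `requests` field is invented for illustration because the real tool's invocation and output schema are not shown here.

```python
import json
import subprocess
import sys


def run_and_parse(cmd):
    """Run a command, let its stderr logs pass through, and parse stdout as JSON."""
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,  # capture only the machine-readable results
        stderr=None,             # inherit stderr so logs stay visible to a human
        check=True,              # raise CalledProcessError on a non-zero exit code
        text=True,
    )
    return json.loads(proc.stdout)


# Hypothetical stand-in for the benchmarking tool: logs on stderr, JSON on stdout.
FAKE_TOOL = [
    sys.executable,
    "-c",
    "import sys, json; print('starting', file=sys.stderr); "
    "json.dump({'requests': 3}, sys.stdout)",
]

result = run_and_parse(FAKE_TOOL)
```

Because only stdout is piped, a log line that accidentally goes to stdout would make `json.loads()` fail loudly, which is arguably the desired behavior for a test harness.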
This was referenced Nov 27, 2023
problame
added a commit
that referenced
this issue
Nov 30, 2023
Part of getpage@lsn benchmark epic: #5771
problame
added a commit
that referenced
this issue
Dec 13, 2023
Part of getpage@lsn benchmark epic: #5771
This was referenced Dec 13, 2023
problame
added a commit
that referenced
this issue
Dec 13, 2023
Part of getpage@lsn benchmark epic: #5771
problame
added a commit
that referenced
this issue
Dec 14, 2023
Part of getpage@lsn benchmark epic: #5771 This PR moves the control plane's spread-all-over-the-place client for the pageserver management API into a separate module within the pageserver crate. It also switches to the async version of reqwest, which I think is generally the right direction, and I need an async client API in the benchmark epic.
problame
added a commit
that referenced
this issue
Dec 14, 2023
Part of getpage@lsn benchmark epic: #5771
problame
added a commit
that referenced
this issue
Dec 15, 2023
Part of getpage@lsn benchmark epic: #5771 This PR moves the control plane's spread-all-over-the-place client for the pageserver management API into a separate module within the pageserver crate. I need that client to be async in my benchmarking work, so, this PR switches to the async version of `reqwest`. That is also the right direction generally IMO. The switch to async in turn mandated converting most of the `control_plane/` code to async. Note that some of the client methods should be taking `TenantShardId` instead of `TenantId`, but, none of the callers seem to be sharding-aware. Leaving that for another time: #6154
problame
added a commit
that referenced
this issue
Dec 16, 2023
Part of getpage@lsn benchmark epic: #5771
problame
added a commit
that referenced
this issue
Dec 18, 2023
problame
added a commit
that referenced
this issue
Dec 19, 2023
Part of getpage@lsn benchmark epic: #5771 This allows getting the list of tenants and timelines without triggering initial logical size calculation by requesting the timeline details API response, which would skew our results.
problame
added a commit
that referenced
this issue
Dec 21, 2023
This PR adds a component-level benchmarking utility for pageserver. Its name is `pagebench`.
The problem solved by `pagebench` is that we want to put Pageserver under high load. This isn't easily achieved with `pgbench` because it needs to go through a compute, which has significant performance overhead compared to accessing Pageserver directly. Further, compute has its own performance optimizations (most importantly: caches). Instead of designing a compute-facing workload that defeats those internal optimizations, `pagebench` simply bypasses them by accessing pageserver directly.
Supported benchmarks:
* getpage@latest_lsn
* basebackup
* triggering logical size calculation
This code has no automated users yet. A performance regression test for getpage@latest_lsn will be added in a later PR.
Part of #5771
problame
added a commit
that referenced
this issue
Jan 8, 2024
Part of #5771 Extracted from #6214 This PR makes the test suite sensitive to the new env var `NEON_ENV_BUILDER_FROM_REPO_DIR_USE_OVERLAYFS`. If it is set, `NeonEnvBuilder.from_repo_dir` uses overlayfs to duplicate the snapshot repo dir contents. Since mounting requires root privileges, we use sudo to perform the mounts. That, and macOS support, is also why copytree remains the default. If we ever run on a filesystem with copy reflink support, we should consider that as an alternative. This PR can be tried on a Linux machine on the `test_backward_compatiblity` test. I took the opportunity to create a session-scoped fixture for the compatibility snapshot directory, as a hint to where I hope the remainder of #6214 will evolve.
problame
added a commit
that referenced
this issue
Jan 9, 2024
Part of #5771 Extracted from #6214 This PR makes the test suite sensitive to the new env var `NEON_ENV_BUILDER_FROM_REPO_DIR_USE_OVERLAYFS`. If it is set, `NeonEnvBuilder.from_repo_dir` uses overlayfs to duplicate the snapshot repo dir contents. Since mounting requires root privileges, we use sudo to perform the mounts. That, and macOS support, is also why copytree remains the default. If we ever run on a filesystem with copy reflink support, we should consider that as an alternative. This PR can be tried on a Linux machine on the `test_backward_compatiblity` test, which uses `from_repo_dir`.
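The env-var switch between copytree and overlayfs could look roughly like the sketch below. It is an illustration under assumptions, not the `NeonEnvBuilder.from_repo_dir` implementation: `overlay_mount_cmd`, `duplicate_dir`, and the `upper`/`work`/`merged` directory layout are hypothetical. Only the env var name, the use of sudo, and copytree as the default come from the PR description.

```python
import os
import shutil
import subprocess
from pathlib import Path

ENV_VAR = "NEON_ENV_BUILDER_FROM_REPO_DIR_USE_OVERLAYFS"


def overlay_mount_cmd(lower, upper, work, merged):
    """Build (but do not run) a sudo mount command for an overlayfs mount."""
    opts = f"lowerdir={lower},upperdir={upper},workdir={work}"
    return ["sudo", "mount", "-t", "overlay", "overlay", "-o", opts, str(merged)]


def duplicate_dir(src: Path, dst: Path) -> Path:
    """Return a writable duplicate of src: an overlay mount if the env var is set, else a copy."""
    if ENV_VAR in os.environ:
        # Hypothetical layout: writes land in upper/, src stays untouched as the lower layer.
        upper, work, merged = dst / "upper", dst / "work", dst / "merged"
        for d in (upper, work, merged):
            d.mkdir(parents=True)
        # Mounting needs root privileges, hence sudo; Linux only.
        subprocess.run(overlay_mount_cmd(src, upper, work, merged), check=True)
        return merged
    # Default path: a plain recursive copy, which also works on macOS.
    shutil.copytree(src, dst)
    return dst
```

The overlay variant avoids copying the snapshot contents up front, at the cost of requiring sudo and a matching unmount during cleanup, which this sketch leaves out.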
problame
added a commit
that referenced
this issue
Jan 12, 2024
Status from Alexander:
@bayandin this one has been stuck as In Progress for a while?
Motivation
https://www.notion.so/neondatabase/Test-Tools-633ddee0cf1d4e6b9962ff5df433a27d?pvs=4
DoD
There is a benchmark for getpage@lsn performance that does not require a compute during benchmark execution (a compute is OK for benchmark setup).
The benchmark is run as part of the performance regression tests.
The benchmark results are reproducible for Pageserver developers.
Pageserver developers get alerted about perf regressions.
Tasks
High Level
Impl
`pub` rust-postgres#25
`pagebench`) #6174
`Error` tag #6298
`--force` mode that allows an empty dir #6328
`pageserver` crate #6299
`--runtime` #6351
`CopyFail` in PS logs #6392
`test_pageserver_max_throughput_getpage_at_latest_lsn[100-13-30]` #6473
Follow-Ups