Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Print a log message if GetPage response takes too long #10046

Merged
merged 3 commits into from
Dec 10, 2024

Conversation

hlinnaka
Copy link
Contributor

@hlinnaka hlinnaka commented Dec 6, 2024

We have metrics for GetPage request latencies, but this is an extra measure to capture requests that take way too long in the logs. The log message is printed every 10 s, until the response is received:

PG:2024-12-09 16:02:07.715 GMT [1782845] LOG:  [NEON_SMGR] [shard 0] no response received from pageserver for 10.000 s, still waiting (sent 10613 requests, received 10612 responses)
PG:2024-12-09 16:02:17.723 GMT [1782845] LOG:  [NEON_SMGR] [shard 0] no response received from pageserver for 20.008 s, still waiting (sent 10613 requests, received 10612 responses)
PG:2024-12-09 16:02:19.719 GMT [1782845] LOG:  [NEON_SMGR] [shard 0] received response from pageserver after 22.006 s

@hlinnaka hlinnaka requested review from problame and MMeent December 6, 2024 21:45
@hlinnaka hlinnaka requested review from a team as code owners December 6, 2024 21:45
@hlinnaka
Copy link
Contributor Author

hlinnaka commented Dec 6, 2024

We also talked about disconnecting and retrying with a new connection, if a request takes too long. I didn't dare to do that yet, but just logging long requests is useful already.

I set the threshold at 10 s, but I'm open to opinions on that. (A cop out answer would be to make it configurable, but I think that'd be overkill, and would still leave us with the question of what to actually set it to)

Copy link

github-actions bot commented Dec 6, 2024

7077 tests run: 6752 passed, 0 failed, 325 skipped (full report)


Flaky tests (5)

Postgres 17

Postgres 16

Postgres 15

Postgres 14

Code coverage* (full report)

  • functions: 31.4% (8333 of 26528 functions)
  • lines: 47.7% (65568 of 137367 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
6af4747 at 2024-12-10T13:44:48.269Z :recycle:

We have metrics for GetPage request latencies, but this is an extra
measure to capture requests that take way too long in the logs. The
log message is printed every 10 s, until the response is received:

```
PG:2024-12-06 21:33:14.713 GMT [1507715] LOG:  [NEON_SMGR] [shard 0] no response received from pageserver for 10.000 s, still waiting
PG:2024-12-06 21:33:24.721 GMT [1507715] LOG:  [NEON_SMGR] [shard 0] no response received from pageserver for 20.009 s, still waiting
PG:2024-12-06 21:33:29.719 GMT [1507715] LOG:  [NEON_SMGR] [shard 0] received response from pageserver after 25.006 s
```
@hlinnaka hlinnaka force-pushed the log-long-getpage-responses branch from 896fd29 to ec07d0e Compare December 9, 2024 16:00
@hlinnaka hlinnaka enabled auto-merge December 9, 2024 16:30
pgxn/neon/libpagestore.c Outdated Show resolved Hide resolved
@hlinnaka hlinnaka added this pull request to the merge queue Dec 9, 2024
@hlinnaka hlinnaka removed this pull request from the merge queue due to a manual request Dec 9, 2024
@hlinnaka hlinnaka enabled auto-merge December 9, 2024 18:55
@hlinnaka hlinnaka added this pull request to the merge queue Dec 10, 2024
Merged via the queue into main with commit b853f78 Dec 10, 2024
84 checks passed
@hlinnaka hlinnaka deleted the log-long-getpage-responses branch December 10, 2024 16:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants