Peers Not Stable in 24.12.1 #8963

Closed
StefanBratanov opened this issue Jan 1, 2025 · 3 comments · Fixed by #8969

@StefanBratanov
Contributor

StefanBratanov commented Jan 1, 2025

Users have reported that Teku struggles to get enough peers since 24.12.1:

  1. One user sets p2p-peer-upper-bound to 256 and used to reach 256 peers, but since 24.12.1 they only get around 100 - 150:
    https://discord.com/channels/697535391594446898/697539289042649190/1323510723921580116
  2. Another user experienced the same issue. They also set p2p-peer-upper-bound to 256 and used to reach 256 peers, but since 24.12.1 they only get around 85 - 125:
    https://discord.com/channels/697535391594446898/697539289042649190/1323903016428113982
@mehdi-aouadi mehdi-aouadi self-assigned this Jan 6, 2025
@mehdi-aouadi
Contributor

mehdi-aouadi commented Jan 6, 2025

Some investigation results:

  • The described issue didn't happen on any of our nodes (Holesky/mainnet)
  • Our mainnet nodes are configured with a peer upper bound of 120 and they hit that target:
Screenshot 2025-01-06 at 18 27 19
  • Our Holesky nodes are configured with a peer upper bound of 160 and hit that target too
Screenshot 2025-01-06 at 18 28 13
  • Some Holesky boot nodes are configured with a peer upper bound of 300 and they also manage to hit that target.
    The Holesky boot nodes are running an older version: 24.10.3
Screenshot 2025-01-06 at 18 29 21

No conclusions can be drawn from our nodes' logs for the moment.

@mehdi-aouadi
Contributor

mehdi-aouadi commented Jan 7, 2025

Some new observations:

  • When setting p2p-peer-upper-bound to 250 we're unable to reach that target with 24.12.1 (tested on nightly-01); the node gets stuck at around 180 peers
  • The issue persisted across multiple node restarts
  • There are many disconnections with reason TOO_MANY_PEERS
  • Rolling back to 24.12.0 solves the issue, and we're able to reach even 300 peers when setting p2p-peer-upper-bound to 300
  • The issue is observed only when p2p-peer-upper-bound is set to a value > 180

@tbenr
Contributor

tbenr commented Jan 8, 2025

So it seems like #8912 is the key.

Upper bound set to 300 peers.

With it, when we approach ~180 peers we see a lot of drops due to PING timeout:

image
image

without it:

image
image

So with MAX_CONCURRENT_REQUESTS applied to ping, once we reach ~180 peers we start queuing more than 2 outbound pings per peer. This is interpreted as the remote peer not responding, while it is actually our own throttling kicking in:

  void sendPeriodicPing(final Eth2Peer peer) {
    if (peer.getUnansweredPingCount() >= eth2RpcOutstandingPingThreshold) {
      LOG.debug("Disconnecting the peer {} due to PING timeout.", peer.getId());
      peer.disconnectCleanly(DisconnectReason.UNRESPONSIVE).ifExceptionGetsHereRaiseABug();
    } else {
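
To make the failure mode above concrete, here is a minimal, self-contained sketch. It is not Teku code: the class, the `simulate` method, the round-based model, and the numeric values are all hypothetical and only mimic the check in `sendPeriodicPing`. It assumes every remote peer answers immediately, and that a ping counts as unanswered from the moment it is queued behind a local throttle that releases a fixed number of pings per round.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical toy model, not Teku's implementation.
class PingThrottleSketch {
  // Assumed value for illustration; stands in for eth2RpcOutstandingPingThreshold.
  static final int OUTSTANDING_PING_THRESHOLD = 2;

  /**
   * Returns how many peers end up disconnected as "unresponsive" after the given
   * number of ping rounds, although every remote answers as soon as its ping is sent.
   * pingsSentPerRound models the local throttle budget (an assumption of this sketch).
   */
  static int simulate(final int peers, final int pingsSentPerRound, final int rounds) {
    final int[] unanswered = new int[peers];
    final boolean[] disconnected = new boolean[peers];
    final Deque<Integer> queue = new ArrayDeque<>(); // pings waiting behind the throttle
    int dropped = 0;

    for (int round = 0; round < rounds; round++) {
      // Periodic ping: same shape of check as in sendPeriodicPing.
      for (int p = 0; p < peers; p++) {
        if (disconnected[p]) {
          continue;
        }
        if (unanswered[p] >= OUTSTANDING_PING_THRESHOLD) {
          disconnected[p] = true; // treated as a PING timeout
          dropped++;
        } else {
          unanswered[p]++; // counted as outstanding even while still queued locally
          queue.add(p);
        }
      }
      // The throttle releases a limited number of pings; each one is answered at once.
      int sent = 0;
      while (sent < pingsSentPerRound && !queue.isEmpty()) {
        final int p = queue.poll();
        if (disconnected[p]) {
          continue; // stale entry for a peer we already dropped
        }
        unanswered[p]--;
        sent++;
      }
    }
    return dropped;
  }

  public static void main(final String[] args) {
    System.out.println("150 peers, budget 180: dropped " + simulate(150, 180, 20));
    System.out.println("300 peers, budget 180: dropped " + simulate(300, 180, 20));
  }
}
```

In this toy model no peer is dropped while the peer count stays within the throttle budget. Once the peer count exceeds it, queued pings pile up and healthy peers get disconnected, which matches the shape of the behaviour described above: the counter measures local queuing rather than remote responsiveness.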
