Network improvements #1036

holgerd77 · 2021-01-08T11:58:47Z

No description provided.

codecov · 2021-01-08T12:00:29Z

Codecov Report

Merging #1036 (c110b70) into master (5ac5bc7) will increase coverage by 0.19%.
The diff coverage is 87.09%.

Flag	Coverage Δ
block	`77.65% <ø> (-0.28%)`	⬇️
blockchain	`77.92% <ø> (ø)`
client	`88.25% <100.00%> (+0.29%)`	⬆️
common	`91.87% <ø> (-0.25%)`	⬇️
devp2p	`82.73% <85.71%> (+0.39%)`	⬆️
ethash	`82.08% <ø> (ø)`
tx	`90.15% <ø> (+0.15%)`	⬆️
vm	`83.09% <ø> (ø)`

Flags with carried forward coverage won't be shown. Click here to find out more.

… RLPx peer connections to improve networking distribution and connection reliability

…on execution log msg output

…banning old peers and replacing with a new peer on KBucket ping

…dditions of new neighbour peers

holgerd77 · 2021-01-14T11:13:09Z

Ah, forgot a bit about this PR, this is actually ready for review already for some days. 😄

As mentioned yesterday in the call, the initial goal here was to improve network connection quality on a level that there is a reliable connection to non-mainnet networks possible, before it was rather not possible at all any more to connect to e.g. Rinkey or Goerli. I didn't quite get there but the networking quality is generally improved with the PR for all networks and also the non-mainnet networks got better and now it is - sometimes - possible to connect after 1-2 minutes. As mentioned in the call yesterday a way to improve from some other angle would be here to add DNS discovery support, will open a dedicated issue on that.

This PR does the following:

Some additional RLPx type improvements along 75d224b
Analogue to the latest discovery changes from Client Network Improvements #1031 the refill() calls for RLPx are subdivided to better distribute networking traffic, this perceptibly improves the connection 838d6c4
There was a bug in the client reporting a repeated first block number on executions (in case you already wondered, this was just a logging error), this has been fixed in 88e5f03
"devp2p -> connection reliability: fixed an error in DPT not properly banning old peers and replacing with a new peer on KBucket ping" (from the commit message): this is a super important bug fix, before the ban functionality for discovery for peers just didn't work due to a generic catch block shading the error messages. This lead to bad peers just being used for connections over and over again. Bad. There are some more of these generic catch blocks (so not containing any code) in the code base. It's a bit less trivial to fix though than one would think on first thought since this needs some thought on well-balancing how to instead handling potential errors, since there is some mix of errors which are just expected, some which one might want to know about and some which are severe. And we have to decide how to react properly (re-throw, emit an error event, display a debug message, do nothing) without polluting the client log e.g.. 06f31ea
There was another traffic cluster caused along neighbour retrieval from the DPT server in _onServerPeers, this has been smoothened in c110b70

cgewecke

This looks good to me!

(Just have a question about the how the refill interval changes work.)

cgewecke · 2021-01-15T06:24:48Z

packages/devp2p/src/rlpx/rlpx.ts

+      } else {
+        // Still keep peer in queue
+        return true
+      }


Out of curiosity, do you know why this technique improves connection reliability?

With this change the connection attempts are spaced out and for any given peer there's a 1 in 10 chance it will be made?

This distributes the networking traffic over time, so there's not all connections hammered out at once - then followed by a longer non-networking-activity - and subsequently many of the replies from the peers also coming in at a similar PIT (in the old version). Instead the traffic gets out and in more frequently and distributed. I am not 100% sure about the reliability impacts in a narrower sense, so e.g. if this directly prevents connection losses by avoiding concurrent network handling, my deeper networking experience is too limited for that. But this definitely leads to a healthier protocol flow - this long waiting period is avoided - becoming e.g. very apparent when the last peer connection gets lost and there is another 30 second wait - but there is always some activity/connection reattempts going on. I would also think that these distributes better on available protocol respectively implementation-given resource restrictions like the maximum number of allowed peer connections or the number of slots for the queue, but not completely sure here.

However benefit reasons are distributed: this is one of the changes which can be really "felt" when running the client, so the first connection pickup after start is just directly and apparently getting so much faster that it gets a bit out of question if there "might be" some positive effect. 😄

holgerd77 added the PR state: WIP label Jan 8, 2021

holgerd77 force-pushed the network-improvements branch 2 times, most recently from 3398978 to 4afa40f Compare January 8, 2021 18:44

holgerd77 added PR state: needs review and removed PR state: WIP labels Jan 13, 2021

holgerd77 added 5 commits January 14, 2021 11:22

devp2p -> connection reliability: RLPx type improvements

75d224b

devp2p -> connection reliability: subdivided interval calls to refill…

838d6c4

… RLPx peer connections to improve networking distribution and connection reliability

client -> vm execution: fixed firstBlock number and hash association …

88e5f03

…on execution log msg output

devp2p -> connection reliability: fixed an error in DPT not properly …

06f31ea

…banning old peers and replacing with a new peer on KBucket ping

devp2p -> connection reliability: distribute network traffic on DPT a…

c110b70

…dditions of new neighbour peers

holgerd77 force-pushed the network-improvements branch from e1028a7 to c110b70 Compare January 14, 2021 10:22

holgerd77 requested review from ryanio, cgewecke and jochem-brouwer January 14, 2021 10:22

cgewecke approved these changes Jan 15, 2021

View reviewed changes

holgerd77 merged commit 816e0bc into master Jan 15, 2021

holgerd77 deleted the network-improvements branch January 15, 2021 08:08

holgerd77 mentioned this pull request Jan 15, 2021

Client v0.1 Release Planning (PoA / Yolo v3) #1045

Closed

9 tasks

holgerd77 added package: client package: devp2p labels Jan 15, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Network improvements #1036

Network improvements #1036

holgerd77 commented Jan 8, 2021

codecov bot commented Jan 8, 2021 •

edited

Loading

holgerd77 commented Jan 14, 2021

cgewecke left a comment

cgewecke Jan 15, 2021

holgerd77 Jan 15, 2021

Network improvements #1036

Network improvements #1036

Conversation

holgerd77 commented Jan 8, 2021

codecov bot commented Jan 8, 2021 • edited Loading

Codecov Report

holgerd77 commented Jan 14, 2021

cgewecke left a comment

Choose a reason for hiding this comment

cgewecke Jan 15, 2021

Choose a reason for hiding this comment

holgerd77 Jan 15, 2021

Choose a reason for hiding this comment

codecov bot commented Jan 8, 2021 •

edited

Loading