-
Notifications
You must be signed in to change notification settings - Fork 777
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement EIP-1459: DNS peer discovery #1070
Conversation
Codecov Report
Flags with carried forward coverage won't be shown. Click here to find out more. |
bc55869
to
30254b6
Compare
[EDIT - Resolved in 9bbc676 by adding an option to DNS class which let you specify the DNS server's ip address] Triage note: wrote an e2e test for this which resolves peers from Querying dns servers locally shows:
...in CI
This stack overflow discussion about DNS resolution failures suggests that localhost DNS servers in a docker container don't work and fixing this would require a special container config. |
e532d23
to
39ce425
Compare
Another small update here. Have had a lot of trouble getting this actually improve peer discovery for the testnets but am making headway. Several things going on:
|
Example of goerli sync failure (on block 1)
Edit: One component of this is that blockchain.update is unable to retrieve the totalDifficulty for header or block hashes starting w/ block 1 - this is the origin of the This Ropsten fails with
|
@cgewecke thanks for the update! Yes, if my understanding here is correct, the non-awareness of a chain ID is the biggest flaw along discovery v4 and its message types like the Will also circle in Felix @fjl from the geth team, maybe he finds some time to give this a short read as well. @fjl: Actually I have a side question here: if discovery v5 is such a heavy protocol change and this is still not ready for mainnet (or at least not activated yet) - as we discussed lately in the geth chat - have you guys every considered to do a very low-level v4.1 discovery version and e.g. just add an additional byte with the chain ID to the various calls ( @cgewecke the VM execution on the non-mainnet networks is still completely untested and from the perspective of this PR "the problem is solved" once you reach the point that the client DOES get some blocks imported and fails on execution, so feel very free to stop here for now. You can of course also give this some attention (maybe in a different PR though). |
76029b0
to
e84e91c
Compare
Whew, has this gotten extensive. 😀 Would have never expected that this would touch 30 files. 😛 |
Code changes look really great though, looking forward to give this a try at some point and get a bit of a better feeling on how this behaves. |
@@ -43,7 +43,7 @@ export class KBucket extends EventEmitter { | |||
|
|||
const keys = [] | |||
if (Buffer.isBuffer(obj.id)) keys.push(obj.id.toString('hex')) | |||
//if (obj.address && obj.port) keys.push(`${obj.address}:${obj.port}`) | |||
if (obj.address && obj.tcpPort) keys.push(`${obj.address}:${obj.tcpPort}`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure why this was commented out (maybe perceived redundancy?).
Peer descriptions obtained via DNS do not have an id
(unlike peers from findNeighbours
) so without synthesizing this ip based key they aren't being banned consistently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I commented this out since there was just no port
parameter on the object and I wasn't sure if to add the tcp or udp port instead.
@holgerd77 Oh great, yes I was going to ask if you'd run it at some point to see if performance seems comparable to what's normal. It's hard to tell if it's slower (or faster?) Will take this out of WIP now anyway... |
@cgewecke yeah, sure, running it on what network? Dedicatedly the PoA networks or mainnet included? |
Did a super-short test, for now I wasn't able to get a connection to both of the PoA networks. Did only one respective run though and waited for ~2 min. How long did this take for you? |
@holgerd77 Goerli is usually less than minute, Rinkeby sometimes more. If you run with
...you should see downloading begin but processing the block fails. Without the debug flag on it can seem like nothing's happening. I just ran it and almost instantly connected, getting this kind of output
|
I checked the code a bit, it looks mostly OK. I like that you didn't implement the tree building and focused only on client. Regarding the unit test failing because of DNS settings: you should probably include a mock resolver and not query the real DNS in the test. |
We are working on enabling discv5 on mainnet, it will be enabled for LES server nodes in the next geth release. We have also enabled the discv4 ENR extension a long time ago, so if you want to pre-filter discv4 nodes using ENR, you can do that now. |
Resolved this in CI by adding a default DNS server ip (8.8.8.8 - Google). But perhaps there shouldn't be a default. The main tests for this mock DNS as suggested and the live network integration test is removable. It's just a sanity check that it works in another context (e.g Linux, container, etc).
👍 Will add this rn. |
@fjl thanks for taking the time and taking a short look, that is super helpful! 😊 Short follow-up question: the Discovery v5 will then be the same as the Eth 2.0 Discovery v5 specification, right? Because then we would have a look into the ChainSafe implementation and see if we can bridge this with our RLPx implementation. (@cgewecke this might be a good follow-up task? 😋 ) @cgewecke tested again and I am now able to get connection - after just slightly more time to wait - with both Goerli and Rinkeby. I will directly merge this in here. Felix already had a short look and I think this is one of these PRs where nothing is lost if there is still some evolving improvement happening over time since this doesn't touch critical core logic and just improves the status quo on networking. And atm the effort to review from someone from the outside is disproportionally high, I'll try to get familiar with the code base over time though and eventually give some post-merge feedback. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM (see previous comment)
For Goerli (and I think for Rinkey as well) interestingly enough sync starts with block number 2 (and not 1, so the first after genesis) which obviously fails? 🤔 Hmm, should be traceable though towards some root cause I guess with some following of the execution chain. |
#1043
Implements core logic needed to quickly acquire testnet (or mainnet) peers from the DNS ENR tree per EIP-1459.
Have followed EIP recommendations to
Basic idea is that instead of making
findNeighbour
requests in DPT, we query a set of DNS records organized as merkle tree of ip addresses & ports maintained by geth clients. These trees are url addressed by network so the number of unusable peer contacts made during peer discovery is greatly reduced.Locally have been able to acquire btw 5 and 7 mainnet peers within 3 or 4 minutes. Atm the client is unable to sync with any of the canonical testnets for reasons unrelated to peer discovery (see notes below). However, initial testing suggests that these can take a bit longer to find peers in b/c their networks are smaller.
To watch peer acquisition:
To watch DNS tree traversal
PR adds
getDnsPeers
method for DPT which is run during client bootstrap and during the dpt refresh cycleTODO:
Begin syncing a testnet correctly