Allow configurable --upstream recursive resolver for non-HNS queries #62

Draft: wants to merge 5 commits into master

Conversation

pinheadmz
Member

Closes #53

This PR allows the user to pass the address of an "upstream" recursive resolver to hnsd. When a query fails to resolve from the HNS root zone, hnsd forwards it to this resolver instead of using the typical built-in "ICANN fallback" implemented in the root nameserver itself.

Example usages:

hnsd --upstream 1.1.1.1

hnsd -t 10.0.1.200:5350

The main changes to hnsd when --upstream is passed in:

  • When the root nameserver (ns) receives a proof of nonexistence from the HNS network, instead of looking up the name in the hard-coded ICANN root zone file and resolving it there, the ns returns an error: REFUSED

  • On launch, the recursive resolver (rs) normally spawns an unbound context and configures the ns server as a stub server for the "." zone. This still happens - but IN ADDITION, a second unbound context (called fallback) is spawned, configured with ub_ctx_set_fwd(), meaning ALL queries are forwarded to an upstream recursive resolver.

  • All responses from the primary rs unbound instance are inspected after they are resolved. If the response from unbound is SERVFAIL, the request is passed to the fallback instance of unbound.

Observe in the logs: (using hnsd -t 1.1.1.1)

pool: sending proof request for: com.
chain (274): using safe height of 270 for resolution
peer 0 (127.0.0.1:46806): sending proof request for: com.
peer 0 (127.0.0.1:46806): received proof: f84185260b0e865f7bafa35aeeed8bb1ffdaaa91ce624092a9b20a27511a161e
peer 0 (127.0.0.1:46806): received proof for: com
ns: forwarding to upstream resolver: com
ns: resolve response error: EREFUSED
ns: sending refused (33813)
ns: query
ns:   id=39895
ns:   labels=2
ns:   name=google.com.
ns:   type=1
ns:   class=1
ns:   edns=1
ns:   dnssec=1
ns:   tld=com
ns:   addr=127.0.0.1:52369
...
rs: received answer for: google.com.
rs: redirecting lookup to fallback for: google.com.
rs_worker: request 1: google.com.
rs: received answer for: google.com.
rs:   rcode: 0
rs:   havedata: 1
rs:   nxdomain: 0
rs:   secure: 0
rs:   bogus: 0

Disclaimer

I think this PR is a bit smelly. Creating an entire second recursive resolver seems like a bit much, but I couldn't think of any other way to do this, because unbound requires all this configuration up front before opening (like setting a stub zone). So with unbound, either all requests get forwarded, or all requests get sent to the specified root zone server. I may try to experiment with other approaches using the same unbound instance but the configuration options are limiting.

It also means there is a second unbound worker and worker thread. This made closing the program cleanly tricky because of funny async stuff in the original design which never intended for there to be more than one unbound thread.

Finally, the "trick" isn't perfect either. I'm using the REFUSED error message from the root ns to unbound to trigger the fallback resolution, but by the time unbound returns the response, it's been glossed over with a simple SERVFAIL. This means that ANY error from the ns that unbound interprets as SERVFAIL will lead to the fallback resolution. The trigger can probably be switched to NXDOMAIN, since the root ns isn't checking the ICANN zone at all in this mode.

@stephen304

stephen304 commented Mar 18, 2021

This looks good, and I'm able to successfully forward standard DNS to my router with sudo ./hnsd -p 4 -r 127.0.0.1:53 --upstream 192.168.1.1. I know that's working since I'm able to resolve local DHCP hostnames through it. However, I don't seem to be able to resolve HNS domains when --upstream is in use. This part looks related: redirecting lookup to fallback for: www.welcome.nb.:

dig A @127.0.0.1 www.welcome.nb

; <<>> DiG 9.16.12 <<>> A @127.0.0.1 www.welcome.nb
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 3922
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;www.welcome.nb.			IN	A

;; AUTHORITY SECTION:
.			1797	IN	SOA	a.root-servers.net. nstld.verisign-grs.com. 2021031703 1800 900 604800 86400

;; Query time: 416 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Wed Mar 17 23:34:12 EDT 2021
;; MSG SIZE  rcvd: 118

hnsd output:

rs: query
rs:   id=3922
rs:   labels=3
rs:   name=www.welcome.nb.
rs:   type=1
rs:   class=1
rs:   edns=1
rs:   dnssec=0
rs:   tld=nb
rs:   addr=127.0.0.1:58873
rs_worker: request 15: www.welcome.nb.
ns: query
ns:   id=44869
ns:   labels=1
ns:   name=nb.
ns:   type=1
ns:   class=1
ns:   edns=1
ns:   dnssec=1
ns:   tld=nb
ns:   addr=127.0.0.1:22606
pool: sending proof request for: nb.
chain (59467): using safe height of 59467 for resolution
peer 1 (173.255.209.126:44806): sending proof request for: nb.
ns: query
ns:   id=14919
ns:   labels=1
ns:   name=nb.
ns:   type=1
ns:   class=1
ns:   edns=1
ns:   dnssec=1
ns:   tld=nb
ns:   addr=127.0.0.1:38525
pool: sending proof request for: nb.
chain (59467): using safe height of 59467 for resolution
peer 1 (173.255.209.126:44806): already requesting proof for: nb.
ns: query
ns:   id=40762
ns:   labels=1
ns:   name=nb.
ns:   type=1
ns:   class=1
ns:   edns=1
ns:   dnssec=1
ns:   tld=nb
ns:   addr=127.0.0.1:60755
pool: sending proof request for: nb.
chain (59467): using safe height of 59467 for resolution
peer 1 (173.255.209.126:44806): already requesting proof for: nb.
peer 1 (173.255.209.126:44806): received proof: b92ad996982b44fbea27d833c52e3fb0d6192d63835a13c61dfeb0126e2ee2ef
peer 1 (173.255.209.126:44806): received proof for: nb
ns: sending msg (40762)
ns: sending msg (14919)
ns: sending msg (44869)
ns: query
ns:   id=53376
ns:   labels=2
ns:   name=ns1.nb.
ns:   type=28
ns:   class=1
ns:   edns=1
ns:   dnssec=1
ns:   tld=nb
ns:   addr=127.0.0.1:18811
pool: sending proof request for: nb.
chain (59467): using safe height of 59467 for resolution
peer 1 (173.255.209.126:44806): sending proof request for: nb.
ns: query
ns:   id=34138
ns:   labels=2
ns:   name=ns1.nb.
ns:   type=28
ns:   class=1
ns:   edns=1
ns:   dnssec=1
ns:   tld=nb
ns:   addr=127.0.0.1:13838
pool: sending proof request for: nb.
chain (59467): using safe height of 59467 for resolution
peer 0 (139.162.183.168:44806): sending proof request for: nb.
ns: query
ns:   id=49908
ns:   labels=2
ns:   name=ns1.nb.
ns:   type=28
ns:   class=1
ns:   edns=1
ns:   dnssec=1
ns:   tld=nb
ns:   addr=127.0.0.1:5698
pool: sending proof request for: nb.
chain (59467): using safe height of 59467 for resolution
peer 1 (173.255.209.126:44806): already requesting proof for: nb.
peer 1 (173.255.209.126:44806): received proof: b92ad996982b44fbea27d833c52e3fb0d6192d63835a13c61dfeb0126e2ee2ef
peer 1 (173.255.209.126:44806): received proof for: nb
ns: sending msg (49908)
ns: sending msg (53376)
ns: query
ns:   id=2273
ns:   labels=1
ns:   name=nb.
ns:   type=1
ns:   class=1
ns:   edns=1
ns:   dnssec=1
ns:   tld=nb
ns:   addr=127.0.0.1:26623
cache: cache hit for: nb.
ns: sending cached msg (2273): 174
ns: query
ns:   id=31046
ns:   labels=1
ns:   name=nb.
ns:   type=1
ns:   class=1
ns:   edns=1
ns:   dnssec=1
ns:   tld=nb
ns:   addr=127.0.0.1:18842
cache: cache hit for: nb.
ns: sending cached msg (31046): 174
rs: received answer for: www.welcome.nb.
rs: redirecting lookup to fallback for: www.welcome.nb.
rs_worker: request 15: www.welcome.nb.
rs: received answer for: www.welcome.nb.
rs:   rcode: 3
rs:   havedata: 0
rs:   nxdomain: 1
rs:   secure: 0
rs:   bogus: 0
peer 0 (139.162.183.168:44806): received proof: b92ad996982b44fbea27d833c52e3fb0d6192d63835a13c61dfeb0126e2ee2ef
peer 0 (139.162.183.168:44806): received proof for: nb
ns: sending msg (34138)
peer 0 (139.162.183.168:44806): unknown command: 6
peer 1 (173.255.209.126:44806): unknown command: 6

Edit: This may be unrelated to this patch; I think it's the same problem from before that I never figured out. I can reproduce it on home internet as well as mobile tethering (though the output looks different), so maybe I should open a separate issue for this.

;; QUESTION SECTION:
;www.welcome.nb.			IN	A

;; AUTHORITY SECTION:
nb.			3600	IN	SOA	a.misconfigured.powerdns.server. hostmaster.nb. 2021030901 10800 3600 604800 3600
cache: cache hit for: nb.
ns: sending cached msg (61351): 174
rs: received answer for: www.welcome.nb.
rs:   rcode: 3
rs:   havedata: 0
rs:   nxdomain: 1
rs:   secure: 0
rs:   bogus: 1
rs:   why_bogus: validation failure <www.welcome.nb. A IN>: no DNSSEC records from 44.231.6.183 for DS nb. while building chain of trust
peer 2 (74.207.247.120:44806): unknown command: 6

@pinheadmz
Member Author

I was able to resolve welcome.nb with this patch (without the www., which doesn't exist), but it highlights a big problem in this design: names are getting redirected wrongly. In particular, I'm noticing that names that delegate to ICANN names via NS are failing.

For example, the HNS name 3b has NS ns1.buffrr.dev, which is an ICANN domain. For some reason, when unbound tries to look up .dev in the recursive process, the name that gets forwarded to the upstream server is 3b (not dev) -- so this will take some fine-tuning.

HNS names that resolve directly, without delegating to ICANN, are working on this branch though; you can try proofofconcept for one.

@stephen304

Great, welcome.nb works when tethered to my phone, so on normal networks it works as intended.

I think I figured out definitively that my router's port 53 outbound redirect is messing up hnsd, anytime that rule is active on my router, hns resolution seems to be broken. I'll put the details in a new issue since I have it reproducible now.

@pinheadmz
Member Author

Does your router allow any recursive DNS traffic? Even if the root resolution is done internally, the recursive resolver will still need to make requests to port 53 on all the authoritative nameservers for each zone it needs to query.

@stephen304

stephen304 commented Mar 18, 2021

Ah, I didn't realize that. The rule that everyone typically uses for redirecting Chromecast to Pi-hole involves redirecting all outbound UDP traffic on port 53 to the Pi-hole. Wouldn't this mean that HNS resolution itself would not work on a hostile network that tampers with traffic on port 53? Is there any way to make the recursive resolver use the upstream setting for authoritative queries? I'm not sure how that works, so maybe that doesn't make sense.


@pinheadmz
Member Author

Wouldn't this mean that hns resolution itself would not work on a hostile network that tampers with traffic on port 53?

Correct; in my hotel last week I experienced this and had to use a VPN to get any kind of local-recursive DNS resolution to work. I still think this is a useful option to add to hnsd, since the hard-coded root zone may be out of date. And I think Pi-hole should still work in this configuration, but I'm not entirely sure how Pi-hole works.

@stephen304

Yeah, definitely useful regardless. I can set my --upstream to my Pi-hole and that works as well: doubleclick.net resolves to 0.0.0.0. The only issue is my catch-all rule, which I admit is a bit niche. On the other hand, handling the lack of recursive resolving ability, including the case where the clearnet is completely inaccessible, would solve the use case of using HNS on an isolated intranet or mesh network where clearnet access isn't available at all. I think I mentioned before that we have mesh nodes with spotty clearnet access that could potentially connect, over the mesh, to other HNS peers who do have clearnet access. So being able to resolve HNS domains when only the HNS network is accessible would be nice to have.

Regardless, this is a good step towards enabling hnsd on a wider variety of configurations.

@pinheadmz pinheadmz marked this pull request as draft March 18, 2021 19:09
@pinheadmz
Member Author

pinheadmz commented Mar 18, 2021

So, unfortunately I think I need to cut bait on this one for now and come back to it later; it may require a much bigger refactor, or actually patching libunbound to bend to our will. Here's what I learned:

  • The upstream fallback recursive resolver must be called by the root server. Why? Because of the situation where an HNS name may have an ICANN name delegated as its NS:

request: google.com
-> root server rejects
-> query forwarded upstream to 1.1.1.1
-> no problem, resolved

request: proofofconcept
-> root server gets NS+glue from HNS root zone
-> no problem, resolved

request: rough
-> root server gets NS a.dns.park.io from HNS root zone
-> recursive asks root again for io
-> root server rejects
-> query for io forwarded upstream to 1.1.1.1
-> question has not been answered 😢

So, in 535a2bf I tried moving the fallback resolver into the root server in place of the hard-coded ICANN lookup, but this didn't really work either. It might, but it's going to need a lot more hacky code, because what we want to do is convert an authoritative request into a recursive request, and the data types are all different (hns resource serialization vs. the unbound result struct).

The ideal fix for this is a wild configuration for the unbound recursive resolver itself that says "if the stubbed root zone returns NXDOMAIN, give up and forward an entirely new request to this upstream recursive..."
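For reference, the closest unbound's configuration can express today is an unconditional choice between the two behaviors; the conditional hand-off described above has no corresponding option. A sketch (the hnsd ns address and port here are assumptions; the last option is explicitly made up to show what's missing):

```
# unbound.conf sketch: what exists today (real options)

# Option 1: stub the root zone at hnsd's authoritative ns
stub-zone:
    name: "."
    stub-addr: 127.0.0.1@5349    # assumed hnsd ns address; adjust

# Option 2: forward everything upstream unconditionally
# forward-zone:
#     name: "."
#     forward-addr: 1.1.1.1

# What this PR would need, but unbound has no such option
# (hypothetical, does NOT exist):
# forward-on-stub-nxdomain: 1.1.1.1
```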

@stephen304

Gotcha, thanks for your work so far on this!
