
Improve DNS resolver #260

Merged
mikz merged 11 commits into master from dns-resolver on Feb 17, 2017

Conversation

@mikz (Contributor) commented Feb 13, 2017

Improve performance of our DNS resolver.

* design document
* parallelization
* query de-duplication
* stale cache on error
* management api

In the end there is going to be a local caching DNS resolver in the same container. It is much easier to leave the hard work to another component.

That means that in a native deployment the user is responsible for providing a local caching DNS server for optimal performance. /cc @3scale/support

@3scale/productization after merging we are going to require s2i-openresty with dnsmasq (https://github.com/3scale/s2i-openresty/releases/tag/1.11.2.2-2).

@octobot octobot added the T-obux label Feb 13, 2017
@mikz mikz force-pushed the dns-resolver branch 4 times, most recently from a17bf09 to 8d3f784 on February 15, 2017 09:02
* When cache is stale, use stale record and trigger de-duplicated query in the background.
* If record can't be updated, use stale cache for unlimited time, but try to recover.

This has one drawback: in a low-traffic scenario, the first requests after an idle period can hit the wrong server before the cache is updated.
mikz (Contributor, Author):

I hope to solve that in the future in the load balancer: if a server is unavailable, remove the cached record and try the query again. The load balancer code should be able to mark hosts as unhealthy and do retries (it does not do that now). I'm not describing it here as it is really a future concern for now.


Using OpenResty's `lua-resty-dns` we can connect to any DNS server and implement our own resolver.

The DNS resolver caches each record individually and fetches it from the cache recursively. That allows us to have layers of cache and fill in just the missing layers. Example: CNAME (TTL 3600) -> CNAME (TTL 360) -> A (TTL 60) will query just the A record when it expires.
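
A minimal sketch of this layered caching, assuming `lua-resty-dns` for the queries and `lua-resty-lrucache` as the per-record cache (the cache size, recursion limit, and nameserver are illustrative; this is not APIcast's actual module):

```lua
local resolver = require "resty.dns.resolver"
local lrucache = require "resty.lrucache"

local cache = assert(lrucache.new(1000))  -- illustrative size

local function resolve(r, name, depth)
  depth = depth or 1
  if depth > 8 then return nil, "CNAME chain too long" end

  local record = cache:get(name)
  if not record then
    local answers, err = r:query(name, { qtype = r.TYPE_A })
    if not answers then return nil, err end
    if answers.errcode then return nil, answers.errstr end

    for _, answer in ipairs(answers) do
      -- cache every layer of the chain under its own name and its own TTL
      cache:set(answer.name, answer, answer.ttl)
    end
    record = cache:get(name)
    if not record then return nil, "name not present in answers" end
  end

  if record.cname then
    -- follow the chain through the cache; only expired layers are re-queried
    return resolve(r, record.cname, depth + 1)
  end

  return record  -- an A record: record.address holds the IP
end

local r = assert(resolver:new{ nameservers = { "127.0.0.1" } })
local record, err = resolve(r, "www.example.com")
```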
mikz (Contributor, Author):

@unleashed and @jmprusi suggested using a local caching DNS resolver instead.

While that is a good idea, it might be harder to package another dependency.

@unleashed (Contributor) left a comment:

Comments - relaying queries to a specific DNS server has some (small) drawbacks, but I think it's the way to go, with a cache and stale-information layer on top.


* caching for duration of TTL
* resolve A and CNAME records
* using stale cache when TTL expires and cache can't be updated
Contributor:

This should probably be expanded - the case in which a stale cached host is used forever because name resolution never comes back should produce noisy alerts to administrators.

mikz (Contributor, Author):

Agreed, this should produce alerts. But IMO it should not produce errors when the connection works just fine. What we saw in the Dyn outage was that DNS records weren't updated, and everything would have worked if you'd just kept connecting to the stale records.

Eventually, when those records are marked as unhealthy by the load balancer, it should return 500s.

* resolve A and CNAME records
* using stale cache when TTL expires and cache can't be updated
* many parallel requests to one worker result in one query
* connect to multiple DNS servers in parallel
Contributor:

Is this a real requirement? Why? I think supporting multiple DNS servers should be OK, but having to connect to all or a subset of them in parallel is probably wasted effort most of the time, unless the queries are for different hostnames.

Also, if multiple hosts need resolving at the same time, IIRC you can coalesce all the queries into a single DNS request, but that makes the response as slow as the slowest name resolution (so if used, this needs care).

mikz (Contributor, Author):

OpenShift provides you with two nameservers: one for internal and another for external hostnames.
The internal one does not resolve external requests and vice versa.
It would be possible, but quite hard, to detect whether a query is supposed to go to one or the other.

IMO the easiest way is to just do two queries. But if we delegate this work to a local caching DNS server it is not that critical (and there would be just one server anyway).


### Low traffic

A request is made to APIcast and the DNS cache is updated. Then there is no request for the duration of that record's TTL, and another request is made after the record expires. Should APIcast try to use the stale cache or hold the request? For how long?
Contributor:

If the cache is updated, no DNS request has to be performed. I assume the set of hosts that need resolving is going to be generally fixed at some low number. Having a timer re-resolve out-of-band before the TTL expires should be good. If some hosts are seldom used, keeping an indication of how often they need resolving would save on extra requests: perhaps wait a multiple of their TTL if the number of hosts to resolve is large, and return the stale info when an unusual request arrives (triggering an early update of the DNS info, on the assumption that the request that just happened is likely to be followed by another one in the near future).

mikz (Contributor, Author):

I was thinking about that too. But here are the counterarguments I could imagine:

* the number of scheduled timers in OpenResty is limited to 256 or some similarly low number
* this has to work on hosted APIcast, with theoretically hundreds of services running on a multi-tenant instance
* hosts can appear and disappear pretty much all the time; the cache is LRU

I think preloading of any kind will be expensive when we get to the scale of hundreds of services in one instance.

I was thinking that maybe we could try to update the cache within some fraction of the TTL when there is a request. For example, with a TTL of 60, when a request comes in and there are 6 seconds left, issue one timer to refresh the cache (see the sketch below).

What would not work is to schedule a refresh timer for TTL-1 seconds when the record is first fetched.

But that is the same as what you are explaining, right?
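
A sketch of that fraction-of-the-TTL refresh, assuming each cached record carries its original `ttl` and an `expires_at` timestamp (both assumptions of this sketch), plus a hypothetical `resolve_and_cache(name)` helper that creates its own resolver inside the timer (a request-bound cosocket cannot be reused there):

```lua
local REFRESH_FRACTION = 0.1  -- refresh when <10% of the TTL remains (illustrative)
local refreshing = {}         -- per-worker guard against duplicate refresh timers

local function maybe_refresh(name, record)
  local remaining = record.expires_at - ngx.now()
  if remaining > record.ttl * REFRESH_FRACTION or refreshing[name] then
    return
  end

  refreshing[name] = true
  -- one short-lived, on-demand timer per refresh keeps us well under the timer cap
  local ok, err = ngx.timer.at(0, function(premature)
    if not premature then
      resolve_and_cache(name)  -- hypothetical helper performing the real query
    end
    refreshing[name] = nil
  end)
  if not ok then
    refreshing[name] = nil
    ngx.log(ngx.WARN, "could not schedule DNS refresh for ", name, ": ", err)
  end
end
```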

Contributor:

Yes, the idea is to hide resolution latency as much as possible. Also, when the out-of-band refresh cannot finish before the TTL expires, returning stale data (as opposed to waiting on the refresh to finish) is probably good enough, and is consistent with the behaviour that would occur if the refresh never finished or didn't finish in a reasonable time.


1000 requests are made at the same time. The first one to arrive starts resolving DNS and the others should wait until it is completed. After that query completes, the other requests use its result.

If the first request times out, the next one should try another DNS query and the remaining requests should wait for it to complete.
Contributor:

So:

* The first resolving request for a given hostname "grabs" the DNS resolver and performs the query.
* Further requests for the same hostname wait on the response from the ongoing DNS request.
* A limit on the number of concurrent DNS requests should be set; requests for new hostnames wait for resources to be freed (or a timeout) or fail.

I would add the possibility of a warm-up, again assuming not many different hostnames will be needed, by querying the list of expected hostnames at boot time.
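
A minimal sketch of such a warm-up, assuming the host list is known at boot and reusing the hypothetical `resolve_and_cache` helper from above (cosockets are unavailable in `init_worker`, hence the zero-delay timer):

```lua
init_worker_by_lua_block {
    local hosts = { "backend.example.com", "system-app" }  -- illustrative list

    ngx.timer.at(0, function(premature)
        if premature then return end
        for _, host in ipairs(hosts) do
            resolve_and_cache(host)  -- hypothetical helper; pre-fills the DNS cache
        end
    end)
}
```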

mikz (Contributor, Author):

Preloading would make sense with the configuration loaded on boot as defined in #230.
It would not be possible in the cloud lazy-load mode (or only once the configuration is loaded).


### High traffic (with cache)

With constant traffic of 1000 requests per second, when the DNS cache expires, should we use the stale cache until it is refreshed? Or should requests wait for it to be updated?
Contributor:

As pointed out above: try to keep the entries refreshed before they expire. The least used entries should gradually stop being updated before the TTL expires, to save on resources (how this is implemented is something to be decided later). The stale cache is a last resort to keep the API working as a best-effort mechanism, but it should produce loud alerts (and/or be configurable on/off).

mikz (Contributor, Author):

That would be nice, but if we use dnsmasq that is not how it is going to work: dnsmasq expires the record and only then queries again.

While I agree it would be nice, it might be too much work to implement properly. Using the stale cache for the duration of the refresh (a few ms, up to a second?) shouldn't hurt anyone and is much easier to implement than an ahead-of-time cache refresh. And it is almost impossible when we use dnsmasq.


Querying multiple nameservers in parallel is accomplished by using "light threads". The first nameserver that responds is used.
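
A sketch of that first-answer-wins strategy with `ngx.thread` (each nameserver gets its own `resty.dns.resolver` instance; the timeout value is illustrative):

```lua
local resolver = require "resty.dns.resolver"

local function query_one(nameserver, name)
  local r, err = resolver:new{ nameservers = { nameserver }, timeout = 2000 }
  if not r then return nil, err end
  return r:query(name, { qtype = r.TYPE_A })
end

local function query_any(nameservers, name)
  local threads = {}
  for i, nameserver in ipairs(nameservers) do
    -- one light thread per nameserver, all querying concurrently
    threads[i] = ngx.thread.spawn(query_one, nameserver, name)
  end

  -- ngx.thread.wait returns as soon as the first thread terminates;
  -- note that the first to finish may have failed - a fuller version
  -- would keep waiting on the remaining threads in that case
  local ok, answers, err = ngx.thread.wait(unpack(threads))

  for _, thread in ipairs(threads) do
    ngx.thread.kill(thread)  -- drop the queries still in flight
  end

  if not ok then return nil, answers end
  return answers, err
end
```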

De-duplicating queries is achieved by using ngx.semaphore to wait for already-executing queries. The waiting algorithm could be much improved, as right now it waits for any query being executed, not the specific one. The stale cache is returned when the timeout is reached and the query can't be completed in time. Queries are performed in the background timer loop.
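
A sketch of the de-duplication idea with `ngx.semaphore`, here as a per-name variant rather than the PR's wait-on-any-query version (the 5-second timeout and the `do_query` callback are illustrative; error handling around `do_query` is omitted):

```lua
local semaphore = require "ngx.semaphore"

local inflight = {}  -- per-worker: name -> { sema = ..., waiters = n }

local function dedup_query(name, do_query)
  local pending = inflight[name]
  if pending then
    -- a query for this name is already running: wait for its result
    pending.waiters = pending.waiters + 1
    local ok, err = pending.sema:wait(5)
    if not ok then
      return nil, err  -- on timeout the caller may fall back to the stale cache
    end
    return pending.answers
  end

  pending = { sema = semaphore.new(), waiters = 0 }
  inflight[name] = pending

  pending.answers = do_query(name)  -- the single real DNS query

  inflight[name] = nil
  if pending.waiters > 0 then
    pending.sema:post(pending.waiters)  -- wake everyone waiting on this name
  end
  return pending.answers
end
```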
Contributor:

This is very similar to what I explained above, except that entries are continually refreshed unless they cross an "unused" threshold.


De-duplicating queries is achieved by using ngx.semaphore to wait for already-executing queries. The waiting algorithm could be much improved, as right now it waits for any query being executed, not the specific one. The stale cache is returned when the timeout is reached and the query can't be completed in time. Queries are performed in the background timer loop.

The search scope defined in `/etc/resolv.conf` is evaluated one by one until a matching record is returned. This could be optimized by executing all the queries in parallel.
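
A sketch of that sequential walk, assuming `search` is the parsed search scope from `/etc/resolv.conf` (ndots handling and parallelization are left out):

```lua
local function resolve_with_search(r, name, search)
  -- a dotted name might already be fully qualified: try it as-is first
  if name:find("%.") then
    local answers = r:query(name, { qtype = r.TYPE_A })
    if answers and not answers.errcode then return answers end
  end

  for _, domain in ipairs(search) do
    local answers = r:query(name .. "." .. domain, { qtype = r.TYPE_A })
    if answers and not answers.errcode then
      return answers  -- first scope that returns a record wins
    end
  end

  return nil, "no search scope matched " .. name
end
```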
Contributor:

I would try to avoid having to do this at all. If the code that requests this is all local, using fully-qualified names could be set as a rule, avoiding this altogether - note that "random_service" can actually be a fully-qualified name for the default search scope of "." (not sure whether this applies to the way OpenShift does name resolving).

mikz (Contributor, Author):

I'm not sure we can convince all our customers to use fully-qualified names everywhere. All the utilities work just fine with relative names. I was struggling with this myself and spent some time debugging why I could curl system-app:3000 but not connect to it from APIcast.

In theory we could try the following heuristics to determine whether a name is local (sketched after this list):

* no `.` - internal
* some `.` but matches the search scope - internal
* some `.` but does not match the search scope - external

The issue is that even if we can detect which hostnames are internal/external, we don't know which name server is going to resolve what. So we would have to implement logic to remember which search scope belongs to which name server.

That just sounds too hard. Even dnsmasq connects to all name servers in parallel and uses the first response.
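
Those heuristics as a sketch, again assuming `search` holds the resolv.conf search scope:

```lua
local function is_internal(name, search)
  if not name:find("%.") then
    return true  -- no dot: internal
  end

  for _, domain in ipairs(search) do
    local suffix = "." .. domain
    if name:sub(-#suffix) == suffix then
      return true  -- dotted, but matches a search scope: internal
    end
  end

  return false  -- dotted and unmatched: external
end
```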


The following rules apply to handle the scenarios described above:

* When cache is missing, all incoming requests de-duplicate queries and wait for the first one to complete. If it does not complete in time, the next one tries again, and so on.
@unleashed (Contributor) commented Feb 15, 2017:

I would change this to make all requests pending on a name being resolved wait on the result of the query. Requests for name resolution should seldom trigger a real DNS request right away. We do something akin to this in XC, and the strategies are:

* At boot time, do a cache warm-up of the expected host names' addresses.
* Keep an out-of-band, continuous task going through the most used hostnames and resolving them before their TTL expires.
* When a stale entry is found, return the stale addresses and schedule an immediate resolution of the hostname, on the assumption that the entry is becoming hotter and further requests may follow (which also means it would be auto-resolved before the TTL expires for a few cycles, until deemed seldom used again).

mikz (Contributor, Author):

Using dnsmasq, we can use it as the synchronisation mechanism: it de-duplicates queries.

So if there are 1000 parallel queries, we can just hit dnsmasq and let it resolve them.

As described in #260 (comment), boot cache warm-up would make sense in some configurations and would be a good addition.

Having the continuous out-of-band resolver seems less needed, as it will not (or will be hard to) scale up to hundreds of services, as described in #260 (comment).

Returning the stale entry and triggering a refresh would be a reasonable option. But as described in #260 (comment), dnsmasq would not allow us to refresh the cache until it has expired. I'd try to avoid scheduling work into the future because there is a hard cap on the number of timers.
I guess in the context of a sparsely used API we can just not preload the DNS and let the user see the query time.
Then we should use the stale cache only when there is a query running for that record.

The following rules apply to handle the scenarios described above:

* When cache is missing, all incoming requests de-duplicate queries and wait for the first one to complete. If it does not complete in time, the next one tries again, and so on.
* When cache is stale, use stale record and trigger de-duplicated query in the background.
Contributor:

Ah, I think we are agreeing - this matches what I just wrote above.


* When cache is missing, all incoming requests de-duplicate queries and wait for the first one to complete. If it does not complete in time, the next one tries again, and so on.
* When cache is stale, use stale record and trigger de-duplicated query in the background.
* If record can't be updated, use stale cache for unlimited time, but try to recover.
Contributor:

Add to this: sound the alarm and transition to red alert, make noise, and eventually drop the stale cache and start returning 500s. Admins should know ASAP, and if it is not taken care of, things should start failing hard (instead of, for example, failing partially, because some host not being available is not a fatal error but a subtle one). I guess there are a lot of different possible behaviours; perhaps configuration, or overriding of the default behaviour, is desirable.

mikz (Contributor, Author):

Yes. As described in #260 (comment), I think the stale cache is not the issue here, but rather marking hosts down in the load balancer.
That would eventually start raising 500s.

@mikz mikz force-pushed the dns-resolver branch 5 times, most recently from bab625f to 4f091d3 on February 16, 2017 15:23
@mikz mikz changed the title [wip] Improve DNS resolver Improve DNS resolver Feb 17, 2017
@mikz mikz merged commit b04ad80 into master Feb 17, 2017
@mikz mikz deleted the dns-resolver branch February 17, 2017 09:38
mikz added a commit that referenced this pull request Feb 17, 2017
* caching for duration of TTL
* resolve A and CNAME records
* using stale cache when TTL expires and cache can't be updated
* many parallel requests to one worker result in one query


Is this sentence related to this one?

"When cache is missing, all incoming requests de-duplicate queries and wait for the first one to complete."

mikz (Contributor, Author):

Yep. But in the end we delegate this work to the local DNS cache server.
