[Feature] Support Node Local DNS Cache (AKS Local DNS implementation) #3673

Open
damienwebdev opened this issue May 21, 2023 · 13 comments
Labels
coreDNS feature-request Requested Features

Comments

@damienwebdev

damienwebdev commented May 21, 2023

Is your feature request related to a problem? Please describe.
I'm a developer using NodeJS to server-side render frontend applications. I'm attempting to improve the TTFB of my renders, and in the course of doing so I'm seeing ~8 ms of latency on DNS lookups. The important thing to know here is that NodeJS does not cache DNS lookups, either in-process or between processes (it relies on OS-specific functions and caching, such as getaddrinfo), leading to a higher-than-expected volume of DNS requests. There are many articles on the topic:

  1. https://httptoolkit.com/blog/configuring-nodejs-dns/
  2. https://adambrodziak.pl/dns-performance-issues-in-kubernetes-cluster
  3. A video by one of the creators of Libuv
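
To make the in-process gap concrete, here is a minimal TypeScript sketch of a fixed-TTL cache wrapped around a resolver. The names `Resolver` and `withDnsCache` and the fixed TTL are assumptions made for illustration; nothing like this ships with NodeJS, and a real cache would honour the TTL of each record instead of a single fixed value.

```typescript
// Assumed resolver shape: hostname in, list of addresses out.
type Resolver = (hostname: string) => Promise<string[]>;

interface CacheEntry {
  addresses: Promise<string[]>;
  expiresAt: number; // milliseconds, on the same clock as `now`
}

// Hypothetical helper: wraps a resolver with a fixed-TTL in-process cache.
function withDnsCache(
  resolve: Resolver,
  ttlMs: number,
  now: () => number = Date.now,
): Resolver {
  const cache = new Map<string, CacheEntry>();

  return (hostname: string): Promise<string[]> => {
    const hit = cache.get(hostname);
    if (hit !== undefined && hit.expiresAt > now()) {
      return hit.addresses; // served from memory, no DNS query issued
    }

    // Store the promise itself so concurrent callers share one query.
    const addresses = resolve(hostname);
    cache.set(hostname, { addresses, expiresAt: now() + ttlMs });

    // Do not keep failed lookups in the cache.
    addresses.catch(() => {
      if (cache.get(hostname)?.addresses === addresses) {
        cache.delete(hostname);
      }
    });
    return addresses;
  };
}
```

In NodeJS, `dns.promises.resolve4` has a compatible shape and could be passed in as the resolver. A node-level cache addresses the same problem for every process on the node without changes to application code.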

Describe the solution you'd like
I would like to leverage Node Local DNS Cache as described by the Kubernetes team.

Describe alternatives you've considered

  1. nodelocaldns aks routing issue #1642
  2. node-local-dns daemonset is automatically deleted #1435
  3. I've also considered implementing keep-alive connections in SSR.
  4. [Feature] Node local DNS #1492

It looks like (in #1492) the AKS team has already considered this and has already done some intense work to improve network capabilities, but I'm confused (and concerned) about why #1492 was closed without implementing the original feature. It looks (from the outside) like this feature was used as a "placeholder" as a fix for a completely different issue.

Can someone clarify why #1492 was closed?

Additionally, @jnoller points out that user-driven attempts to remedy this problem are also subverted by AKS. Can you explain why? Could we get a flag that allows us to switch this to a daemonset? Otherwise, I'm left quite confused and left with slow HTTP requests for a reason that seems beyond me.

@damienwebdev damienwebdev added the feature-request Requested Features label May 21, 2023
@damienwebdev
Author

This could be closed; it's possible for users to implement this themselves, but it would be nice to have the AKS team document this specifically for AKS.

@dengliu

dengliu commented Oct 4, 2023

Hi @damienwebdev,
Have you been able to deploy Node Local DNS to AKS?
I tried both the official solution from k8s and the suggested AKS solution here; neither of them works on AKS.

@timja

timja commented Oct 5, 2023

@neurobion

Hi @timja, it has been a few months since your comment and I want to ask if you are still using it without any problems or if you have found something more suitable? Thanks

@artificial-aidan

I just implemented this today, seems to be working. A few notes for someone new to this.

In @timja's example, the DNS IP is 10.0.0.10; this may not be the case for you. This command can be used to query the IP: kubectl get svc kube-dns -n kube-system -o jsonpath={.spec.clusterIP}

(source: https://kubernetes.io/docs/tasks/administer-cluster/nodelocaldns/)

The default memory requests were way too small for my use case: I was seeing 25 MB+ of memory used by the node-local caches. Make sure you set that correctly, as having a nodelocal pod get OOM-killed will result in DNS downtime on that node.

@timja

timja commented Feb 7, 2024

It shouldn't get OOM-killed if there's no limit set, but yes, it could probably request more than that if needed.

@artificial-aidan

If no limit is set and a node comes under memory pressure, the pod will be a higher priority to get killed if it is using more than its requested memory.
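
As a rough model of that ordering: the kubelet's documented node-pressure eviction ranking looks first at whether a pod's usage exceeds its request, then at pod priority, then at how far usage is above the request. The sketch below encodes only that ranking. The `PodMemory` shape, the `evictionOrder` name, and the numbers are hypothetical, and the kernel OOM killer follows different rules.

```typescript
interface PodMemory {
  name: string;
  priority: number;     // pod priority value, higher means more important
  requestBytes: number; // memory request
  usageBytes: number;   // current memory usage
}

// Returns pods in the order they would be considered for eviction,
// first candidate at index 0. Simplified model, not kubelet code.
function evictionOrder(pods: PodMemory[]): PodMemory[] {
  const overRequest = (p: PodMemory): number => p.usageBytes - p.requestBytes;
  return [...pods].sort((a, b) => {
    const aExceeds = overRequest(a) > 0;
    const bExceeds = overRequest(b) > 0;
    if (aExceeds !== bExceeds) return aExceeds ? -1 : 1; // over-request pods first
    if (a.priority !== b.priority) return a.priority - b.priority; // lower priority first
    return overRequest(b) - overRequest(a); // larger overage first
  });
}

const MiB = 1024 * 1024;
const examplePods: PodMemory[] = [
  { name: "app", priority: 0, requestBytes: 64 * MiB, usageBytes: 32 * MiB },
  { name: "node-local-dns", priority: 0, requestBytes: 5 * MiB, usageBytes: 25 * MiB },
];
console.log(evictionOrder(examplePods).map((p) => p.name));
```

In this model the cache pod ranks ahead of the application pod because it is well above its request, which is why raising the request to match observed usage matters even without a limit.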

@lomboboo

@artificial-aidan @timja
Have you guys figured it out? We also run an AKS cluster, and after I installed nodelocalcache as per the Kubernetes docs (except I had to remove the addonmanager.kubernetes.io/mode: Reconcile label), I don't think it is working as I would expect.

When creating a new pod in that cluster (based on the dnsutils image, for example) and running nslookup google.com, we get Server: 10.0.0.10 instead of Server: 169.254.20.10. I would expect to get 10.0.0.10 on the first call and 169.254.20.10 on all calls after that, since it should be cached by node-local-dns.

I am curious whether it is even supposed to work with AKS, or whether anything else has to be done for it to work in an AKS managed cluster.
Or am I testing it wrongly altogether?
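
For anyone scripting this check, a small TypeScript sketch that pulls the Server line out of nslookup output and labels it. The function names and the sample addresses are illustrative assumptions. According to the Kubernetes NodeLocal DNSCache docs, when kube-proxy runs in iptables mode the cache listens on the kube-dns service IP as well as the node-local address, so the Server line alone may not show whether the cache is in the path.

```typescript
// Extracts the resolver address from the "Server:" line of nslookup output.
function nslookupServer(output: string): string | null {
  for (const line of output.split("\n")) {
    const match = /^Server:\s*(\S+)/.exec(line.trim());
    if (match !== null) return match[1] ?? null;
  }
  return null;
}

type ResolverKind = "node-local address" | "kube-dns service IP" | "other";

// Labels the address; the two IPs are passed in because they vary per cluster.
function classifyServer(
  server: string,
  nodeLocalIp: string,
  kubeDnsIp: string,
): ResolverKind {
  if (server === nodeLocalIp) return "node-local address";
  if (server === kubeDnsIp) return "kube-dns service IP";
  return "other";
}
```

Calling `classifyServer(server, "169.254.20.10", "10.0.0.10")` on the address returned by `nslookupServer` would report "kube-dns service IP" for the output described above, which is consistent with either a working or a non-working cache in iptables mode.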

@artificial-aidan

I think the way I tested it was to look at DNS queries on the kube-dns metrics. They went way down once nodelocal was working.

@lomboboo

lomboboo commented Jun 27, 2024

Thanks for the response.

Can you please elaborate on this a little bit? How did you install nodelocaldns in your AKS?

Did you use curl <coredns-pod-xxx>:9153/metrics to get different metrics? If so, which one did you pay attention to?

@artificial-aidan

I use Prometheus to scrape all the metrics, so I don't remember where they came from. But both nodelocal and coredns export the cache hit metric, and you should be able to see the nodelocal metric increasing. I followed the same steps timja did.
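
Without Prometheus, the same signal can be read from the raw metrics text. A minimal TypeScript sketch, assuming the counters are named `coredns_cache_hits_total` and `coredns_cache_misses_total` as in current CoreDNS releases; the helper names are hypothetical and the parser handles only simple sample lines.

```typescript
// Sums every sample of one metric in Prometheus text exposition format,
// across all label combinations.
function sumMetric(exposition: string, metric: string): number {
  let total = 0;
  for (const line of exposition.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "" || trimmed.startsWith("#")) continue; // skip HELP/TYPE lines
    // Sample line: name{labels} value   (the label block is optional)
    const match = /^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{.*\})?\s+(\S+)/.exec(trimmed);
    if (match !== null && match[1] === metric) {
      total += Number(match[3]);
    }
  }
  return total;
}

// Returns hits / (hits + misses), or null when no cache traffic was recorded.
function cacheHitRatio(exposition: string): number | null {
  const hits = sumMetric(exposition, "coredns_cache_hits_total");
  const misses = sumMetric(exposition, "coredns_cache_misses_total");
  const total = hits + misses;
  return total === 0 ? null : hits / total;
}
```

Feeding it the body returned by the metrics endpoint of a nodelocal pod, taken twice a few minutes apart, should show the hit counter climbing if pods are actually querying the cache.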

@muadnan

muadnan commented Aug 29, 2024

> Thanks for the response.
>
> Can you please elaborate on this a little bit? How did you install nodelocaldns in your AKS?
>
> Did you use curl <coredns-pod-xxx>:9153/metrics to get different metrics? If so, which one did you pay attention to?

Hey @lomboboo, have you figured out whether it is working as expected?

@phealy phealy self-assigned this Dec 13, 2024
@phealy phealy added the coreDNS label Dec 13, 2024
@phealy
Contributor

phealy commented Dec 13, 2024

We are in the process of implementing a solution similar, but not identical, to node-local-dns, which we're currently calling AKS Local DNS. It will be delivered as a node addon that is configured before kubelet startup, which avoids a lot of the issues we've seen with productizing node-local-dns (especially DNS service interruptions when DaemonSet updates are made and the pods restart). By following this design we can accomplish the following:

  1. aks-local-dns is online before kubelet starts and thus we don't have to play any games with iptables to redirect the kube-dns VIP to the local coredns process.
  2. Updates to coredns on the node occur as part of the node image upgrade process, so they only happen when the node is already drained of workloads and thus don't impact DNS traffic.
  3. Bad search domain completions will be stopped at the coredns instance on the node and never reach the network or the cluster coredns.
  4. serve-stale verify will be part of the default configuration, making short interruptions in DNS traffic less disruptive.
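
For context on item 3, "search domain completions" are the extra names a resolver tries when a short name is looked up. A rough TypeScript sketch of that expansion, assuming glibc-style `ndots` handling; the function name is hypothetical, and the search list in the example is the one a pod in the `default` namespace typically gets, not something taken from the AKS design.

```typescript
// Lists the fully qualified names a stub resolver would try, in order.
function candidateQueries(
  name: string,
  searchDomains: string[],
  ndots: number,
): string[] {
  if (name.endsWith(".")) return [name]; // already absolute, no search list
  const dots = name.split(".").length - 1;
  const expanded = searchDomains.map((domain) => `${name}.${domain}.`);
  const absolute = `${name}.`;
  // Names with fewer dots than ndots go through the search list first.
  return dots >= ndots ? [absolute, ...expanded] : [...expanded, absolute];
}

const podSearch = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"];
console.log(candidateQueries("google.com", podSearch, 5));
```

With `ndots:5`, a lookup of google.com tries three cluster-suffixed names that cannot exist before the real one. Answering those on the node keeps them off the network and away from the cluster coredns.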

We're keeping all of the benefits node-local-dns brings:

  1. DNS queries from application pods to the node DNS cache can be excluded from connection tracking to avoid filling up the conntrack table.
  2. Queries from the nodes to the cluster coredns will be upgraded to TCP by default, allowing for better failover between upstream coredns pods and faster retries in the event of a dropped packet on the network. Connection pooling will be used to maintain connections between the cache coredns and the cluster coredns, which should mostly eliminate the extra latency that comes from the TCP three-way handshake.
  3. Caching (both positive and negative) will be configurable.

@phealy phealy changed the title from "[Feature] Support Node Local DNS Cache" to "[Feature] Support Node Local DNS Cache (AKS Local DNS implementation)" Dec 13, 2024
@phealy phealy moved this to In Progress (Development) in Azure Kubernetes Service Roadmap (Public) Dec 13, 2024
Projects
Status: In Progress (Development)
8 participants