Add support for running a nodelocal dns cache #3861

Merged

Conversation

nysthee
Contributor

@nysthee nysthee commented Dec 7, 2018

After encountering DNS issues in a cluster I was recently working on, I
noticed that Kubernetes 1.13 introduced support for running a nodelocal DNS
cache.

I believe this can be useful for more people.

kubernetes/kubernetes@73b548d
https://github.com/kubernetes/enhancements/blob/master/keps/sig-network/0030-nodelocal-dns-cache.md

Feedback welcome!

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Dec 7, 2018
@nysthee nysthee force-pushed the feature/add-node-local-dns-cache branch from aa19156 to 4a31e48 Compare December 7, 2018 23:19
@nysthee
Contributor Author

nysthee commented Dec 7, 2018

I would like a suggestion as well on where to put the documentation for this.

@nysthee nysthee force-pushed the feature/add-node-local-dns-cache branch 2 times, most recently from c803ca5 to 0a46af3 Compare December 7, 2018 23:41
@nysthee nysthee force-pushed the feature/add-node-local-dns-cache branch from 0a46af3 to 8172b6b Compare December 7, 2018 23:41
Member

@woopstar woopstar left a comment

  • Can you describe what DNS issues you were encountering?
  • Can you please provide a doc in the /docs folder and eventually update the DNS stack documentation?

I guess that to use the nodelocaldns cache, you'll have to use the defined local DNS IP as the resolver? That is currently 169.254.25.10?

If you had issues with the conntrack table filling up with DNS entries, then you can avoid that by setting the following sysctls:

- name: 'net.netfilter.nf_conntrack_udp_timeout_stream'
  value: '10'
- name: 'net.netfilter.nf_conntrack_udp_timeout'
  value: '10'
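
As a rough sketch, the two values above could be applied with Ansible's `sysctl` module. The task name and loop layout are assumptions for illustration and are not taken from this PR. The keys only exist once the `nf_conntrack` kernel module is loaded.

```yaml
# Sketch only: applies the two conntrack UDP timeouts listed above.
# Assumes the nf_conntrack kernel module is already loaded on the host.
- name: Shorten conntrack UDP timeouts (illustrative task, not from this PR)
  sysctl:
    name: "{{ item.name }}"
    value: "{{ item.value }}"
    state: present
    reload: yes
  with_items:
    - name: 'net.netfilter.nf_conntrack_udp_timeout_stream'
      value: '10'
    - name: 'net.netfilter.nf_conntrack_udp_timeout'
      value: '10'
```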

@woopstar woopstar self-assigned this Dec 10, 2018
@woopstar
Member

Just ping me when you want me to review again :)

@nysthee
Contributor Author

nysthee commented Dec 10, 2018

Will do :)

@nysthee
Contributor Author

nysthee commented Dec 10, 2018

The issues I was encountering were unexplained DNS timeouts, roughly every few requests.
I never noticed this until I started running Kafka in my cluster and services were complaining that the hostnames of services they depend on were not resolving.
At the moment the issue is mitigated by installing a nightly build of Flannel, but according to the 1.13 release notes, the implementation of the referenced KEP should solve this as well.

References:

@nysthee
Contributor Author

nysthee commented Dec 10, 2018

@woopstar if you could review again please

@woopstar
Member

> @woopstar if you could review again please

done

@woopstar
Member

ci check this

@nysthee
Contributor Author

nysthee commented Dec 10, 2018

Latest changes pushed as well.

@nysthee
Contributor Author

nysthee commented Dec 10, 2018

Maybe I should squash the commits?

@woopstar
Member

auto squash is enabled

@nysthee
Contributor Author

nysthee commented Dec 11, 2018

ci check this

@ant31
Contributor

ant31 commented Dec 11, 2018

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 11, 2018
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ant31, nysthee

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 11, 2018
@k8s-ci-robot k8s-ci-robot merged commit 3e3ee0a into kubernetes-sigs:master Dec 11, 2018
@woopstar
Member

I still don't get how this works as a cache unless you use the nodelocaldns_ip as resolver?

@woopstar
Member

@nysthee

Seems I'm right. Looking here, they set the nodelocal IP as the first cluster IP that gets populated into the pod.

You need to submit a fix PR ASAP that puts the nodelocaldns_ip first in the list here.

What this basically does is start a DNS pod on each node. Requests from pods on a node are then forwarded to the local DNS pod running on the same node, which avoids a DNAT. If that pod does not work, the clusterIP for the DNS plugin (kube-dns, CoreDNS, etc.) is used. Here they just use CoreDNS as the cache too. You could also use Unbound, dnsmasq, etc.

This should be enabled by default btw.
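
To check whether a pod actually reaches the node-local cache, one option is a throwaway pod that is pointed at the link-local IP explicitly. This is only a sketch: the pod name, image, and the `cluster.local` search domains are assumptions for illustration, and the IP is the 169.254.25.10 address mentioned earlier in this thread.

```yaml
# Illustrative test pod, not part of this PR.
# The pod name, image, and search domains are placeholders for this sketch.
apiVersion: v1
kind: Pod
metadata:
  name: dns-cache-test
spec:
  dnsPolicy: "None"            # ignore the resolver list the kubelet would inject
  dnsConfig:
    nameservers:
      - 169.254.25.10          # node-local cache IP mentioned in this thread
    searches:
      - default.svc.cluster.local
      - svc.cluster.local
      - cluster.local          # assumed cluster domain
    options:
      - name: ndots
        value: "5"
  containers:
    - name: shell
      image: busybox:1.28
      command: ["sleep", "3600"]
```

Running something like `kubectl exec dns-cache-test -- nslookup kubernetes.default` in that pod would then send the query to the cache on the same node rather than to the DNS service clusterIP.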

@woopstar
Member

Sorry. What you actually need to do is override --cluster-dns so that it contains ONLY the nodelocaldns_ip when the cache is enabled (enable_nodelocaldns == true), since the local DNS cache pod will forward queries to kube-dns / CoreDNS.
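
As a sketch of the end result on a node, the kubelet would carry a single resolver entry. The fragment below uses the `KubeletConfiguration` file form of the same setting. How kubespray templates this is not shown here, and `clusterDomain` is an assumed default.

```yaml
# Kubespray inventory variables named in this thread (value taken from the
# 169.254.25.10 address mentioned above):
#   enable_nodelocaldns: true
#   nodelocaldns_ip: 169.254.25.10
#
# Resulting kubelet setting, shown as a KubeletConfiguration fragment.
# Equivalent to passing --cluster-dns=169.254.25.10 on the command line.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
clusterDNS:
  - 169.254.25.10              # only the node-local cache; it forwards to kube-dns / CoreDNS
clusterDomain: cluster.local   # assumed default domain
```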

@nysthee
Contributor Author

nysthee commented Dec 11, 2018

@woopstar
I submitted a PR: #3879

spisarski pushed a commit to cablelabs/kubespray that referenced this pull request Feb 11, 2019
* Add support for running a nodelocal dns cache

After encountering DNS issues in a cluster I was recently working on, I
noticed that Kubernetes 1.13 introduced support for running a nodelocal DNS
cache.

I believe this can be useful for more people.

kubernetes/kubernetes@73b548d
https://github.com/kubernetes/enhancements/blob/master/keps/sig-network/0030-nodelocal-dns-cache.md

* Add requested changes

* Add additional requested changes + documentation

* Add requested changes after review

* Replace incorrect variable
LuckySB pushed a commit to southbridgeio/kubespray that referenced this pull request Feb 17, 2019
@Miouge1 Miouge1 mentioned this pull request Mar 20, 2019
@nvtkaszpir

Sorry for digging up the grave.
This change sets priorityClassName: system-cluster-critical for node-local-dns. Why is that?
Shouldn't it be priorityClassName: system-node-critical?
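
For context, the difference being asked about is a single field in the DaemonSet pod template. The fragment below is a sketch: the namespace and surrounding layout are assumed, and the rest of the manifest is omitted.

```yaml
# Fragment only: the fields around priorityClassName are assumed, and the
# rest of the DaemonSet (selector, containers, and so on) is omitted.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-local-dns
  namespace: kube-system       # assumed namespace
spec:
  template:
    spec:
      # Value set by this change, per the comment above:
      # priorityClassName: system-cluster-critical
      # Alternative being asked about:
      priorityClassName: system-node-critical
```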

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

6 participants