
DNS resolution failing from pods #794

jbw976 opened this issue Dec 22, 2016 · 7 comments

@jbw976 (Member) commented Dec 22, 2016

I have a fairly consistent repro where pods cannot perform DNS lookups. The nodes (VMs) can resolve names, but the pods cannot.

Version information:

> vagrant version
Installed Version: 1.8.1

> vagrant box list
coreos-alpha (virtualbox, 1262.0.0)

> kubectl version
Client Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.1", GitCommit:"82450d03cb057bab0950214ef122b67c83fb11df", GitTreeState:"clean", BuildDate:"2016-12-14T00:57:05Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.1+coreos.0", GitCommit:"cc65f5321f9230bf9a3fa171155c1213d6e3480e", GitTreeState:"clean", BuildDate:"2016-12-14T04:08:28Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}

> git --no-pager show -q HEAD
commit c0bad9d170bb1c2eeead639bc1fae348143af70a
Merge: c465df2 5a2caa8
Author: Rob Szumski <rob@robszumski.com>
Date:   Tue Dec 20 13:39:57 2016 -0500

    Merge pull request #790 from stephanlindauer/patch-1
    
    missing reference

Repro Steps

Follow the instructions to set up the Kubernetes Vagrant multi-node cluster, found at https://coreos.com/kubernetes/docs/latest/kubernetes-on-vagrant.html (included below for convenience):

git clone https://github.com/coreos/coreos-kubernetes.git
cd coreos-kubernetes/multi-node/vagrant

# use all the defaults in the Vagrantfile, especially $update_channel = "alpha"
vagrant up

export KUBECONFIG="${KUBECONFIG}:$(pwd)/kubeconfig"
kubectl config use-context vagrant-multi

# wait until the VMs and kube-system pods are up
kubectl get nodes
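
To confirm the add-on pods are actually ready before testing, it may also help to check the kube-system namespace (not part of the original instructions):

kubectl get pods --namespace=kube-system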

Now create a simple DaemonSet spec that runs something with a shell (I picked alpine:3.4):

cat > test.yaml << EOF
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: test
spec:
  template:
    metadata:
      labels:
        app: test
    spec:
      containers:
      - name: test
        image: alpine:3.4
        imagePullPolicy: IfNotPresent
        command: ["sleep", "36500d"]
EOF

Start the DaemonSet and ensure the pods are running:

> kubectl create -f test.yaml
> kubectl get pods
NAME         READY     STATUS    RESTARTS   AGE
test-0nwwc   1/1       Running   0          4s
test-szlg9   1/1       Running   0          4s

Connect to one of the pods to open a shell and try some DNS lookups that will fail:

kubectl exec -it test-0nwwc sh
/ # wget www.google.com
wget: bad address 'www.google.com'

/ # nslookup google.com
nslookup: can't resolve '(null)': Name does not resolve

nslookup: can't resolve 'google.com': Try again

/ # cat /etc/resolv.conf 
search default.svc.cluster.local svc.cluster.local cluster.local
nameserver 10.3.0.10
options ndots:5
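
To separate a broken resolv.conf from an unreachable resolver, it can help to query the cluster DNS service directly (a sketch; 10.3.0.10 is the kube-dns service IP in the default coreos-kubernetes config, and busybox nslookup accepts the server as a second argument):

/ # nslookup kubernetes.default.svc.cluster.local 10.3.0.10
/ # nslookup www.google.com 10.3.0.10

If these time out, packets from the pod are not reaching kube-dns at all, which points at the overlay network or kube-proxy rather than the DNS pod itself.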

NOTE: Sometimes this issue only repros on one of the pods, so make sure to try DNS from both pods.

In contrast, running the same DNS lookups on one of the node VMs works OK:

> vagrant ssh c1
Last login: Thu Dec 22 00:32:22 UTC 2016 from 10.0.2.2 on pts/0
CoreOS alpha (1262.0.0)
Update Strategy: No Reboots
Failed Units: 1
  update-engine.service

core@c1 / $ nslookup google.com
Server:		10.0.2.3
Address:	10.0.2.3#53

Non-authoritative answer:
Name:	google.com
Address: 216.58.219.14
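
Note that the node resolves through 10.0.2.3 (the VirtualBox NAT DNS), not through the cluster DNS at 10.3.0.10, so this comparison does not exercise the same path the pods use. Querying the cluster DNS service from the node narrows it down (a sketch, assuming the default service IP):

core@c1 / $ nslookup google.com 10.3.0.10

If that works from the node but not from a pod, the problem is likely in the pod network rather than in kube-dns itself.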

Please let me know if there is any other information I can collect.

@the-nicolas

I have the same issue, and even DNS lookups for other services fail.

The log of the DNS pod shows:
13:01:18.017118 1 dns.go:274] New service: test-nginx-svc

...but "nslookup test-nginx-svc" in another running pod fails.

It's also not possible to ping the DNS resolver 10.3.0.10 from the pod, even though it is running.
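
Worth noting: a failed ping to 10.3.0.10 is not conclusive on its own. A ClusterIP is a virtual address implemented by kube-proxy's iptables rules, which only translate the service's declared ports (53/udp and 53/tcp here), so ICMP to a service IP typically gets no reply even on a healthy cluster. A port-level check is more telling (a sketch using busybox nslookup with an explicit server):

/ # nslookup test-nginx-svc 10.3.0.10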

@Bekt commented Dec 27, 2016

This doesn't seem Vagrant-specific. I'm having the same issue on AWS with the latest kube-dns add-on.

@rdtr (Contributor) commented Dec 29, 2016

What do the kube-dns logs show? I've already updated my cluster to v1.5.1 and have no problems on either local Vagrant or AWS. (I built the cluster from scratch using the multi-node scripts.)
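
For reference, one way to pull those logs (a sketch; the kubedns and dnsmasq container names match the kube-dns add-on of this era but may differ in other versions, and <kube-dns-pod-name> is a placeholder):

kubectl get pods --namespace=kube-system -l k8s-app=kube-dns
kubectl logs --namespace=kube-system <kube-dns-pod-name> -c kubedns
kubectl logs --namespace=kube-system <kube-dns-pod-name> -c dnsmasq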

@Bekt commented Dec 30, 2016

Looks like I had --network-plugin= instead of --network-plugin=cni in my master kubelet. Not sure how I missed that, but that was the issue.
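
For anyone checking for the same misconfiguration, the kubelet flags on a node can be inspected like this (a sketch; the unit name assumes the coreos-kubernetes systemd setup):

vagrant ssh c1
ps aux | grep [k]ubelet | tr ' ' '\n' | grep network-plugin
systemctl cat kubelet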

@valichek

I had this problem when using the alpha CoreOS channel; it looks like it works with stable.

@jbw976 (Member, Author) commented Jan 5, 2017

@rdtr, I have included all the log output for kube-dns pods/containers in this gist: https://gist.github.com/jbw976/41145c9e4f8dcc7e106839cd22801641

Let me know if there's any other output I can grab for you.

Could this be related to coreos/bugs#1743, which was fixed in CoreOS 1284 (https://github.com/coreos/manifest/releases/tag/v1284.0.0)?

@pswenson commented Jan 12, 2017

I've had this issue a few times in the past few days... @jbw976 do you have a workaround?

I'm using Kubernetes 1.5.1 on CoreOS stable (1235.4.0) and have hit this twice in the past couple of days.

I see it might be fixed in CoreOS 1284... but when I update I only get 1235.6.0.
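
That is expected: 1284 was only on the alpha channel at that point, while stable was still serving 1235.x. If you want to test the fix, switching a node's channel follows the standard CoreOS procedure (a sketch; run on the node, and note this moves you onto alpha releases):

echo "GROUP=alpha" | sudo tee /etc/coreos/update.conf
sudo systemctl restart update-engine
update_engine_client -update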
