
DNS resolution failing from pods #794

jbw976 opened this issue Dec 22, 2016 · 7 comments

@jbw976 (Member) commented Dec 22, 2016

I have a fairly consistent repro where pods cannot perform DNS lookups. The nodes (VMs) can resolve names, but the pods cannot.

Version information:

> vagrant version
Installed Version: 1.8.1

> vagrant box list
coreos-alpha (virtualbox, 1262.0.0)

> kubectl version
Client Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.1", GitCommit:"82450d03cb057bab0950214ef122b67c83fb11df", GitTreeState:"clean", BuildDate:"2016-12-14T00:57:05Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.1+coreos.0", GitCommit:"cc65f5321f9230bf9a3fa171155c1213d6e3480e", GitTreeState:"clean", BuildDate:"2016-12-14T04:08:28Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}

> git --no-pager show -q HEAD
commit c0bad9d170bb1c2eeead639bc1fae348143af70a
Merge: c465df2 5a2caa8
Author: Rob Szumski <rob@robszumski.com>
Date:   Tue Dec 20 13:39:57 2016 -0500

    Merge pull request #790 from stephanlindauer/patch-1
    
    missing reference

Repro Steps

Follow the instructions to set up the Kubernetes Vagrant multi-node cluster, found at https://coreos.com/kubernetes/docs/latest/kubernetes-on-vagrant.html (included below for convenience):

git clone https://github.com/coreos/coreos-kubernetes.git
cd coreos-kubernetes/multi-node/vagrant

# use all the defaults in the Vagrantfile, especially $update_channel = "alpha"
vagrant up

export KUBECONFIG="${KUBECONFIG}:$(pwd)/kubeconfig"
kubectl config use-context vagrant-multi

# wait until the VMs and kube-system pods are up
kubectl get nodes
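
To confirm the add-on pods are actually ready before testing, it may also help to check the kube-system namespace (not part of the original instructions):

kubectl get pods --namespace=kube-system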

Now create a simple DaemonSet spec that runs something with a shell (I picked alpine:3.4):

cat > test.yaml << EOF
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: test
spec:
  template:
    metadata:
      labels:
        app: test
    spec:
      containers:
      - name: test
        image: alpine:3.4
        imagePullPolicy: IfNotPresent
        command: ["sleep", "36500d"]
EOF

Start the DaemonSet and ensure the pods are running:

> kubectl create -f test.yaml
> kubectl get pods
NAME         READY     STATUS    RESTARTS   AGE
test-0nwwc   1/1       Running   0          4s
test-szlg9   1/1       Running   0          4s

Connect to one of the pods to open a shell and try some DNS lookups that will fail:

kubectl exec -it test-0nwwc sh
/ # wget www.google.com
wget: bad address 'www.google.com'

/ # nslookup google.com
nslookup: can't resolve '(null)': Name does not resolve

nslookup: can't resolve 'google.com': Try again

/ # cat /etc/resolv.conf 
search default.svc.cluster.local svc.cluster.local cluster.local
nameserver 10.3.0.10
options ndots:5
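
To separate a broken resolv.conf from an unreachable resolver, it can help to query the cluster DNS service directly (a sketch; 10.3.0.10 is the kube-dns service IP in the default coreos-kubernetes config, and busybox nslookup accepts the server as a second argument):

/ # nslookup kubernetes.default.svc.cluster.local 10.3.0.10
/ # nslookup www.google.com 10.3.0.10

If these time out, packets from the pod are not reaching kube-dns at all, which points at the overlay network or kube-proxy rather than the DNS pod itself.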

NOTE: Sometimes this issue only repros on one of the pods, so make sure to try DNS from both pods.

In contrast, running the same DNS lookups on one of the node VMs works OK:

> vagrant ssh c1
Last login: Thu Dec 22 00:32:22 UTC 2016 from 10.0.2.2 on pts/0
CoreOS alpha (1262.0.0)
Update Strategy: No Reboots
Failed Units: 1
  update-engine.service

core@c1 / $ nslookup google.com
Server:		10.0.2.3
Address:	10.0.2.3#53

Non-authoritative answer:
Name:	google.com
Address: 216.58.219.14
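
Note that the node resolves through 10.0.2.3 (the VirtualBox NAT DNS), not through the cluster DNS at 10.3.0.10, so this comparison does not exercise the same path the pods use. Querying the cluster DNS service from the node narrows it down (a sketch, assuming the default service IP):

core@c1 / $ nslookup google.com 10.3.0.10

If that works from the node but not from a pod, the problem is likely in the pod network rather than in kube-dns itself.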

Please let me know if there is any other information I can collect.

@the-nicolas

I have the same issue, and even DNS lookups for other services fail.

The log of the DNS pod shows:
13:01:18.017118 1 dns.go:274] New service: test-nginx-svc

...but "nslookup test-nginx-svc" in another running pod fails.

It's also not possible to ping the DNS resolver 10.3.0.10 from the pod, even though it is running.
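
Worth noting: a failed ping to 10.3.0.10 is not conclusive on its own. A ClusterIP is a virtual address implemented by kube-proxy's iptables rules, which only translate the service's declared ports (53/udp and 53/tcp here), so ICMP to a service IP typically gets no reply even on a healthy cluster. A port-level check is more telling (a sketch using busybox nslookup with an explicit server):

/ # nslookup test-nginx-svc 10.3.0.10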

@Bekt commented Dec 27, 2016

This doesn't seem Vagrant-specific. I'm having the same issue on AWS with the latest kube-dns add-on.

@rdtr (Contributor) commented Dec 29, 2016

What do the kube-dns logs show? I've already updated my cluster to v1.5.1 and have no problems on either local Vagrant or AWS. (I built the cluster from scratch using the multi-node scripts.)
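
For reference, one way to pull those logs (a sketch; the kubedns and dnsmasq container names match the kube-dns add-on of this era but may differ in other versions, and <kube-dns-pod-name> is a placeholder):

kubectl get pods --namespace=kube-system -l k8s-app=kube-dns
kubectl logs --namespace=kube-system <kube-dns-pod-name> -c kubedns
kubectl logs --namespace=kube-system <kube-dns-pod-name> -c dnsmasq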

@Bekt commented Dec 30, 2016

Looks like I had --network-plugin= instead of --network-plugin=cni in my master kubelet. Not sure how I missed that, but that was the issue.
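
For anyone checking for the same misconfiguration, the kubelet flags on a node can be inspected like this (a sketch; the unit name assumes the coreos-kubernetes systemd setup):

vagrant ssh c1
ps aux | grep [k]ubelet | tr ' ' '\n' | grep network-plugin
systemctl cat kubelet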

@valichek

I had this problem when using the alpha CoreOS channel; it looks like it works with stable.

@jbw976 (Member, Author) commented Jan 5, 2017

@rdtr, I have included all the log output for kube-dns pods/containers in this gist: https://gist.github.com/jbw976/41145c9e4f8dcc7e106839cd22801641

Let me know if there's any other output I can grab for you.

Could this be related to coreos/bugs#1743, which was fixed in CoreOS 1284 (https://github.com/coreos/manifest/releases/tag/v1284.0.0)?

@pswenson commented Jan 12, 2017

I've had this issue a few times in the past few days... @jbw976 do you have a workaround?

I'm using Kubernetes 1.5.1 on CoreOS stable (1235.4.0) and have hit this twice in the past couple of days.

I see it might be fixed in CoreOS 1284... but when I update I only get 1235.6.0.
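
That is expected: 1284 was only on the alpha channel at that point, while stable was still serving 1235.x. If you want to test the fix, switching a node's channel follows the standard CoreOS procedure (a sketch; run on the node, and note this moves you onto alpha releases):

echo "GROUP=alpha" | sudo tee /etc/coreos/update.conf
sudo systemctl restart update-engine
update_engine_client -update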
