
kube-dns deployment was silently missing #467

Closed
mumoshu opened this issue Mar 27, 2017 · 0 comments
Comments

mumoshu (Contributor) commented Mar 27, 2017

I'd been running the E2E test suite against #465, and it failed like this:

Summarizing 4 Failures:

[Fail] [k8s.io] DNS [It] should provide DNS for services [Conformance]
/go/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/e2e/dns.go:219

[Fail] [k8s.io] Networking [It] should provide Internet connection for containers [Conformance]
/go/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/e2e/networking.go:48

[Fail] [k8s.io] DNS [It] should provide DNS for the cluster [Conformance]
/go/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/e2e/dns.go:219

[Fail] [k8s.io] Kubectl client [k8s.io] Guestbook application [It] should create and stop a working application [Conformance]
/go/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/e2e/kubectl.go:1580

Ran 117 of 474 Specs in 3522.154 seconds
FAIL! -- 113 Passed | 4 Failed | 0 Pending | 357 Skipped --- FAIL: TestE2E (3522.17s)
FAIL

Looking into the kube-system namespace revealed that the kube-dns pods were completely missing:

$ KUBECONFIG=assets/kubeawstest3/kubeconfig kubectl get po --all-namespaces
NAMESPACE     NAME                                                                    READY     STATUS    RESTARTS   AGE
kube-system   heapster-v1.3.0-567306696-1qcdm                                         2/2       Running   0          1h
kube-system   kube-apiserver-ip-10-0-0-124.ap-northeast-1.compute.internal            1/1       Running   0          1h
kube-system   kube-controller-manager-ip-10-0-0-124.ap-northeast-1.compute.internal   1/1       Running   0          1h
kube-system   kube-dns-autoscaler-2813114833-ctfsp                                    1/1       Running   0          1h
kube-system   kube-proxy-ip-10-0-0-124.ap-northeast-1.compute.internal                1/1       Running   0          1h
kube-system   kube-proxy-ip-10-0-0-135.ap-northeast-1.compute.internal                1/1       Running   0          1h
kube-system   kube-proxy-ip-10-0-0-206.ap-northeast-1.compute.internal                1/1       Running   0          1h
kube-system   kube-proxy-ip-10-0-0-36.ap-northeast-1.compute.internal                 1/1       Running   0          1h
kube-system   kube-proxy-ip-10-0-0-61.ap-northeast-1.compute.internal                 1/1       Running   0          1h
kube-system   kube-proxy-ip-10-0-0-78.ap-northeast-1.compute.internal                 1/1       Running   0          1h
kube-system   kube-scheduler-ip-10-0-0-124.ap-northeast-1.compute.internal            1/1       Running   0          1h
kube-system   kubernetes-dashboard-v1.5.1-0q4n9                                       1/1       Running   0          1h

Manually re-running /opt/bin/install-kube-system fixed the problem; the previously failing specs now pass:

$ FOCUS='.*DNS.*Conformance.*' ETCD_COUNT=3 KUBE_AWS_CLUSTER_NAME=kubeawstest3 ./run conformance
...snip...
Ran 2 of 474 Specs in 33.560 seconds
SUCCESS! -- 2 Passed | 0 Failed | 0 Pending | 472 Skipped PASS
$ FOCUS='.*Networking.*Internet.*Conformance.*' ETCD_COUNT=3 KUBE_AWS_CLUSTER_NAME=kubeawstest3 ./run conformance
...snip...
Ran 1 of 474 Specs in 7.361 seconds
SUCCESS! -- 1 Passed | 0 Failed | 0 Pending | 473 Skipped PASS
$ FOCUS='.*Kubectl.*Guestbook.*Conformance.*' ETCD_COUNT=3 KUBE_AWS_CLUSTER_NAME=kubeawstest3 ./run conformance
...snip...
Ran 1 of 474 Specs in 96.491 seconds
SUCCESS! -- 1 Passed | 0 Failed | 0 Pending | 473 Skipped PASS

How could this happen?
A careful read of journalctl -u install-kube-system showed that, very early in the script, the first curl command that installs kube-dns had failed because the kube-system namespace did not exist yet:

https://gist.github.com/mumoshu/e4bfa66bbc6f81ced77a8a4ec1b54fa3#file-install-kube-system-1-boot-log-L130
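
To illustrate the failure mode (my own sketch for clarity, not the actual contents of install-kube-system; the real invocation and its error are in the boot log linked above): the kube-dns manifests are namespaced objects, so a create call issued before the kube-system namespace exists is rejected by the apiserver, and because the script does not retry, kube-dns simply never gets installed. Roughly:

# Illustrative only -- the port, API path and manifest path below are
# assumptions, not copied from install-kube-system.
# If this POST races ahead of the kube-system namespace being created,
# the apiserver rejects it (NotFound) and, without a retry, the add-on
# is silently skipped.
curl -s -XPOST \
  -H "Content-Type: application/json" \
  -d @/srv/kubernetes/manifests/kube-dns-de.json \
  "http://127.0.0.1:8080/apis/extensions/v1beta1/namespaces/kube-system/deployments"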

So let's fix it: should we wait for the kube-system namespace to become available at the beginning of install-kube-system, or handle this somewhere else?
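
One possible shape for such a guard, as a minimal sketch (assuming the script can reach the apiserver with curl on the local insecure port; the function name, endpoint, timeout and sleep interval below are illustrative, not the final change):

# Wait, with a timeout, for the kube-system namespace to exist before
# installing any of the namespaced add-ons. All values are illustrative.
wait_for_kube_system() {
  local retries=60
  until curl -sf "http://127.0.0.1:8080/api/v1/namespaces/kube-system" >/dev/null; do
    retries=$((retries - 1))
    if [ "$retries" -le 0 ]; then
      echo "kube-system namespace did not appear in time" >&2
      return 1
    fi
    echo "waiting for the kube-system namespace..."
    sleep 5
  done
}

wait_for_kube_system || exit 1
# ...then run the existing curl calls that install kube-dns and the other add-ons.

An alternative would be to retry each individual install call instead, but a single up-front wait keeps the rest of the script unchanged.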

@mumoshu mumoshu added the kind/bug Categorizes issue or PR as related to a bug. label Mar 27, 2017
@mumoshu mumoshu added this to the v0.9.6-rc.2 milestone Apr 10, 2017
mumoshu added a commit to mumoshu/kube-aws that referenced this issue Apr 10, 2017
camilb added a commit to camilb/kube-aws that referenced this issue Apr 18, 2017
* kubernetes-incubator/master:
  'Cluster-dump' feature to export Kubernetes Resources to S3
  Follow-up for the multi API endpoints support This fixes the issue which prevented a k8s cluster from being properly configured when multiple API endpoints are defined in cluster.yaml.
  Fix incorrect validations on apiEndpoints Ref kubernetes-retired#520 (comment)
  Wait until kube-system becomes ready Resolves kubernetes-retired#467
  Make the validation error message when KMS failed more friendly Now, when e.g. AWS_ACCESS_KEY_ID is missing, the error looks like: ``` $ kube-aws validate --s3-uri ... *snip* Error: Failed to initialize cluster driver: failed to read/create TLS assets: UnrecognizedClientException: The security token included in the request is invalid. ```
  Fix a race between systemd services: cfn-etc-environment and etcdadm-reconfigure
  Fix API endpoint from HA controllers
  Make AMI fetching even more reliable Resolves kubernetes-retired#474
  etcd unit should unconditionally depend on cfn-etcd-environment
  Rescheduler to 0.3.0 which uses k8s 1.6
  WIP: Bump to Kubernetes v1.6.1 (kubernetes-retired#492)
  Improve auth tokens / TLS bootstrapping UX (kubernetes-retired#489)
  Fix RBAC in Kubernetes 1.6. Fix etcdadm when terminated instances still exist.
  Retry userdata download
  Perform docker post-start check
  Bump to calico 2.1.1
tyrannasaurusbanks pushed a commit to tyrannasaurusbanks/kube-aws that referenced this issue Apr 19, 2017
…update-to-latest-kube-aws-master to hcom-flavour

* commit '175217133f75b3c251536bc0d51ccafd2b1a5de4':
  Fix the dead-lock while bootstrapping etcd cluster when wait signal is enabled. Resolves kubernetes-retired#525
  Fix elasticFileSystemId to be propagated to node pools Resolves kubernetes-retired#487
  'Cluster-dump' feature to export Kubernetes Resources to S3
  Follow-up for the multi API endpoints support This fixes the issue which prevented a k8s cluster from being properly configured when multiple API endpoints are defined in cluster.yaml.
  Fix incorrect validations on apiEndpoints Ref kubernetes-retired#520 (comment)
  Wait until kube-system becomes ready Resolves kubernetes-retired#467
kylehodgetts pushed a commit to HotelsDotCom/kube-aws that referenced this issue Mar 27, 2018