
kube-dns deployment was silently missing #467

Closed
mumoshu opened this issue Mar 27, 2017 · 0 comments
Comments

mumoshu (Contributor) commented Mar 27, 2017

I'd been running the E2E test suite against #465, and it failed like this:

Summarizing 4 Failures:

[Fail] [k8s.io] DNS [It] should provide DNS for services [Conformance]
/go/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/e2e/dns.go:219

[Fail] [k8s.io] Networking [It] should provide Internet connection for containers [Conformance]
/go/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/e2e/networking.go:48

[Fail] [k8s.io] DNS [It] should provide DNS for the cluster [Conformance]
/go/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/e2e/dns.go:219

[Fail] [k8s.io] Kubectl client [k8s.io] Guestbook application [It] should create and stop a working application [Conformance]
/go/kubernetes/_output/local/go/src/k8s.io/kubernetes/test/e2e/kubectl.go:1580

Ran 117 of 474 Specs in 3522.154 seconds
FAIL! -- 113 Passed | 4 Failed | 0 Pending | 357 Skipped --- FAIL: TestE2E (3522.17s)
FAIL

Looking into the kube-system namespace revealed that the kube-dns pods were completely missing:

$ KUBECONFIG=assets/kubeawstest3/kubeconfig kubectl get po --all-namespaces
NAMESPACE     NAME                                                                    READY     STATUS    RESTARTS   AGE
kube-system   heapster-v1.3.0-567306696-1qcdm                                         2/2       Running   0          1h
kube-system   kube-apiserver-ip-10-0-0-124.ap-northeast-1.compute.internal            1/1       Running   0          1h
kube-system   kube-controller-manager-ip-10-0-0-124.ap-northeast-1.compute.internal   1/1       Running   0          1h
kube-system   kube-dns-autoscaler-2813114833-ctfsp                                    1/1       Running   0          1h
kube-system   kube-proxy-ip-10-0-0-124.ap-northeast-1.compute.internal                1/1       Running   0          1h
kube-system   kube-proxy-ip-10-0-0-135.ap-northeast-1.compute.internal                1/1       Running   0          1h
kube-system   kube-proxy-ip-10-0-0-206.ap-northeast-1.compute.internal                1/1       Running   0          1h
kube-system   kube-proxy-ip-10-0-0-36.ap-northeast-1.compute.internal                 1/1       Running   0          1h
kube-system   kube-proxy-ip-10-0-0-61.ap-northeast-1.compute.internal                 1/1       Running   0          1h
kube-system   kube-proxy-ip-10-0-0-78.ap-northeast-1.compute.internal                 1/1       Running   0          1h
kube-system   kube-scheduler-ip-10-0-0-124.ap-northeast-1.compute.internal            1/1       Running   0          1h
kube-system   kubernetes-dashboard-v1.5.1-0q4n9                                       1/1       Running   0          1h

Manually re-running /opt/bin/install-kube-system fixed the problem; the previously failing specs now pass:

$ FOCUS='.*DNS.*Conformance.*' ETCD_COUNT=3 KUBE_AWS_CLUSTER_NAME=kubeawstest3 ./run conformance
...snip...
Ran 2 of 474 Specs in 33.560 seconds
SUCCESS! -- 2 Passed | 0 Failed | 0 Pending | 472 Skipped PASS
$ FOCUS='.*Networking.*Internet.*Conformance.*' ETCD_COUNT=3 KUBE_AWS_CLUSTER_NAME=kubeawstest3 ./run conformance
...snip...
Ran 1 of 474 Specs in 7.361 seconds
SUCCESS! -- 1 Passed | 0 Failed | 0 Pending | 473 Skipped PASS
$ FOCUS='.*Kubectl.*Guestbook.*Conformance.*' ETCD_COUNT=3 KUBE_AWS_CLUSTER_NAME=kubeawstest3 ./run conformance
...snip...
Ran 1 of 474 Specs in 96.491 seconds
SUCCESS! -- 1 Passed | 0 Failed | 0 Pending | 473 Skipped PASS

How could this happen?
A careful read of journalctl -u install-kube-system showed that, very early in the script, the first curl command that installs kube-dns had failed because the kube-system namespace did not exist yet:

https://gist.github.com/mumoshu/e4bfa66bbc6f81ced77a8a4ec1b54fa3#file-install-kube-system-1-boot-log-L130
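
To illustrate the failure mode (my own sketch for clarity, not the actual contents of install-kube-system; the real invocation and its error are in the boot log linked above): the kube-dns manifests are namespaced objects, so a create call issued before the kube-system namespace exists is rejected by the apiserver, and because the script does not retry, kube-dns simply never gets installed. Roughly:

# Illustrative only -- the port, API path and manifest path below are
# assumptions, not copied from install-kube-system.
# If this POST races ahead of the kube-system namespace being created,
# the apiserver rejects it (NotFound) and, without a retry, the add-on
# is silently skipped.
curl -s -XPOST \
  -H "Content-Type: application/json" \
  -d @/srv/kubernetes/manifests/kube-dns-de.json \
  "http://127.0.0.1:8080/apis/extensions/v1beta1/namespaces/kube-system/deployments"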

So let's fix it: should we wait for the kube-system namespace to become available at the beginning of install-kube-system, or handle this somewhere else?
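
One possible shape for such a guard, as a minimal sketch (assuming the script can reach the apiserver with curl on the local insecure port; the function name, endpoint, timeout and sleep interval below are illustrative, not the final change):

# Wait, with a timeout, for the kube-system namespace to exist before
# installing any of the namespaced add-ons. All values are illustrative.
wait_for_kube_system() {
  local retries=60
  until curl -sf "http://127.0.0.1:8080/api/v1/namespaces/kube-system" >/dev/null; do
    retries=$((retries - 1))
    if [ "$retries" -le 0 ]; then
      echo "kube-system namespace did not appear in time" >&2
      return 1
    fi
    echo "waiting for the kube-system namespace..."
    sleep 5
  done
}

wait_for_kube_system || exit 1
# ...then run the existing curl calls that install kube-dns and the other add-ons.

An alternative would be to retry each individual install call instead, but a single up-front wait keeps the rest of the script unchanged.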

@mumoshu mumoshu added the kind/bug Categorizes issue or PR as related to a bug. label Mar 27, 2017
@mumoshu mumoshu added this to the v0.9.6-rc.2 milestone Apr 10, 2017
mumoshu added a commit to mumoshu/kube-aws that referenced this issue Apr 10, 2017
camilb added a commit to camilb/kube-aws that referenced this issue Apr 18, 2017
* kubernetes-incubator/master:
  'Cluster-dump' feature to export Kubernetes Resources to S3
  Follow-up for the multi API endpoints support This fixes the issue which prevented a k8s cluster from being properly configured when multiple API endpoints are defined in cluster.yaml.
  Fix incorrect validations on apiEndpoints Ref kubernetes-retired#520 (comment)
  Wait until kube-system becomes ready Resolves kubernetes-retired#467
  Make the validation error message when KMS failed more friendly Now, when e.g. AWS_ACCESS_KEY_ID is missing, the error looks like: ``` $ kube-aws validate --s3-uri ... *snip* Error: Failed to initialize cluster driver: failed to read/create TLS assets: UnrecognizedClientException: The security token included in the request is invalid. ```
  Fix a race between systemd services: cfn-etc-environment and etcdadm-reconfigure
  Fix API endpoint from HA controllers
  Make AMI fetching even more reliable Resolves kubernetes-retired#474
  etcd unit should unconditionally depend on cfn-etcd-environment
  Rescheduler to 0.3.0 which uses k8s 1.6
  WIP: Bump to Kubernetes v1.6.1 (kubernetes-retired#492)
  Improve auth tokens / TLS bootstrapping UX (kubernetes-retired#489)
  Fix RBAC in Kubernetes 1.6. Fix etcdadm when terminated instances still exist.
  Retry userdata download
  Perform docker post-start check
  Bump to calico 2.1.1
tyrannasaurusbanks pushed a commit to tyrannasaurusbanks/kube-aws that referenced this issue Apr 19, 2017
…update-to-latest-kube-aws-master to hcom-flavour

* commit '175217133f75b3c251536bc0d51ccafd2b1a5de4':
  Fix the dead-lock while bootstrapping etcd cluster when wait signal is enabled. Resolves kubernetes-retired#525
  Fix elasticFileSystemId to be propagated to node pools Resolves kubernetes-retired#487
  'Cluster-dump' feature to export Kubernetes Resources to S3
  Follow-up for the multi API endpoints support This fixes the issue which prevented a k8s cluster from being properly configured when multiple API endpoints are defined in cluster.yaml.
  Fix incorrect validations on apiEndpoints Ref kubernetes-retired#520 (comment)
  Wait until kube-system becomes ready Resolves kubernetes-retired#467
kylehodgetts pushed a commit to HotelsDotCom/kube-aws that referenced this issue Mar 27, 2018