Codecov Report
```
@@            Coverage Diff             @@
##           master     #492      +/-   ##
==========================================
+ Coverage    38.1%   38.12%   +0.02%
==========================================
  Files          44       44
  Lines        3110     3108       -2
==========================================
  Hits         1185     1185
  Misses       1729     1729
+ Partials      196      194       -2
```
Continue to review full report at Codecov.
Self-hosted Calico + cni is not working. See #494
Even flannel + cni is not working #495
This is just the result of running the following:

```
$ contrib/bump-version v1.6.1_coreos.0
Updating contrib/bump-version
Updating core/controlplane/config/config.go
Updating core/controlplane/config/templates/cluster.yaml
Updating e2e/kubernetes/Dockerfile
Updating e2e/kubernetes/Makefile
Updating vendor/github.com/aws/aws-sdk-go/CHANGELOG.md
$ git checkout -p -- vendor
```

As etcd3 support is already introduced via kubernetes-retired#417, after this change it is (ideally) just a matter of running E2E against a newly created kube-aws cluster with k8s 1.6.1.
Also fix the issue that we end up with an incomplete etcd version number when `etcd.version` is set to just `3` rather than e.g. `3.1.5`.
Closes kubernetes-retired#495

E2E passed after this change:

```
$ ETCD_VERSION=3 ETCD_SNAPSHOT_AUTOMATED=1 ETCD_DISASTER_RECOVERY_AUTOMATED=1 ETCD_COUNT=3 KUBE_AWS_CLUSTER_NAME=kubeaws2 ./run all
*snip*
Ran 151 of 588 Specs in 3492.050 seconds
SUCCESS! -- 151 Passed | 0 Failed | 0 Pending | 437 Skipped
PASS

Ginkgo ran 1 suite in 58m12.359210255s
Test Suite Passed
2017/04/04 09:35:29 util.go:127: Step './hack/ginkgo-e2e.sh --ginkgo.focus=\[Conformance\]' finished in 58m12.683100213s
2017/04/04 09:35:29 e2e.go:80: Done
```
Closes kubernetes-retired#494

Note: without changing the way we run the `install-cni` containers of `calico-node` pods, E2E fails because some pods stall in CrashLoopBackOff:

```
+ kubectl get po --all-namespaces
NAMESPACE     NAME                                                                   READY   STATUS             RESTARTS   AGE
kube-system   calico-node-060zz                                                      1/2     CrashLoopBackOff   43         3h
kube-system   calico-node-bs0b9                                                      1/2     CrashLoopBackOff   43         3h
kube-system   calico-node-k2h0j                                                      1/2     CrashLoopBackOff   42         3h
kube-system   calico-node-lt4fd                                                      1/2     CrashLoopBackOff   43         3h
kube-system   calico-node-sqk63                                                      1/2     CrashLoopBackOff   43         3h
kube-system   calico-policy-controller-279105993-jkqjh                               1/1     Running            18         3h
kube-system   heapster-v1.3.0-124807402-7mj11                                        2/2     Running            0          3h
kube-system   kube-apiserver-ip-10-0-0-53.ap-northeast-1.compute.internal            1/1     Running            0          3h
kube-system   kube-apiserver-ip-10-0-0-96.ap-northeast-1.compute.internal            1/1     Running            0          3h
kube-system   kube-controller-manager-ip-10-0-0-53.ap-northeast-1.compute.internal   1/1     Running            0          3h
kube-system   kube-controller-manager-ip-10-0-0-96.ap-northeast-1.compute.internal   1/1     Running            0          3h
kube-system   kube-dns-782804071-5mb11                                               2/4     CrashLoopBackOff   110        3h
kube-system   kube-dns-782804071-xrgdl                                               2/4     CrashLoopBackOff   110        3h
kube-system   kube-dns-autoscaler-2813114833-vzk5x                                   1/1     Running            0          3h
kube-system   kube-proxy-ip-10-0-0-190.ap-northeast-1.compute.internal               1/1     Running            0          3h
kube-system   kube-proxy-ip-10-0-0-209.ap-northeast-1.compute.internal               1/1     Running            0          3h
kube-system   kube-proxy-ip-10-0-0-239.ap-northeast-1.compute.internal               1/1     Running            0          3h
kube-system   kube-proxy-ip-10-0-0-247.ap-northeast-1.compute.internal               1/1     Running            0          3h
kube-system   kube-proxy-ip-10-0-0-53.ap-northeast-1.compute.internal                1/1     Running            0          3h
kube-system   kube-proxy-ip-10-0-0-9.ap-northeast-1.compute.internal                 1/1     Running            0          3h
kube-system   kube-proxy-ip-10-0-0-96.ap-northeast-1.compute.internal                1/1     Running            0          3h
kube-system   kube-scheduler-ip-10-0-0-53.ap-northeast-1.compute.internal            1/1     Running            0          3h
kube-system   kube-scheduler-ip-10-0-0-96.ap-northeast-1.compute.internal            1/1     Running            1          3h
kube-system   kubernetes-dashboard-v1.5.1-67s47                                      0/1     CrashLoopBackOff   47         3h
```

calico-node fails due to inability to communicate with etcd:

```
+ kubectl logs calico-node-060zz --namespace kube-system calico-node
time="2017-04-04T12:12:45Z" level=info msg="Creating Calico client"
time="2017-04-04T12:12:45Z" level=info msg="Loading config from environment"
time="2017-04-04T12:12:45Z" level=info msg="Ensuring datastore is initialized"
time="2017-04-04T12:13:15Z" level=info msg="Unhandled error: client: endpoint https://ec2-13-112-48-174.ap-northeast-1.compute.amazonaws.com:2379 exceeded header timeout"
time="2017-04-04T12:13:15Z" level=warning msg="Failed to set ready flag" error="client: endpoint https://ec2-13-112-48-174.ap-northeast-1.compute.amazonaws.com:2379 exceeded header timeout"
panic: Error initializing datastore: client: endpoint https://ec2-13-112-48-174.ap-northeast-1.compute.amazonaws.com:2379 exceeded header timeout

goroutine 1 [running]:
panic(0x12fc2e0, 0xc420364260)
	/usr/local/go/src/runtime/panic.go:500 +0x1a1
main.main()
	/go/src/github.com/projectcalico/calico-containers/calico_node/startup.go:47 +0x439
Calico node failed to start
```
```
13s 13s 1 kubelet, ip-10-0-0-224.ap-northeast-1.compute.internal Warning FailedSync Error syncing pod, skipping: failed to "CreatePodSandbox" for "kube-dns-3816048056-6z186_kube-system(bf652667-19d6-11e7-b179-06a58bde3385)" with CreatePodSandboxError: "CreatePodSandbox for pod \"kube-dns-3816048056-6z186_kube-system(bf652667-19d6-11e7-b179-06a58bde3385)\" failed: rpc error: code = 2 desc = NetworkPlugin cni failed to set up pod \"kube-dns-3816048056-6z186_kube-system\" network: open /etc/kubernetes/cni/net.d/calico-tls/etcd-cert: no such file or directory"
```
This fixes the error below:

```
13s 12s 2 kubelet, ip-10-0-0-16.ap-northeast-1.compute.internal Warning FailedSync Error syncing pod, skipping: failed to "CreatePodSandbox" for "kube-dns-3816048056-t46zn_kube-system(37c1a478-19fd-11e7-a57c-061cc33b51ff)" with CreatePodSandboxError: "CreatePodSandbox for pod \"kube-dns-3816048056-t46zn_kube-system(37c1a478-19fd-11e7-a57c-061cc33b51ff)\" failed: rpc error: code = 2 desc = NetworkPlugin cni failed to set up pod \"kube-dns-3816048056-t46zn_kube-system\" network: Get https://kubeaw6.cwtest.info/api/v1/namespaces/kube-system/pods/kube-dns-3816048056-t46zn: x509: certificate signed by unknown authority"
```
…o-node pod is up

This is intended to fix the following error, which keeps occurring unless we manually recreate the kube-dns pod:

```
4m 4s 21 kubelet, ip-10-0-0-238.ap-northeast-1.compute.internal Warning FailedSync Error syncing pod, skipping: failed to "KillPodSandbox" for "db8941be-1a02-11e7-bc1e-06966d4cb749" with KillPodSandboxError: "rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod \"kube-dns-3816048056-mzbsj_kube-system\" network: open /var/lib/cni/flannel/47aca6c4a931562d5b8452ddecbf054721cacf99e25232033db4aacc735baa4c: no such file or directory"
```
Hi @mumoshu, I think these changes should also be made to create the minimal roles when RBAC is enabled.
model/etcd.go
Outdated
```go
func (e Etcd) Version() EtcdVersion {
	if e.Cluster.Version == "3" {
		return "3.1.5"
	}
	if e.Cluster.Version != "" {
```
why first check for `"3"`?

```go
if e.Cluster.Version != "" {
	return e.Cluster.Version
}
return "3.1.5"
```
The intention here is to translate:

```yaml
etcd:
  version: 3
```

to

```yaml
etcd:
  version: 3.1.5
```

for users who don't care much about minor/patch versions.

However, as kube-aws starts defaulting to etcd3 from now on, we can just default to "3.1.5" when it is omitted, as you've suggested 👍
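The agreed-upon simplification can be sketched as a standalone Go snippet. Note this is only a sketch following the reviewer's suggestion: the `Etcd`/`Cluster` stub types below are minimal stand-ins for the real definitions in model/etcd.go, which aren't shown in full here.

```go
package main

import "fmt"

// EtcdVersion mirrors the string-based version type seen in the diff above.
type EtcdVersion string

// Cluster and Etcd are minimal stand-ins for the real kube-aws model types.
type Cluster struct {
	Version EtcdVersion
}

type Etcd struct {
	Cluster Cluster
}

// Version returns the configured etcd version, defaulting to "3.1.5"
// when `etcd.version` is omitted in cluster.yaml, per the review comment.
func (e Etcd) Version() EtcdVersion {
	if e.Cluster.Version != "" {
		return e.Cluster.Version
	}
	return "3.1.5"
}

func main() {
	omitted := Etcd{}
	fmt.Println(omitted.Version()) // prints "3.1.5"

	explicit := Etcd{Cluster: Cluster{Version: "3.0.17"}}
	fmt.Println(explicit.Version()) // prints "3.0.17"
}
```

Under this simplification, `version: 3` is passed through verbatim rather than expanded, which is why it only makes sense once the default itself is a full version string.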
…i v0.5.1 which includes "plugins/meta/flannel: If net config is missing do not return err on DEL" to actually fix the error:

```
13s 13s 1 kubelet, ip-10-0-0-224.ap-northeast-1.compute.internal Warning FailedSync Error syncing pod, skipping: failed to "CreatePodSandbox" for "kube-dns-3816048056-6z186_kube-system(bf652667-19d6-11e7-b179-06a58bde3385)" with CreatePodSandboxError: "CreatePodSandbox for pod \"kube-dns-3816048056-6z186_kube-system(bf652667-19d6-11e7-b179-06a58bde3385)\" failed: rpc error: code = 2 desc = NetworkPlugin cni failed to set up pod \"kube-dns-3816048056-6z186_kube-system\" network: open /etc/kubernetes/cni/net.d/calico-tls/etcd-cert: no such file or directory"
```

https://github.com/projectcalico/cni-plugin/releases/tag/v1.6.2
https://github.com/containernetworking/cni/releases/tag/v0.5.1
```
Error syncing pod, skipping: failed to "CreatePodSandbox" for "kube-dns-3816048056-cwx62_kube-system(12c3204f-1a54-11e7-bfb0-06751e989ae7)" with CreatePodSandboxError: "CreatePodSandbox for pod \"kube-dns-3816048056-cwx62_kube-system(12c3204f-1a54-11e7-bfb0-06751e989ae7)\" failed: rpc error: code = 2 desc = NetworkPlugin cni failed to set up pod \"kube-dns-3816048056-cwx62_kube-system\" network: failed to find plugin \"loopback\" in path [/opt/loopback/bin /opt/cni/bin]"
```
According to the review comment kubernetes-retired#492 (review)
E2E is passing against a cluster with/without Calico.
* kubernetes-incubator/master:
  - 'Cluster-dump' feature to export Kubernetes Resources to S3
  - Follow-up for the multi API endpoints support: this fixes the issue which prevented a k8s cluster from being properly configured when multiple API endpoints are defined in cluster.yaml
  - Fix incorrect validations on apiEndpoints. Ref kubernetes-retired#520 (comment)
  - Wait until kube-system becomes ready. Resolves kubernetes-retired#467
  - Make the validation error message when KMS failed more friendly. Now, when e.g. AWS_ACCESS_KEY_ID is missing, the error looks like:
    ```
    $ kube-aws validate --s3-uri ...
    *snip*
    Error: Failed to initialize cluster driver: failed to read/create TLS assets: UnrecognizedClientException: The security token included in the request is invalid.
    ```
  - Fix a race between systemd services: cfn-etc-environment and etcdadm-reconfigure
  - Fix API endpoint from HA controllers
  - Make AMI fetching even more reliable. Resolves kubernetes-retired#474
  - etcd unit should unconditionally depend on cfn-etcd-environment
  - Rescheduler to 0.3.0 which uses k8s 1.6
  - WIP: Bump to Kubernetes v1.6.1 (kubernetes-retired#492)
  - Improve auth tokens / TLS bootstrapping UX (kubernetes-retired#489)
  - Fix RBAC in Kubernetes 1.6.
  - Fix etcdadm when terminated instances still exist.
  - Retry userdata download
  - Perform docker post-start check
  - Bump to calico 2.1.1
* Bump to Kubernetes v1.6.1

  This change was just the result of running the following commands:

  ```
  $ contrib/bump-version v1.6.1_coreos.0
  Updating contrib/bump-version
  Updating core/controlplane/config/config.go
  Updating core/controlplane/config/templates/cluster.yaml
  Updating e2e/kubernetes/Dockerfile
  Updating e2e/kubernetes/Makefile
  Updating vendor/github.com/aws/aws-sdk-go/CHANGELOG.md
  $ git checkout -p -- vendor
  ```

  As etcd3 support is already introduced via kubernetes-retired#417, it should ideally have been just a matter of running E2E against a newly created kube-aws cluster with k8s 1.6.1. That turned out not to be true, hence the subsequent changes.

* Use etcd3 by default

  etcd2 support will be dropped soon, as the etcd3 storage driver is already the default since k8s v1.6.0.

* Bump to calico-cni v1.6.2, which is an even newer release than the one included in the latest calico v2.1.2, to deal with kubernetes/kubernetes#43488

* Set up /etc/kubernetes/cni/net.d not via calico-cni but on our own, to deal with kubernetes/kubernetes#43014

* Set up /opt/cni/bin using docker rather than a k8s static pod, to prevent temporary "failed to find plugin * in path" errors from cni. They were emitted when pods are scheduled but /opt/cni/bin is not yet populated:

  ```
  Error syncing pod, skipping: failed to "CreatePodSandbox" for "kube-dns-3816048056-cwx62_kube-system(12c3204f-1a54-11e7-bfb0-06751e989ae7)" with CreatePodSandboxError: "CreatePodSandbox for pod \"kube-dns-3816048056-cwx62_kube-system(12c3204f-1a54-11e7-bfb0-06751e989ae7)\" failed: rpc error: code = 2 desc = NetworkPlugin cni failed to set up pod \"kube-dns-3816048056-cwx62_kube-system\" network: failed to find plugin \"loopback\" in path [/opt/loopback/bin /opt/cni/bin]"
  ```

* Fix a bug that caused etcd-member.service to use the default version number 3.0.x regardless of what is specified via `etcd.version` in cluster.yaml. The bug was reported in kubernetes-retired#497 (comment)

* Simplify the EtcdVersion func, according to the review comment kubernetes-retired#492 (review)

* Fix permanent "failed to find plugin * in path" errors from cni, which were breaking cni + flannel/calico in k8s 1.6, by specifying the `--cni-bin-dir=/opt/cni/bin` flag for kubelets. The default dir had been accidentally changed at least in k8s 1.6.0 and 1.6.1.

  Resolves kubernetes-retired#494
  Resolves kubernetes-retired#495

  E2E against a cluster with flannel passed after this change:

  ```
  $ ETCD_VERSION=3 ETCD_SNAPSHOT_AUTOMATED=1 ETCD_DISASTER_RECOVERY_AUTOMATED=1 ETCD_COUNT=3 KUBE_AWS_CLUSTER_NAME=kubeaws2 ./run all
  *snip*
  Ran 151 of 588 Specs in 3492.050 seconds
  SUCCESS! -- 151 Passed | 0 Failed | 0 Pending | 437 Skipped
  PASS

  Ginkgo ran 1 suite in 58m12.359210255s
  Test Suite Passed
  2017/04/04 09:35:29 util.go:127: Step './hack/ginkgo-e2e.sh --ginkgo.focus=\[Conformance\]' finished in 58m12.683100213s
  2017/04/04 09:35:29 e2e.go:80: Done
  ```

  Also passed against a cluster with calico:

  ```
  Ran 151 of 588 Specs in 3381.108 seconds
  SUCCESS! -- 151 Passed | 0 Failed | 0 Pending | 437 Skipped
  PASS

  Ginkgo ran 1 suite in 56m21.415087252s
  Test Suite Passed
  2017/04/06 03:58:20 util.go:131: Step './hack/ginkgo-e2e.sh --ginkgo.focus=\[Conformance\]' finished in 56m21.76726736s
  2017/04/06 03:58:20 e2e.go:80: Done
  ```
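For illustration, the `--cni-bin-dir` fix amounts to pinning the CNI plugin directory explicitly on each kubelet. A rough sketch as a generic systemd drop-in follows; the drop-in path and the `KUBELET_OPTS` environment variable name are hypothetical and not how kube-aws actually wires kubelet flags in its cloud-config, only the `--cni-bin-dir=/opt/cni/bin` flag itself comes from the change described above.

```
# Hypothetical drop-in path: /etc/systemd/system/kubelet.service.d/10-cni-bin-dir.conf
[Service]
# Pin the CNI plugin directory explicitly, since the kubelet default
# was accidentally changed in k8s 1.6.0/1.6.1.
Environment="KUBELET_OPTS=--network-plugin=cni --cni-bin-dir=/opt/cni/bin"
```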
As etcd3 support is already introduced via #417, it was (ideally) a matter of running `contrib/bump-version v1.6.1_coreos.0` and then running E2E tests against a newly created kube-aws cluster with k8s 1.6.1.