This repository has been archived by the owner on Sep 30, 2020. It is now read-only.
etcd version 3 does not send cfn-signal #497
Comments
I'm still investigating this issue. It's weird because it worked once: all 3 etcd nodes sent cfn-signal, and I even got a snapshot in an S3 bucket. Now I can't get even one "success". Tomorrow I'm planning to increase the wait time for the signal and to debug each failed service on the etcd nodes.
mumoshu added a commit that referenced this issue on Apr 6, 2017
* Bump to Kubernetes v1.6.1

  This change was just the result of running the following commands:

  ```
  $ contrib/bump-version v1.6.1_coreos.0
  Updating contrib/bump-version
  Updating core/controlplane/config/config.go
  Updating core/controlplane/config/templates/cluster.yaml
  Updating e2e/kubernetes/Dockerfile
  Updating e2e/kubernetes/Makefile
  Updating vendor/github.com/aws/aws-sdk-go/CHANGELOG.md
  $ git checkout -p -- vendor
  ```

  As etcd3 support is already introduced via #417, after this change is introduced, it was ideally a matter of running E2E against a newly created kube-aws cluster with k8s 1.6.1, which turned out not to be true, hence the subsequent changes.

* Use etcd3 by default

  etcd2 support will be dropped soon, as the etcd3 storage driver is already the default since k8s v.1.6.0.

* Bump to calico-cni v1.6.2, which is an even newer release than the one included in the latest calico v2.1.2, to deal with kubernetes/kubernetes#43488

* Set up /etc/kubernetes/cni/net.d not using calico-cni but by our own to deal with kubernetes/kubernetes#43014

* Set up /opt/cni/bin using docker rather than a k8s static pod to prevent temporary "failed to find plugin * in path" errors from cni

  They were emitted when pods are scheduled but /opt/cni/bin is not yet populated

  ```
  Error syncing pod, skipping: failed to "CreatePodSandbox" for "kube-dns-3816048056-cwx62_kube-system(12c3204f-1a54-11e7-bfb0-06751e989ae7)" with CreatePodSandboxError: "CreatePodSandbox for pod \"kube-dns-3816048056-cwx62_kube-system(12c3204f-1a54-11e7-bfb0-06751e989ae7)\" failed: rpc error: code = 2 desc = NetworkPlugin cni failed to set up pod \"kube-dns-3816048056-cwx62_kube-system\" network: failed to find plugin \"loopback\" in path [/opt/loopback/bin /opt/cni/bin]"
  ```

* Fix a bug that resulted etcd-member.service to use the default version number 3.0.x regardless of what is specified via `etcd.version` in cluster.yaml.

  The bug was reported in #497 (comment)

* Simplify EtcdVersion func

  According to the review comment #492 (review)

* Fix permanent errors like "failed to find plugin * in path" from cni which was breaking cni + flannel/calico in k8s 1.6, by specifying the `--cni-bin-dir=/opt/cni/bin` flag for kubelets

  The default dir had been accidentally changed at least in k8s 1.6.0 and 1.6.1.

  Resolves #494
  Resolves #495

E2E against a cluster with flannel passed after this change:

```
$ ETCD_VERSION=3 ETCD_SNAPSHOT_AUTOMATED=1 ETCD_DISASTER_RECOVERY_AUTOMATED=1 ETCD_COUNT=3 KUBE_AWS_CLUSTER_NAME=kubeaws2 ./run all
*snip*
Ran 151 of 588 Specs in 3492.050 seconds
SUCCESS! -- 151 Passed | 0 Failed | 0 Pending | 437 Skipped PASS

Ginkgo ran 1 suite in 58m12.359210255s
Test Suite Passed
2017/04/04 09:35:29 util.go:127: Step './hack/ginkgo-e2e.sh --ginkgo.focus=\[Conformance\]' finished in 58m12.683100213s
2017/04/04 09:35:29 e2e.go:80: Done
```

Also passed against a cluster with calico:

```
Ran 151 of 588 Specs in 3381.108 seconds
SUCCESS! -- 151 Passed | 0 Failed | 0 Pending | 437 Skipped PASS

Ginkgo ran 1 suite in 56m21.415087252s
Test Suite Passed
2017/04/06 03:58:20 util.go:131: Step './hack/ginkgo-e2e.sh --ginkgo.focus=\[Conformance\]' finished in 56m21.76726736s
2017/04/06 03:58:20 e2e.go:80: Done
```
This works now, thanks!
kylehodgetts pushed a commit to HotelsDotCom/kube-aws that referenced this issue on Mar 27, 2018
(Edited: this also happens with etcd2.) I tried the new etcd support and specified `version: 3.1.5` in cluster.yaml. etcd starts, but it runs version 3.0.x, and cfn-signal never fires. I tried to debug why it does not fire, since etcd3 started up fine, and I'm seeing a weird issue: it does not look like cfn-signal was ever triggered, and when I manually run `systemctl start cfn-signal` it hangs forever. When I look at the process list, for some reason it shows systemd-tty-ask-password-agent running... Using latest master.
I also just tried etcd2, and the signal was not received there either. This problem is slightly different, though: cfn-signal did start, but CloudFormation never saw the signal come in.
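For anyone wanting to confirm the version mismatch described above, here is a rough sketch of how the running etcd version could be compared with the one requested in cluster.yaml. It parses the JSON that etcd serves on its `/version` endpoint. The function names `etcd_server_version` and `version_matches` are made up for this example and are not part of kube-aws, and the sed-based parsing assumes the compact single-line JSON etcd normally returns.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; not part of kube-aws.

# Print the "etcdserver" field from the JSON body of etcd's /version endpoint,
# e.g. {"etcdserver":"3.0.10","etcdcluster":"3.0.0"}.
etcd_server_version() {
  printf '%s\n' "$1" |
    sed -n 's/.*"etcdserver"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Succeed when the running version ($1) equals the wanted version ($2),
# or when the wanted version is a dotted prefix of it (3.1 matches 3.1.5,
# but 3.1 does not match 3.10.0).
version_matches() {
  case "$1" in
    "$2" | "$2".*) return 0 ;;
    *) return 1 ;;
  esac
}
```

On an etcd node this might be used as follows, assuming the client endpoint is reachable at `http://127.0.0.1:2379` without TLS (an assumption; a real cluster may need `https` plus client certificates):

    running=$(etcd_server_version "$(curl -s http://127.0.0.1:2379/version)")
    version_matches "$running" 3.1.5 || echo "etcd is running $running, expected 3.1.5"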