Deadlock when cfn-signal is enabled on etcd nodes? #525
Comments
@redbaron Does it assign an EIP to your first node? I had something similar today, but it was because I forgot to add some parameters to the VPC config. Do you get this error in journald?
"Apr 10 13:05:30 ip-10-0-1-108.ec2.internal bash[1059]: run: discovery failed"
Good catch! It does support cfn-signal. etcdadm-reconfigure.service sets
etcd-member's Type to simple or notify according to the number of
remaining nodes to be set up.
If it fails to signal, maybe there's a bug in etcdadm.
On Tue, Apr 11, 2017 at 7:18, Camil Blanaru <notifications@github.com> wrote:
@redbaron <https://github.com/redbaron> Does it assign an EIP to your first
node? I had something similar today, but it was because I forgot to add some
parameters to the VPC config. Do you get this error in journald?
"Apr 10 13:05:30 ip-10-0-1-108.ec2.internal bash[1059]: run: discovery failed"
@mumoshu,
@redbaron As you've seen
@mumoshu, hmm, true. I'll investigate further
@mumoshu, it seems that there is this line https://github.com/kubernetes-incubator/kube-aws/blob/dd345a7bacf74076883f4f8c21df90d18e6e1ab9/core/controlplane/config/templates/cloud-config-etcd#L154 which makes
@redbaron Wow!!! You are correct. It must be
Fix the dead-lock while bootstrapping etcd cluster when wait signal is enabled. Resolves kubernetes-retired#525
@redbaron It is now fixed in master! Would you mind confirming if it works for you?
Perhaps I didn't notice the issue because my test setup was insufficient: it contained only one etcd node. To reproduce the issue, I believe I needed an odd number of etcd nodes, 3 or more.
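A small sketch of why a single-node test setup would not show the problem. It assumes, as the issue describes, that an etcd member only reports readiness once a majority (quorum) of the configured members is up; the function names `quorum` and `first_node_can_become_ready` are made up for illustration and are not part of kube-aws or etcdadm.

```python
def quorum(cluster_size: int) -> int:
    """Smallest number of members that forms a majority of the cluster."""
    return cluster_size // 2 + 1


def first_node_can_become_ready(cluster_size: int) -> bool:
    """Assumption: a member is ready only when a quorum is running.

    With exactly one member running, that holds only if quorum is 1.
    """
    return quorum(cluster_size) <= 1


for size in (1, 3, 5):
    print(size, quorum(size), first_node_can_become_ready(size))
```

With one etcd node the first member is its own majority, so it can become ready alone; with three or five nodes it cannot, which is where the wait on the first node would start to matter.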
…update-to-latest-kube-aws-master to hcom-flavour
* commit '175217133f75b3c251536bc0d51ccafd2b1a5de4':
  Fix the dead-lock while bootstrapping etcd cluster when wait signal is enabled. Resolves kubernetes-retired#525
  Fix elasticFileSystemId to be propagated to node pools Resolves kubernetes-retired#487
  'Cluster-dump' feature to export Kubernetes Resources to S3
  Follow-up for the multi API endpoints support This fixes the issue which prevented a k8s cluster from being properly configured when multiple API endpoints are defined in cluster.yaml.
  Fix incorrect validations on apiEndpoints Ref kubernetes-retired#520 (comment)
  Wait until kube-system becomes ready Resolves kubernetes-retired#467
* kubernetes-incubator/master:
  Don't mount /var/lib/rkt into kubelet to avoid shared bind-mounts propagation
  Fix to calico configuration file etcd endpoints
  Fix hyperlink to restore script in Readme.md.
  Reference 'autosave' rather than 'export' in comments of cluster.yaml.
  'Restore' feature to restore Kubernetes Resources from S3 backup
  Add missing '/' when constructing the Autosave S3 put path
  Shared Persistent Volume (kubernetes-retired#471)
  Fix an incorrect variable name in the e2e/run script
  Add documentation for administrating etcd cluster Resolves kubernetes-retired#491
  use gzip base64 encoding for customFiles
  New options: customFiles and customSystemdUnits
  Add cluster.yaml details for apiEndpointName
  Fix the dead-lock while bootstrapping etcd cluster when wait signal is enabled. Resolves kubernetes-retired#525
  Fix elasticFileSystemId to be propagated to node pools Resolves kubernetes-retired#487
  Minor fixup for etcd unit files
  Fix up apiEndpoints.loadBalancer config
It seems that there is no support for cfn-signal on etcd nodes, or maybe I am doing something wrong?
The deadlock goes as follows:
- etcd-member is of type notify, which is then fired when the etcd server joins the cluster
- cfn-signal waits for etcd-member to become is-active, which never happens until it reports readiness to systemd
- etcd-member can't join the cluster, as it is the first one and seems to wait for others to pop up, but that can't happen because CloudFormation won't start the next etcd node until the first one reports success
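The steps above can be sketched as a toy simulation. This is only a model under stated assumptions, not kube-aws code: CloudFormation is assumed to start etcd nodes strictly one at a time and to wait for each node's cfn-signal; a unit of Type=simple is assumed to count as active as soon as it starts, while a unit of Type=notify counts as active only once a majority of the configured members is running. The function `bootstrap` and its log messages are hypothetical.

```python
def bootstrap(total_nodes: int, unit_type: str) -> list[str]:
    """Simulate a sequential etcd rollout gated on cfn-signal.

    unit_type is "simple" or "notify" (the systemd Type of etcd-member).
    Returns an event log whose last entry says how the rollout ended.
    """
    quorum = total_nodes // 2 + 1  # majority of the configured members
    log = []
    started = 0
    while started < total_nodes:
        started += 1
        log.append(f"start etcd{started}")
        # Assumption: notify units become active only once quorum is running.
        active = unit_type == "simple" or started >= quorum
        if not active:
            # cfn-signal is never sent, so the next node is never started.
            log.append(f"etcd{started} never becomes active: deadlock")
            return log
        log.append(f"cfn-signal success from etcd{started}")
    log.append("cluster ready")
    return log


for kind in ("notify", "simple"):
    print(kind, "->", bootstrap(3, kind)[-1])
```

In this model a three-node rollout with Type=notify stops at the first node, while Type=simple lets every node signal so the rollout completes.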