Finding a solution for etcd #277
My proposed solution is the etcd-operator, since it provides nearly all of the criteria out of the box.

I've already integrated kubeadm and etcd-operator successfully in this PR, and here is the fork. I think it's probably worthwhile to come up with a more granular disaster-recovery requirement list, and also to think about to what degree an etcd solution should cover all the bases. We already have several issues where this is tracked, so we should take into account all the solutions suggested there too.
@jamiehannaford if there is a branch you would like reviewed, I'd be happy to go through it now.
@xiang90 Awesome. So if a failed upgrade occurs, the user can manually restore from a backup file. Is there a way that etcd can automatically check specific locations (like the local FS or S3) for backups without the user needing to specify one manually? For example, assume the etcd-operator has been backing things up to an S3 bucket. When it initialises, it checks the same bucket and boots from there (this assumes the user hasn't changed any backup options).
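For illustration only, here is a minimal Go sketch of what probing a bucket for the most recent backup could look like. The bucket name, prefix, and `latestBackupKey` helper are hypothetical, and this is not the etcd-operator's actual restore logic (note also the caution about automatic restores raised below):

```go
package main

import (
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

// latestBackupKey lists objects under a backup prefix and returns the key of
// the most recently modified one, or "" if no backup exists yet.
func latestBackupKey(svc *s3.S3, bucket, prefix string) (string, error) {
	out, err := svc.ListObjectsV2(&s3.ListObjectsV2Input{
		Bucket: aws.String(bucket),
		Prefix: aws.String(prefix),
	})
	if err != nil {
		return "", err
	}
	var newestKey string
	var newestTime int64
	for _, obj := range out.Contents {
		if t := obj.LastModified.Unix(); t > newestTime {
			newestKey, newestTime = *obj.Key, t
		}
	}
	return newestKey, nil
}

func main() {
	sess := session.Must(session.NewSession())
	// "kubeadm-etcd-backups" and "cluster-1/" are hypothetical names.
	key, err := latestBackupKey(s3.New(sess), "kubeadm-etcd-backups", "cluster-1/")
	if err != nil {
		log.Fatal(err)
	}
	if key == "" {
		fmt.Println("no backup found; bootstrapping a fresh cluster")
		return
	}
	fmt.Println("would restore from", key)
}
```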
@timothysc Thanks! The only branch I have is the one I submitted in my PR, which I think you've already gone through. Unless you meant something else? There are a bunch of comments on that PR which I can start to address as a next step. I'll also add TLS secrets to the PR. Should I go ahead and do that?
@jamiehannaford Feel free to. I'm going to try to look at the TLS Secrets PR this week, so it might yet change (@andrewrynhard), but I don't expect it to be part of v1.7; that gives us a little more time to think about it before v1.8.
I'd just like to highlight that doing something like this automatically is a terrible idea and will give you multiple sources of truth if etcd is internally partitioned. I suspect recovery will need to be manually triggered, because by definition it is required when the etcd cluster is incapable of making robust automatic decisions.
Totally agree. We designed this to be manual work, at least on the etcd-operator side.
Moving milestone to v1.9. In v1.8, we're gonna stick with a local etcd instance listening on localhost.
Automatic merge from submit-queue (batch tested with PRs 54593, 54607, 54539, 54105). If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

Add HA feature gate and minVersion validation

**What this PR does / why we need it**: As we add more feature gates, there might be occasions where a feature is only available on newer releases of K8s. If a user makes a mistake, we should notify them as soon as possible in the init procedure and not let them go down the path of hard-to-debug component issues. Specifically with HA, we ideally need the new `TaintNodesByCondition` (added in v1.8.0 but working in v1.9.0).

**Which issue this PR fixes**: kubernetes/kubeadm#261 kubernetes/kubeadm#277

**Release note**:

```release-note
Feature gates now check minimum versions
```

/cc @kubernetes/sig-cluster-lifecycle-pr-reviews @luxas @timothysc
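The gist of that validation can be sketched as follows. The gate names, the `minVersion` field, and the lexical version comparison are illustrative simplifications, not kubeadm's actual `features` package:

```go
package main

import "fmt"

// featureGate pairs a gate with the first release that supports it.
// Gate names and versions here are illustrative.
type featureGate struct {
	name       string
	minVersion string
}

var knownGates = []featureGate{
	{name: "SelfHosting", minVersion: "v1.8.0"},
	{name: "HighAvailability", minVersion: "v1.9.0"}, // needs TaintNodesByCondition
}

// validateGates fails fast at init time if an enabled gate needs a newer
// release than the one being deployed, instead of letting the user hit
// hard-to-debug component issues later. Versions are compared lexically
// for brevity; real code should parse them as semver.
func validateGates(enabled map[string]bool, k8sVersion string) error {
	for _, g := range knownGates {
		if enabled[g.name] && k8sVersion < g.minVersion {
			return fmt.Errorf("feature gate %s requires %s or newer (got %s)",
				g.name, g.minVersion, k8sVersion)
		}
	}
	return nil
}

func main() {
	err := validateGates(map[string]bool{"HighAvailability": true}, "v1.8.4")
	fmt.Println(err) // feature gate HighAvailability requires v1.9.0 or newer (got v1.8.4)
}
```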
Automatic merge from submit-queue (batch tested with PRs 49840, 54937, 54543). If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

Add self-hosted etcd API to kubeadm

**What this PR does / why we need it**: This PR is part of a larger set that implements self-hosted etcd. It takes a first step by adding:

1. new API types in `cmd/kubeadm/app/apis` for configuring self-hosted etcd
2. new Go types in `cmd/kubeadm/app/phases/etcd/spec` used for constructing EtcdCluster CRDs for the etcd-operator.

The reason we define these in-tree is that kubeadm cannot import `github.com/coreos/etcd-operator` as a dependency until those types live in their own repo. Until then, we need to redefine the structs in our codebase.

**Which issue this PR fixes**: kubernetes/kubeadm#261 kubernetes/kubeadm#277

**Special notes for your reviewer**: This is the first PR in a series, to save reviewers from one goliath PR.

**Release note**:

```release-note
NONE
```
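An abbreviated sketch of what such redefined types might look like; the real etcd-operator `ClusterSpec` carries many more fields (pod policy, backup/restore policies, TLS settings), so treat this as illustrative:

```go
// Package spec redefines just enough of the etcd-operator's CRD types for
// kubeadm to construct EtcdCluster objects without importing the operator.
package spec

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// EtcdCluster mirrors the operator's EtcdCluster custom resource.
type EtcdCluster struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`
	Spec              ClusterSpec `json:"spec"`
}

// ClusterSpec holds the desired state of the etcd cluster.
type ClusterSpec struct {
	// Size is the desired number of etcd members.
	Size int `json:"size"`
	// Version is the etcd version to run, e.g. "3.1.10".
	Version string `json:"version,omitempty"`
}
```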
Moving milestone for this to v1.10, as we depend on changes being made to the operator before we can use it, and the code freeze is coming up.
Given all the history here and recent feedback, we need to go with the non-operator option.
So I'm going to close this issue and open a new one to outline the doc on using the existing commands to lay down etcd. We will likely have to wait until some of the other phases work is done as well.
A few weeks ago some folks on Slack brought up the idea of defining requirements for highly available etcd on kubeadm-provisioned clusters. The idea was to define requirements before implementing any solution. This discussion continued in the sig-cluster-lifecycle meeting on May 16th 2017, where we came up with some initial criteria:
1. Disaster recovery (see the backup sketch below)
   a. Recovers from member failure
   b. Recovers from quorum loss
   c. Recovers from full cluster failure, i.e. power-off
   d. Recovers from partial / failed / interrupted upgrades
2. Security
   a. TLS encryption
   b. Certificate rotation
3. Deployment
   a. Non-self-hosted
   b. Self-hosted (optional)
4. Upgrades
   a. Rolling upgrades
   b. Downgrades (but tricky because of etcd)
Are there any I've missed?
The next stage is proposing solutions that meet the above criteria and can be verified in a fork.
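To make criterion 1 concrete: the backup side of disaster recovery can be as simple as periodically streaming a snapshot out of the cluster, and restoring it (e.g. with `etcdctl snapshot restore`) then covers quorum loss and full power-off. A minimal sketch using etcd's Go client, with a hypothetical endpoint and output path:

```go
package main

import (
	"context"
	"io"
	"log"
	"os"
	"time"

	"github.com/coreos/etcd/clientv3"
)

func main() {
	// The endpoint and output path are hypothetical.
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"127.0.0.1:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	// Stream a consistent snapshot of the keyspace to a local file.
	rc, err := cli.Snapshot(ctx)
	if err != nil {
		log.Fatal(err)
	}
	defer rc.Close()

	f, err := os.Create("snapshot.db")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	if _, err := io.Copy(f, rc); err != nil {
		log.Fatal(err)
	}
	log.Println("snapshot written to snapshot.db")
}
```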
cc/ @timothysc @justinsb @philips @xiang90 @aaronlevy