Kubelet to checkpoint running pods (including kube-apiserver) #30065
How are the self-hosted components being started currently? Why is there no fallback apiserver that is run as a static pod? Adding checkpointing logic to kubelet that is not specific to kubelet itself doesn't seem like a good idea. cc @kubernetes/sig-node
cc @aaronlevy How is this scenario handled today with your self-hosting proposal?
If this were the case, I think it would mean there were some special "fallback" api-servers that would need to be managed/upgraded differently than the self-hosted daemonset/deployment based api-servers. Ideally, this wouldn't be the case, as it somewhat defeats a goal of self-hosting these components (back to: modify files on disk with external tools). Now, to be fair, a single api-server is not exactly ideal, and we're solving for the failure domain of "all api-servers are down". But single-node master deployments are common, and even addressing multiple api-servers isn't super simple (it requires a loadbalancer or external DNS). And in the single-node master case, "all api-servers are down" is a reboot - so a goal is finding a sane way of solving for this.
Agreed. I think a generic solution would be more ideal, rather than "checkpoint an api-server". The initial implementation of the user-space api-server checkpointing was done this way to scratch the immediate itch, with the longer term goal of starting a discussion around general checkpointing. Which itself is a little vague: pod manifest checkpoints
Functionally, it's pretty close to the workflow described by @maciaszczykm and @floreks. Also, very similar in function to the old podmaster (move static pod manifests around on disk to activate/deactivate)
I may have misread (via ctrl-f), but I think the proposal just discusses that etcd is assumed to have survived - that the assumption is that the api-server can always successfully contact etcd. That being said, self-hosting etcd is something we want to look into, but it's still in the very early stages of discussion - and like other components, it could always be opted out of and run however the deployer decides.
ref: #489
Kubelet shouldn't do anything special for apiserver. Kubelet should checkpoint everything it was running.
Anyway, I think the solution for this is more like having a "bootstrap" apiserver in the static manifest folder.
@vishh we really are just trying to make sure that the self-hosted api-server is up and running at all times, so that kubelet/etcd can communicate with each other and maintain the cluster state. The API-server itself, being self-hosted, needs a way to restart in case of failure. The kubelet not being able to communicate with the etcd server through the api-server prevents this from happening. How the self-hosted components are initially started is not a concern of this proposal, as the "snapshotting" of the api-server is started after the provisioning process has completed.
@lavalamp agreed, it seems we are not respecting the separation of concerns here. Would having a dedicated pod that is responsible for this be a better solution? The thing is, the api-server cannot checkpoint itself, so somebody else needs to do it for it.
When taking a snapshot of the running api-server we are persisting important info, like the etcd server IP address for example. The api-server definition can also change over time, or have custom file paths etc. We want to keep this info and use it to restart the api-server when it fails. Storing static pod definitions in the manifest folder goes against the future plans of the community to have k8s's components host themselves, meaning this folder should be kept empty.
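For illustration, a minimal sketch of what such a snapshot step could look like with client-go, assuming a checkpointer running inside (or alongside) kubelet. The label selector, checkpoint path, and in-cluster credentials are assumptions for the sketch, not the bootkube implementation:

```go
package main

import (
	"context"
	"fmt"
	"os"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"sigs.k8s.io/yaml"
)

// checkpointAPIServer fetches the self-hosted api-server pod from kube-system,
// strips api-server-managed state, and writes it out as a static pod manifest.
func checkpointAPIServer(ctx context.Context, client kubernetes.Interface) error {
	// Assumed label selector; the real self-hosted api-server pods may be labelled differently.
	pods, err := client.CoreV1().Pods("kube-system").List(ctx, metav1.ListOptions{
		LabelSelector: "k8s-app=kube-apiserver",
	})
	if err != nil {
		return err
	}
	if len(pods.Items) == 0 {
		return fmt.Errorf("no self-hosted api-server pod found")
	}

	pod := pods.Items[0].DeepCopy()
	// A static pod must not carry state that only the api-server manages.
	pod.Status = corev1.PodStatus{}
	pod.ResourceVersion = ""
	pod.UID = ""
	pod.OwnerReferences = nil

	data, err := yaml.Marshal(pod)
	if err != nil {
		return err
	}
	// Hypothetical checkpoint location, deliberately outside the active manifests directory.
	return os.WriteFile("/etc/kubernetes/checkpoints/kube-apiserver.yaml", data, 0600)
}

func main() {
	cfg, err := rest.InClusterConfig() // assumes the checkpointer has in-cluster credentials
	if err != nil {
		panic(err)
	}
	if err := checkpointAPIServer(context.Background(), kubernetes.NewForConfigOrDie(cfg)); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```

The point of saving the full pod definition (rather than a hard-coded manifest) is that custom flags such as the etcd address or file paths survive the checkpoint.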
@kenan435 I think this issue would be a lot less confusing if you called it "Kubelet checkpointing", since kubelet is the component that needs to do checkpointing. It needs to do checkpointing for everything, not just kube-apiserver, since other services (like etcd) are also vital for the startup sequence. Focusing specifically on apiserver is overly specific.
IMO, kubelet needs this functionality built in. This is a general problem. We can imagine solving it with a sidecar container that keeps the manifests directory up to date, but that would require privileged mode, so it's not great. Having kubelet checkpoint is a long-standing request and is probably captured in another issue already.
I agree with @lavalamp that we should be thinking about this from the perspective of "Kubelet checkpointing". There is a general existing issue (#489) which covers 'kubelet checkpointing or something'. I think at this point it might help to start the discussion about the scope of work to implement some form of this functionality. But even "checkpointing" is a bit of a grey area in my mind. Regarding solving the problem presented in this issue, the functionality described would be: "periodically save the state of running pods on the node, and be able to recover that state in the absence of an api-server". From a pretty simplistic standpoint, this would also necessitate checkpointing any api-provided assets: podSpecs, configMaps, and secrets. @vishh I know this has briefly been discussed in the past, but wondering about your thoughts in terms of feasibility / first steps / beginning to scope out this work?
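To make that scope concrete, here is a rough, hedged sketch of what checkpointing "api-provided assets" for a node could look like: save the specs of pods scheduled to this node plus the configMaps and secrets their volumes reference. The checkpoint directory and field selector are assumptions, not an agreed design:

```go
package checkpoint

import (
	"context"
	"fmt"
	"os"
	"path/filepath"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"sigs.k8s.io/yaml"
)

const checkpointDir = "/var/lib/kubelet/checkpoints" // assumed location

func writeYAML(name string, obj interface{}) error {
	data, err := yaml.Marshal(obj)
	if err != nil {
		return err
	}
	return os.WriteFile(filepath.Join(checkpointDir, name), data, 0600)
}

// checkpointNode saves the pods scheduled to this node and the configMaps and
// secrets referenced by their volumes (env references are omitted for brevity).
func checkpointNode(ctx context.Context, client kubernetes.Interface, nodeName string) error {
	pods, err := client.CoreV1().Pods("").List(ctx, metav1.ListOptions{
		FieldSelector: "spec.nodeName=" + nodeName,
	})
	if err != nil {
		return err
	}
	for _, pod := range pods.Items {
		if err := writeYAML(fmt.Sprintf("pod-%s-%s.yaml", pod.Namespace, pod.Name), pod); err != nil {
			return err
		}
		for _, vol := range pod.Spec.Volumes {
			switch {
			case vol.ConfigMap != nil:
				cm, err := client.CoreV1().ConfigMaps(pod.Namespace).Get(ctx, vol.ConfigMap.Name, metav1.GetOptions{})
				if err == nil {
					_ = writeYAML(fmt.Sprintf("configmap-%s-%s.yaml", pod.Namespace, cm.Name), cm)
				}
			case vol.Secret != nil:
				sec, err := client.CoreV1().Secrets(pod.Namespace).Get(ctx, vol.Secret.SecretName, metav1.GetOptions{})
				if err == nil {
					_ = writeYAML(fmt.Sprintf("secret-%s-%s.yaml", pod.Namespace, sec.Name), sec)
				}
			}
		}
	}
	return nil
}
```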
@hongchaodeng I think you have misinterpreted this issue, it's actually got nothing to do with apiserver. |
@maciaszczykm I have taken the liberty of editing the title and your first heading so as not to mislead people. |
I'm very tempted to close this as a dup of #489. I'll let @dchen1107 make that call.
I agree with many of the comments above that the initial issue should be resolved by a Kubelet checkpointing solution. Kubelet shouldn't treat kube-apiserver and other master component pods differently from other pods on the node. Closing the issue as a dup of #489.
This includes the api-server itself, I presume. But if the kubelet now holds this so-called snapshot and is responsible for restoring the state prior to the failure, what is the point in having this info in the etcd server? Would the checkpointing mechanism hold more detail about the state of the node than etcd does, namely the podSpecs, configMaps and secrets you have outlined above?
A few other Fujitsu employees and I are interested in the mentioned topic and would like to contribute in this area, so we have prepared this proposal to discuss it with the whole community. Please share your opinion with us.
CC @kubernetes/sig-cluster-lifecycle @floreks @kenan435 @taimir @zreigz @cheld
Checkpointing the current API server configuration
Currently a single-master cluster installation of Kubernetes that is self-hosted cannot recover after a reboot. The limitation comes from the fact that the API server is self-hosted as well. Since self-hosted components are not static pods, they will not be recreated after the reboot. In order for `kubelet` to restart self-hosted components, it needs a functioning API server. The bootkube project currently resolves this by a dedicated "user space checkpointing" container in the self-hosted `kubelet` pod, which periodically persists the API-server manifest as a static pod in the manifest directory.

Motivation
It would be of benefit to have the API-server checkpointing as a part of `kubelet`, here's why:

- … (`.yaml` files),

Proposal
The proposal is to implement a checkpointing or snapshotting mechanism for the api-server pod definition. If and when the self-hosted api-server fails and is not actively running, this mechanism would be responsible for spinning up a temporary api-server, which would in turn start the self-hosted api-server.
High level
The general idea for solving this issue is to add API-server checkpointing to the `kubelet`. While running, it will periodically back up the running self-hosted API server and store the backup locally, in the form of a static pod definition. When the self-hosted API-server is down, the locally stored backup can be used to start a temporary API server. The reason for spinning up the temporary server is to re-establish communication with the `etcd` server, which has the latest pod definitions, so that the re-launch of the self-hosted api-server can be triggered and the cluster can heal. The temporary API server will then be used to recreate all self-hosted components, including the missing self-hosted API-server.
Implementation details
Let’s sketch the rough flow (a minimal code sketch of the activation step follows the list):

1. `kubelet` starts with the `--checkpoint=true` flag.
2. The checkpointer periodically fetches the self-hosted API server pod from the `kube-system` namespace and saves it as a static pod definition locally, as a `.yaml` or `.json` file.
3. If the `kube-system` API server is not running, the checkpointer moves the saved static pod to the `/etc/kubernetes/manifests` directory. Then `kubelet` automatically creates a temporary API server from the static pod.
4. After a node reboot, only `kubelet` is running. Once the checkpointer detects that the `kube-system` API-server is missing, it moves the static pod to the manifests directory and thus the temporary API-server is started. Now we have `kubelet` and the temporary API server running.
5. `etcd` still has the states of all cluster pods from before the cluster restart. Because of that, `kubelet` can talk to `etcd` via the temporary API server and restore the lost cluster state from before the reboot. This includes starting the true, self-hosted API server.
6. Once the self-hosted API server is running again, the checkpointer removes the temporary API server's static pod from the `/etc/kubernetes/manifests` directory.
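The activation/deactivation step (items 3, 4, and 6) could look roughly like the loop below, in the spirit of bootkube's user-space checkpointer but reimagined inside `kubelet`. The file paths and the assumed insecure local health endpoint are illustrative only, and this naive version does not distinguish the temporary api-server from the self-hosted one; a real implementation would only deactivate once the self-hosted pod is confirmed running:

```go
package main

import (
	"net/http"
	"os"
	"time"
)

const (
	checkpointPath = "/etc/kubernetes/checkpoints/kube-apiserver.yaml" // saved by the checkpointer
	activePath     = "/etc/kubernetes/manifests/kube-apiserver.yaml"   // watched by kubelet as a static pod
	healthURL      = "http://127.0.0.1:8080/healthz"                   // assumed insecure local port; TLS/auth omitted
)

// apiServerHealthy reports whether any api-server answers on the local node.
func apiServerHealthy() bool {
	client := &http.Client{Timeout: 3 * time.Second}
	resp, err := client.Get(healthURL)
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusOK
}

func copyFile(src, dst string) error {
	data, err := os.ReadFile(src)
	if err != nil {
		return err
	}
	return os.WriteFile(dst, data, 0600)
}

func main() {
	for {
		if apiServerHealthy() {
			// An api-server is reachable: deactivate the temporary static pod.
			_ = os.Remove(activePath)
		} else {
			// No reachable api-server: activate the checkpointed static pod so
			// kubelet starts a temporary api-server that can talk to etcd.
			_ = copyFile(checkpointPath, activePath)
		}
		time.Sleep(10 * time.Second)
	}
}
```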
Issues and limitations
- It is assumed that something will restart `kubelet` after a node reboot.
- It is assumed that `etcd` will restart after the reboot and not be lost, i.e. it shouldn’t disappear forever after the master has been rebooted (assuming that it’s located on the master).

Where to put checkpointer code in `kubelet`?

- Should the `kubelet` that is on the master node be the only one running the checkpointing mechanism?
- The `kubelet --checkpoint=true` flag name could be changed.