Add operator election support #3632
Conversation
Should we change the defaults to increase the number of replicas, or at least change it for e2e tests to make sure this gets used? It doesn't look like we hardcode references to …

I'd also argue this counts as a …
Regarding the defaults in the all-in-one I was wondering the same. I think I would be 👍 to increase it to something like 2 and add a variable in the manifest template. Regarding our e2e tests I was wondering if #2012 would not be a better place to do that because I'm not sure that our current e2e tests will give us any insight from this feature.
If you already tried to increase the number of replicas, wondering why some reconciliations look "weird" ... then I think it looks like an enhancement 😄 . I'm also fine with the …
Co-authored-by: Anya Sabo <anya@sabolicio.us>
👍 and point taken on the e2e testing bit. It would still eventually work without this, so we would really only be catching the case where they're deadlocked.
I have some questions regarding this change:
I think that if we want to add this (especially as the default), we should have it extensively tested.
I'm not sure what you mean by "starting a new Pod". If a container inside a Pod is crashing because of a container runtime failure, a config error, or some network issue, then the kubelet will try to restart the container, but the Pod itself will stay in a …

Not to my knowledge. The idea here is to prevent a user from starting several operators simultaneously by increasing the number of replicas. I think it should also cover the case where a Pod is in an …
The leader election mechanism is based on the ResourceLock, which already has its own unit tests. The same mechanism is used by the Kubernetes project itself for the election of the K8s scheduler and for some built-in controllers. I agree that it should be tested, but as mentioned above I think it should be done as part of #2012. There is no communication between the Pods; they are competing to apply an annotation on a ConfigMap:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  annotations:
    control-plane.alpha.kubernetes.io/leader: '{"holderIdentity":"fcf2a873-8018-43c0-9196-13293a65d9ef","leaseDurationSeconds":15,"acquireTime":"2020-08-19T09:29:49Z","renewTime":"2020-08-19T09:30:15Z","leaderTransitions":0}'
  creationTimestamp: "2020-08-19T09:29:49Z"
  name: elastic-operator-leader
  namespace: elastic-system
  resourceVersion: "2331024"
  selfLink: /api/v1/namespaces/elastic-system/configmaps/elastic-operator-leader
  uid: 9eddb203-16c1-4751-9990-a3f61894cb5c
```
I didn't phrase it well, I meant what you described. It all does sound convincing, but it seems to be more a preventive measure for users that added more replicas rather than a reliability improvement, correct? I initially thought the goal is increasing the reliability.
Should we wait with this until #2012 is in then?
It prevents the operator from starting if it is not elected. That way, a user can safely run several instances at the same time. It improves availability, so I would say it is an improvement?
Current e2e tests will ensure that an election happens in normal situations. As mentioned previously, we are relying on a feature of the controller runtime, so my feeling is that this is a nice improvement and there is no urgency to add additional e2e tests. The only point I'm not sure about is whether we should keep the default number of replicas at 1 for now; it would make it easier for the user to get the logs. If we want to set the default value to 2, I think it makes sense to also do that for e2e tests, as suggested by @anyasabo
I agree that we should probably keep the number of replicas at 1 but exercise the new functionality in an e2e test.
```go
CertDir:                 viper.GetString(operator.WebhookCertDirFlag),
LeaderElection:          viper.GetBool(operator.EnableLeaderElection),
LeaderElectionID:        LeaderElectionConfigMapName,
LeaderElectionNamespace: operatorNamespace,
```
I wonder if it is possible to deploy multiple independent instances of the operator, responsible for different namespaces or sets of namespaces, via OLM. IIRC they would all run in the operators namespace by default, and would then all try to use the same ConfigMap. Is this possible, or are there additional safeguards in place in the leader election algorithm, e.g. some form of unique identifier per operator? If not, should we try to include the operator UUID as an ID to prevent that?
👍 I think it's a good argument for making the leader election config map name configurable.
I don't think we support multiple operators in the same namespace today, but you're right there's an argument that we should, and maybe we should start here. Off the top of my head we need to make this variable:

```go
// licensingCfgMapName is the name of the config map used to store licensing information
licensingCfgMapName = "elastic-licensing"
```
Thanks for all the inputs! I have tried to sum up my thoughts on the different topics expressed here:

**Making the leader election config map name configurable**

IIUC the intention here is to be able to run several operators with distinct responsibilities in a shared namespace.
Regarding OLM: I'm not sure it is supported, or at least it does not seem to be the first intention. The documentation seems to indicate that an operator group is tied to a namespace:
Maybe we should discuss whether we want this feature in a dedicated issue? (Kind of related to #2032)

**Concurrent execution**

While it is possible to have an algorithm which ensures that there is at most one leader at a given point in time, it is not possible to enforce this condition at the workload level (i.e. threads/goroutines may still temporarily live in a process which is not the leader anymore). Even the "leader-for-life" election mechanism has its own edge cases. I would consider that:
**Testing**

I agree that we should test this feature to some extent, but I'm not sure we should expect to have it extensively tested. Election mechanisms are mostly disrupted by slow I/O, high latencies, clock skew or rate inconsistencies, and partition conditions on the host or in the process. I'm not sure it is easy to simulate those conditions in an e2e environment (at least without adding some noise to our e2e test suite). Killing Pods randomly would be a first step, but my feeling is that it would just be happy-path testing. Let's follow up on this discussion in #2012, for which I'll try to outline a proposal.
LGTM once we set back the number of replicas to 1 by default in the manifest.
I think I see this as an additional safeguard in case multiple operators run in parallel, rather than a real feature that allows you to achieve HA.
About the points mentioned in the comments:
- 👍 for considering making the config map name configurable in a later stage, along with other hardcoded resource names
- I would vote for not spending too much time writing complicated tests to cover the leader election process, since we just reuse it from client-go and controller-runtime where it is already tested. I think I'm actually OK with no tests at all on our side.
I'm merging this one and will quickly follow up with some e2e tests.
This PR adds and enables an election mechanism between several running operators. It allows the user to safely scale up the number of operator instances running at the same time.
Fixes #709