Add operator election support #3632

Merged: 8 commits, merged on Sep 3, 2020.
cmd/manager/main.go (40 changes: 29 additions, 11 deletions)
@@ -70,6 +70,8 @@ const (
DefaultWebhookName = "elastic-webhook.k8s.elastic.co"
WebhookPort = 9443

LeaderElectionConfigMapName = "elastic-operator-leader"

debugHTTPShutdownTimeout = 5 * time.Second // time to allow for the debug HTTP server to shutdown
)

@@ -163,6 +165,11 @@ func Command() *cobra.Command {
false, // Set to false for backward compatibility
"Restrict cross-namespace resource association through RBAC (eg. referencing Elasticsearch from Kibana)",
)
cmd.Flags().Bool(
operator.EnableLeaderElection,
true,
sebgl marked this conversation as resolved.
"Enable leader election. Enabling this will ensure there is only one active operator.",
)
cmd.Flags().Bool(
operator.EnableTracingFlag,
false,
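The new flag defaults to `true`, so leader election is on unless it is explicitly disabled. As a minimal sketch of that behaviour, the example below registers a flag with the same name using Go's standard `flag` package. This is an assumption made so the sketch runs standalone: the operator itself registers its flags through cobra and reads them back through viper. `parseLeaderElection` is a hypothetical helper.

```go
package main

import (
	"flag"
	"fmt"
)

// newFlagSet registers a boolean flag named like the one added in this PR.
// It defaults to true, so leader election stays on unless explicitly disabled.
// (Sketch only: the real operator uses cobra/viper, not the stdlib flag package.)
func newFlagSet() (*flag.FlagSet, *bool) {
	fs := flag.NewFlagSet("manager", flag.ContinueOnError)
	enabled := fs.Bool("enable-leader-election", true,
		"Enable leader election. Enabling this will ensure there is only one active operator.")
	return fs, enabled
}

// parseLeaderElection is a hypothetical helper that parses args and
// returns the resulting value of the flag.
func parseLeaderElection(args []string) (bool, error) {
	fs, enabled := newFlagSet()
	if err := fs.Parse(args); err != nil {
		return false, err
	}
	return *enabled, nil
}

func main() {
	def, _ := parseLeaderElection(nil)
	off, _ := parseLeaderElection([]string{"--enable-leader-election=false"})
	fmt.Println(def, off) // prints "true false"
}
```

With the standard `flag` package, a boolean flag has to be disabled with the `=false` form; `--enable-leader-election false` would leave the flag at `true` and treat `false` as a positional argument.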
@@ -359,8 +366,11 @@ func startOperator(stopChan <-chan struct{}) error {

// Create a new Cmd to provide shared dependencies and start components
opts := ctrl.Options{
Scheme: clientgoscheme.Scheme,
CertDir: viper.GetString(operator.WebhookCertDirFlag),
Scheme: clientgoscheme.Scheme,
CertDir: viper.GetString(operator.WebhookCertDirFlag),
LeaderElection: viper.GetBool(operator.EnableLeaderElection),
sebgl marked this conversation as resolved.
LeaderElectionID: LeaderElectionConfigMapName,
LeaderElectionNamespace: operatorNamespace,
Collaborator:

I wonder whether it is possible to deploy multiple independent instances of the operator via OLM, each responsible for a different namespace or set of namespaces. IIRC they would all run in the operators namespace by default, so they would all try to use the same ConfigMap. Is that possible, or does the leader election algorithm have additional safeguards, such as a unique identifier per operator instance? If not, should we include the operator UUID in the ID to prevent it?

Contributor:

👍 I think it's a good argument for making the leader election config map name configurable.

Contributor:

I don't think we support multiple operators in the same namespace today, but you're right that there's an argument that we should, and maybe we should start here. Off the top of my head, we would also need to make this configurable:

	// licensingCfgMapName is the name of the config map used to store licensing information
	licensingCfgMapName = "elastic-licensing"

}

// configure the manager cache based on the number of managed namespaces
@@ -459,15 +469,7 @@ func startOperator(stopChan <-chan struct{}) error {
return err
}

// Garbage collect any orphaned user Secrets leftover from deleted resources while the operator was not running.
garbageCollectUsers(cfg, managedNamespaces)

go func() {
time.Sleep(10 * time.Second) // wait some arbitrary time for the manager to start
mgr.GetCache().WaitForCacheSync(nil) // wait until k8s client cache is initialized
r := licensing.NewResourceReporter(mgr.GetClient(), operatorNamespace)
r.Start(licensing.ResourceReporterFrequency)
}()
go asyncTasks(mgr, cfg, managedNamespaces, operatorNamespace)

log.Info("Starting the manager", "uuid", operatorInfo.OperatorUUID,
"namespace", operatorNamespace, "version", operatorInfo.BuildInfo.Version,
@@ -482,6 +484,22 @@ func startOperator(stopChan <-chan struct{}) error {
return nil
}

// asyncTasks schedules some tasks to be started when this instance of the operator is elected
func asyncTasks(mgr manager.Manager, cfg *rest.Config, managedNamespaces []string, operatorNamespace string) {
<-mgr.Elected() // wait for this operator instance to be elected

// Start the resource reporter
go func() {
time.Sleep(10 * time.Second) // wait some arbitrary time for the manager to start
mgr.GetCache().WaitForCacheSync(nil) // wait until k8s client cache is initialized
r := licensing.NewResourceReporter(mgr.GetClient(), operatorNamespace)
r.Start(licensing.ResourceReporterFrequency)
}()

// Garbage collect any orphaned user Secrets leftover from deleted resources while the operator was not running.
garbageCollectUsers(cfg, managedNamespaces)
}

func registerControllers(mgr manager.Manager, params operator.Parameters, accessReviewer rbac.AccessReviewer) error {
controllers := []struct {
name string

docs/operating-eck/operator-config.asciidoc (1 change: 1 addition, 0 deletions)
@@ -23,6 +23,7 @@ ECK can be configured using either command line flags or environment variables.
|debug-http-listen |localhost:6060 |Listen address for the debug HTTP server. Only available in development mode.
|development |false |Enable development mode. Only available as a CLI flag.
|disable-config-watch| false| Watch the configuration file for changes and restart to apply them. Only effective when the `--config` flag is used to set the configuration file.
|enable-leader-election | true | Enable leader election. Must be set to true if using multiple replicas of the operator.
|enable-tracing | false | Enable APM tracing in the operator process. Use environment variables to configure APM server URL, credentials, and so on. See link:https://www.elastic.co/guide/en/apm/agent/go/1.x/configuration.html[APM Go Agent reference] for details.
|enable-webhook | false | Enables a validating webhook server in the operator process.
|enforce-rbac-on-refs| false | Enables restrictions on cross-namespace resource association through RBAC.
@@ -10,6 +10,7 @@ spec:
selector:
matchLabels:
{{- toYaml .Values.operator.selectorLabels | nindent 6 }}
replicas: {{ .Values.operator.replicas }}
serviceName: {{ .Values.operator.name }}
template:
metadata:
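The StatefulSet template now reads its replica count from `.Values.operator.replicas`, which defaults to `1` in `values.yaml`. Helm templates are built on Go's `text/template`, so the added line can be rendered with the standard library alone. This is a sketch: Helm-specific functions such as `toYaml` and `nindent` are not available in plain `text/template`, so only the `replicas` line is rendered, and the nested maps only approximate Helm's values object.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// renderReplicas renders the replicas line from the chart's StatefulSet
// template using plain text/template. The nested maps approximate the
// values object Helm passes in (an assumption of this sketch).
func renderReplicas(replicas int) (string, error) {
	tmpl, err := template.New("statefulset").Parse("replicas: {{ .Values.operator.replicas }}")
	if err != nil {
		return "", err
	}
	values := map[string]interface{}{
		"Values": map[string]interface{}{
			"operator": map[string]interface{}{"replicas": replicas},
		},
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, values); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	out, err := renderReplicas(2)
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // replicas: 2
}
```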

hack/manifest-gen/assets/charts/eck/values.yaml (4 changes: 3 additions, 1 deletion)
@@ -10,9 +10,11 @@ operator:
repository: docker.elastic.co/eck/eck-operator
pullPolicy: IfNotPresent

replicas: 1

resources:
limits:
cpu: 1
cpu: "1"
memory: 512Mi
requests:
cpu: 100m

pkg/controller/common/operator/flags.go (1 change: 1 addition, 0 deletions)
@@ -14,6 +14,7 @@ const (
ContainerRegistryFlag = "container-registry"
DebugHTTPListenFlag = "debug-http-listen"
DisableConfigWatch = "disable-config-watch"
EnableLeaderElection = "enable-leader-election"
EnableTracingFlag = "enable-tracing"
EnableWebhookFlag = "enable-webhook"
EnforceRBACOnRefsFlag = "enforce-rbac-on-refs"