
Allow injecting labels onto the resources created by the elastic-operator. #1353

Closed
r0fls opened this issue Jul 23, 2019 · 15 comments
Labels: >enhancement (Enhancement of existing functionality)

r0fls commented Jul 23, 2019

Proposal

Allow injecting labels onto the resources created by the elastic-operator.

Use case. Why is this important?
This is important for environments that enforce labelling requirements, for instance for cost reporting or alert routing.

For example, this should work to inject the foo: bar label onto the resources that the operator creates:

apiVersion: elasticsearch.k8s.elastic.co/v1alpha1
kind: Elasticsearch
metadata:
  name: quickstart
  labels:
    foo: bar
spec:
  version: 7.1.0
  nodes:
  - nodeCount: 1
    config:
      node.master: true
      node.data: true
      node.ingest: true

r0fls (Author) commented Jul 23, 2019

Actually it looks like this may be supported via the CRD podTemplate configuration value

r0fls (Author) commented Jul 24, 2019

I have now filled in the podTemplate, correctly I believe, based on the docs.

However, the operator still fails to start when label requirements are enforced. I receive the following message:

{"level":"error","ts":1563939770.8546793,"logger":"manager","msg":"unable to run the manager","error":"admission webhook "validating-webhook.openpolicyagent.org" denied the request: missing tag app in metadata.labels","stacktrace":"github.com/elastic/cloud-on-k8s/operators/vendor/github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/elastic/cloud-on-k8s/operators/vendor/github.com/go-logr/zapr/zapr.go:128\ngithub.com/elastic/cloud-on-k8s/operators/cmd/manager.execute\n\t/go/src/github.com/elastic/cloud-on-k8s/operators/cmd/manager/main.go:232\ngithub.com/elastic/cloud-on-k8s/operators/cmd/manager.glob..func1\n\t/go/src/github.com/elastic/cloud-on-k8s/operators/cmd/manager/main.go:56\ngithub.com/elastic/cloud-on-k8s/operators/vendor/github.com/spf13/cobra.(*Command).execute\n\t/go/src/github.com/elastic/cloud-on-k8s/operators/vendor/github.com/spf13/cobra/command.go:766\ngithub.com/elastic/cloud-on-k8s/operators/vendor/github.com/spf13/cobra.(*Command).ExecuteC\n\t/go/src/github.com/elastic/cloud-on-k8s/operators/vendor/github.com/spf13/cobra/command.go:852\ngithub.com/elastic/cloud-on-k8s/operators/vendor/github.com/spf13/cobra.(*Command).Execute\n\t/go/src/github.com/elastic/cloud-on-k8s/operators/vendor/github.com/spf13/cobra/command.go:800\nmain.main\n\t/go/src/github.com/elastic/cloud-on-k8s/operators/cmd/main.go:28\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:201"}

r0fls (Author) commented Jul 24, 2019

I also see:

for: "/tmp/tmp_mqft68y": Internal error occurred: failed calling admission webhook "validation.elasticsearch.elastic.co": Post https://elastic-webhook-service.elastic-system.svc:443/validate-elasticsearches?timeout=30s: service "elastic-webhook-service" not found

So I'm guessing the operator fails to create the admission webhook pod, since it doesn't have the required labels. Is there any way to specify additional labels for the admission webhook?

thbkrkr (Contributor) commented Jul 24, 2019

Actually it looks like this may be supported via the CRD podTemplate configuration value

Correct, this will be supported in the next release, 0.9.

  spec:
    podTemplate:
      metadata:
        labels:
          # additional labels for pods
          foo: bar
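
For context, a minimal sketch of where that block sits in a full Elasticsearch resource, combining the proposal example above with the podTemplate field (the spec.nodes layout follows the v1alpha1 API used elsewhere in this thread):

apiVersion: elasticsearch.k8s.elastic.co/v1alpha1
kind: Elasticsearch
metadata:
  name: quickstart
spec:
  version: 7.1.0
  nodes:
  - nodeCount: 1
    podTemplate:
      metadata:
        labels:
          foo: bar
    config:
      node.master: true
      node.data: true
      node.ingest: true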

for: "/tmp/tmp_mqft68y": Internal error occurred: failed calling admission webhook "validation.elasticsearch.elastic.co": Post https://elastic-webhook-service.elastic-system.svc:443/validate-elasticsearches?timeout=30s: service "elastic-webhook-service" not found

It looks like it might be this known issue if you use EKS: #896 (comment).

Which ECK version do you use?
Which Kubernetes version and distribution do you use?
Can you share the resource definition you used to try to reproduce the issue?

thbkrkr added the >enhancement (Enhancement of existing functionality) label Jul 24, 2019

r0fls (Author) commented Jul 24, 2019

ECK version 0.8.1
Kubernetes version: 1.12.8

The operator manifest is here: https://download.elastic.co/downloads/eck/0.8.1/all-in-one.yaml

The cluster manifest I tried is:

apiVersion: elasticsearch.k8s.elastic.co/v1alpha1
kind: Elasticsearch
metadata:
  name: quickstart
spec:
  version: 7.1.0
  nodes:
  - nodeCount: 1
    podTemplate:
      meta:
        labels:
          app: "elk-cluster"
    config:
      node.master: true
      node.data: true
      node.ingest: true

Bear in mind this may work if you do not have an admission controller that blocks the creation of resources missing the app label.

anyasabo (Contributor) commented:

@r0fls if I'm understanding you correctly, your ES manifest isn't your only problem, at least. Am I right that the operator pod is failing to start because the validating-webhook.openpolicyagent.org webhook is blocking its creation? If so, you can slightly modify the all-in-one YAML you linked so that the StatefulSet's template.metadata.labels section includes an app label; any pods it creates will then carry that label. That should let the operator pods pass the webhook and start successfully.
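
To illustrate, a rough sketch of that change, assuming the all-in-one manifest deploys the operator as a StatefulSet named elastic-operator in the elastic-system namespace (the names, apiVersion, and the app value below are illustrative; check your copy of the manifest for the actual fields):

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: elastic-operator
  namespace: elastic-system
spec:
  # ...selector, serviceName, replicas, etc. unchanged...
  template:
    metadata:
      labels:
        # keep any labels the StatefulSet selector already matches on, and add:
        app: eck-operator
    spec:
      # ...containers unchanged...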

What isn't clear is why the ECK validating webhook was able to be created at all -- normally the operator creates the validating webhook when it starts, but in this case it cannot start. I wonder if the operator was deployed first, and then the validating-webhook.openpolicyagent.org was deployed later on?

Once the operator pods are running and you are past the validating-webhook.openpolicyagent.org error, the validation.elasticsearch.elastic.co error you are seeing should go away as well, since the operator (which serves that webhook) is up and running. If it does not, please let us know and we can troubleshoot further. As @thbkrkr mentioned, there may be other cluster configuration settings that block communication from the API server to the operator pod.

And for what it's worth, in the v0.9.0 release the validation.elasticsearch.elastic.co webhook should cause a little bit less trouble as we've made some changes to its configuration.

r0fls (Author) commented Jul 24, 2019

I should have been clearer: I am already adding the pod labels to the StatefulSet. The operator pod starts (it wouldn't even be created if it were missing the labels), but it then crash loops with the following error:

{"level":"error","ts":1564004240.214247,"logger":"manager","msg":"unable to run the manager","error":"admission webhook "validating-webhook.openpolicyagent.org" denied the request: missing tag app in metadata.labels","stacktrace":"github.com/elastic/cloud-on-k8s/operators/vendor/github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/elastic/cloud-on-k8s/operators/vendor/github.com/go-logr/zapr/zapr.go:128\ngithub.com/elastic/cloud-on-k8s/operators/cmd/manager.execute\n\t/go/src/github.com/elastic/cloud-on-k8s/operators/cmd/manager/main.go:232\ngithub.com/elastic/cloud-on-k8s/operators/cmd/manager.glob..func1\n\t/go/src/github.com/elastic/cloud-on-k8s/operators/cmd/manager/main.go:56\ngithub.com/elastic/cloud-on-k8s/operators/vendor/github.com/spf13/cobra.(*Command).execute\n\t/go/src/github.com/elastic/cloud-on-k8s/operators/vendor/github.com/spf13/cobra/command.go:766\ngithub.com/elastic/cloud-on-k8s/operators/vendor/github.com/spf13/cobra.(*Command).ExecuteC\n\t/go/src/github.com/elastic/cloud-on-k8s/operators/vendor/github.com/spf13/cobra/command.go:852\ngithub.com/elastic/cloud-on-k8s/operators/vendor/github.com/spf13/cobra.(*Command).Execute\n\t/go/src/github.com/elastic/cloud-on-k8s/operators/vendor/github.com/spf13/cobra/command.go:800\nmain.main\n\t/go/src/github.com/elastic/cloud-on-k8s/operators/cmd/main.go:28\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:201"}

r0fls (Author) commented Jul 24, 2019

I think it's failing to create the webhook service because it doesn't have the labels needed for our environment.

anyasabo (Contributor) commented:

Ah, understood. We'll take a look and see what we can do to improve the error logging there, along with any other improvements for this situation. In the meantime, the workaround suggested in #896 (comment), disabling webhook creation by changing the StatefulSet spec to --operator-roles=global,namespace, should allow it to start up.
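
As a sketch, assuming the operator roles are passed as container args in that StatefulSet (the container name and surrounding fields here are illustrative; keep any other existing args), the change looks something like:

spec:
  template:
    spec:
      containers:
      - name: manager   # use the container name from your manifest
        args:
        # leave out the webhook role so the operator does not try to create/serve the webhook
        - "--operator-roles=global,namespace"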

r0fls (Author) commented Jul 25, 2019

That does prevent the operator pod from crash looping, but when I try to deploy the ELK cluster I still see:

2019-07-25 10:20:11 raphael-Latitude-7490 K8DeployerSingle[21241] ERROR Error running: kubectl --namespace elastic-system apply --record -f /tmp/tmp31cxu1kb (Return Code: 1)
Output:
Error from server (InternalError): error when creating "/tmp/tmp31cxu1kb": Internal error occurred: failed calling admission webhook "validation.elasticsearch.elastic.co": Post https://elastic-webhook-service.elastic-system.svc:443/validate-elasticsearches?timeout=30s: dial tcp 100.67.219.213:443: connect: connection refused


anyasabo (Contributor) commented:

@r0fls if you remove that webhook configuration you should be good (for example kubectl delete validatingwebhookconfigurations.admissionregistration.k8s.io validating-webhook-configuration).

It was created by the operator before you changed the operator config to not serve webhooks. So right now there is a cluster-wide admission webhook configuration telling the API server to call the https://elastic-webhook-service.elastic-system.svc endpoint, but the operator is no longer configured to serve it.
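
If it helps, one way to confirm which configuration points at that service before deleting it (resource names taken from the command above):

# list the cluster's validating webhook configurations
kubectl get validatingwebhookconfigurations

# inspect the one the operator created and check its service reference
kubectl get validatingwebhookconfiguration validating-webhook-configuration -o yaml

# then remove it, as suggested above
kubectl delete validatingwebhookconfiguration validating-webhook-configuration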

r0fls (Author) commented Jul 25, 2019

Sweet, that got rid of the error! Thanks!

r0fls (Author) commented Jul 25, 2019

Any idea when the 0.9.0 release will be available with support for the podTemplate value? Will that also work for the Kibana instances?

r0fls (Author) commented Jul 26, 2019 via email

pebrc (Collaborator) commented Jun 29, 2020

Closing. Not an issue/fixed in 0.9.0 if I read the discussion correctly

pebrc closed this as completed Jun 29, 2020