
Timeout when validating admission webhook unreachable #896

Closed
msvechla opened this issue May 21, 2019 · 11 comments
@msvechla

Bug Report

What did you do?

Follow your Quickstart guide: https://www.elastic.co/guide/en/cloud-on-k8s/current/index.html

During the second step when applying the Elasticsearch resource definition, a timeout occurs and the resource is never created:

Error from server (Timeout): error when creating "STDIN": Timeout: request did not complete within requested timeout 30s 

What did you expect to see?

Creation of the Elasticsearch resource

What did you see instead? Under which circumstances?

Timeout, always.

Environment

  • Version information:

eck 0.8.0

  • Kubernetes information:

Running on EKS

$ kubectl version

Client Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.6", GitCommit:"ab91afd7062d4240e95e51ac00a18bd58fddd365", GitTreeState:"clean", BuildDate:"2019-02-26T12:59:46Z", GoVersion:"go1.10.8", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"12+", GitVersion:"v1.12.6-eks-d69f1b", GitCommit:"d69f1bf3669bf00b7f4a758e978e0e7a1e3a68f7", GitTreeState:"clean", BuildDate:"2019-02-28T20:26:10Z", GoVersion:"go1.10.8", Compiler:"gc", Platform:"linux/amd64"}
  • Resource definition:
apiVersion: elasticsearch.k8s.elastic.co/v1alpha1
kind: Elasticsearch
metadata:
  name: quickstart
spec:
  version: 7.1.0
  nodes:
  - nodeCount: 1
    config:
      node.master: true
      node.data: true
      node.ingest: true
  • Logs:
    No relevant logs in operator, last message was:
{"level":"info","ts":1558423031.9141233,"logger":"kubebuilder.webhook","msg":"starting the webhook server."}
@barkbay
Contributor

barkbay commented May 21, 2019

Hi,

I would like to be sure that the problem does not come from the validation webhook, which is supposed to perform some sanity checks on the request.

Please could you:

  1. Back up the configuration of the validating webhook:
$ kubectl get ValidatingWebhookConfiguration -o yaml > ValidatingWebhookConfiguration.yaml
  2. Delete the ValidatingWebhookConfiguration:
$ kubectl delete ValidatingWebhookConfiguration validating-webhook-configuration

and then try again?

Thank you

@msvechla
Author

After deleting it, the resource was created successfully. Here is the yaml, maybe it can help with debugging:

apiVersion: v1
items:
- apiVersion: admissionregistration.k8s.io/v1beta1
  kind: ValidatingWebhookConfiguration
  metadata:
    creationTimestamp: 2019-05-21T06:29:29Z
    generation: 1
    name: validating-webhook-configuration
    resourceVersion: "416167"
    selfLink: /apis/admissionregistration.k8s.io/v1beta1/validatingwebhookconfigurations/validating-webhook-configuration
    uid: c9db79f4-7b91-11e9-8da9-0271e0db6b8e
  webhooks:
  - clientConfig:
      caBundle: SENSITIVE
      service:
        name: elastic-webhook-service
        namespace: elastic-system
        path: /validate-elasticsearches
    failurePolicy: Fail
    name: validation.elasticsearch.elastic.co
    namespaceSelector:
      matchExpressions:
      - key: control-plane
        operator: DoesNotExist
    rules:
    - apiGroups:
      - elasticsearch.k8s.elastic.co
      apiVersions:
      - v1alpha1
      operations:
      - CREATE
      - UPDATE
      resources:
      - elasticsearches
    sideEffects: Unknown
  - clientConfig:
      caBundle: SENSITIVE
      service:
        name: elastic-webhook-service
        namespace: elastic-system
        path: /validate-enterpriselicenses
    failurePolicy: Fail
    name: validation.license.elastic.co
    namespaceSelector:
      matchExpressions:
      - key: control-plane
        operator: DoesNotExist
    rules:
    - apiGroups:
      - elasticsearch.k8s.elastic.co
      apiVersions:
      - v1alpha1
      operations:
      - CREATE
      - UPDATE
      resources:
      - enterpriselicenses
    sideEffects: Unknown
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

@msvechla
Author

FYI, I just tried this again on a new cluster, same issue. Here are the startup logs:

{"level":"info","ts":1558472733.2504811,"logger":"manager","msg":"Starting the Cmd."}
{"level":"info","ts":1558472733.3509061,"logger":"kubebuilder.webhook","msg":"installing webhook configuration in cluster"}
{"level":"info","ts":1558472733.351051,"logger":"kubebuilder.admission.cert.writer","msg":"cert is invalid or expiring, regenerating a new one"}
{"level":"info","ts":1558472733.3607376,"logger":"kubebuilder.controller","msg":"Starting Controller","controller":"elasticsearch-controller"}
{"level":"info","ts":1558472733.3608296,"logger":"kubebuilder.controller","msg":"Starting Controller","controller":"apmserver-controller"}
{"level":"info","ts":1558472733.3608878,"logger":"kubebuilder.controller","msg":"Starting Controller","controller":"apm-es-association-controller"}
{"level":"info","ts":1558472733.3609493,"logger":"kubebuilder.controller","msg":"Starting Controller","controller":"kibana-association-controller"}
{"level":"info","ts":1558472733.3610291,"logger":"kubebuilder.controller","msg":"Starting Controller","controller":"license-controller"}
{"level":"info","ts":1558472733.3611038,"logger":"kubebuilder.controller","msg":"Starting Controller","controller":"kibana-controller"}
{"level":"info","ts":1558472733.3611677,"logger":"kubebuilder.controller","msg":"Starting Controller","controller":"trial-controller"}
{"level":"info","ts":1558472733.3612182,"logger":"kubebuilder.controller","msg":"Starting Controller","controller":"remotecluster-controller"}
{"level":"info","ts":1558472733.4609866,"logger":"kubebuilder.controller","msg":"Starting workers","controller":"apmserver-controller","worker count":1}
{"level":"info","ts":1558472733.4612098,"logger":"kubebuilder.controller","msg":"Starting workers","controller":"kibana-controller","worker count":1}
{"level":"info","ts":1558472733.4613671,"logger":"kubebuilder.controller","msg":"Starting workers","controller":"elasticsearch-controller","worker count":1}
{"level":"info","ts":1558472733.4614935,"logger":"kubebuilder.controller","msg":"Starting workers","controller":"apm-es-association-controller","worker count":1}
{"level":"info","ts":1558472733.4616113,"logger":"kubebuilder.controller","msg":"Starting workers","controller":"kibana-association-controller","worker count":1}
{"level":"info","ts":1558472733.4617238,"logger":"kubebuilder.controller","msg":"Starting workers","controller":"license-controller","worker count":1}
{"level":"info","ts":1558472733.4618585,"logger":"kubebuilder.controller","msg":"Starting workers","controller":"remotecluster-controller","worker count":1}
{"level":"info","ts":1558472733.4619713,"logger":"kubebuilder.controller","msg":"Starting workers","controller":"trial-controller","worker count":1}
{"level":"info","ts":1558472733.9756606,"logger":"kubebuilder.webhook","msg":"starting the webhook server."}
{"level":"error","ts":1558472733.9759276,"logger":"kubebuilder.webhook","msg":"server returns an unexpected error","error":"open /tmp/cert/cert.pem: no such file or directory","stacktrace":"github.com/elastic/cloud-on-k8s/operators/vendor/github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/elastic/cloud-on-k8s/operators/vendor/github.com/go-logr/zapr/zapr.go:128\ngithub.com/elastic/cloud-on-k8s/operators/vendor/sigs.k8s.io/controller-runtime/pkg/webhook.(*Server).run\n\t/go/src/github.com/elastic/cloud-on-k8s/operators/vendor/sigs.k8s.io/controller-runtime/pkg/webhook/server.go:261\ngithub.com/elastic/cloud-on-k8s/operators/vendor/sigs.k8s.io/controller-runtime/pkg/webhook.(*Server).Start\n\t/go/src/github.com/elastic/cloud-on-k8s/operators/vendor/sigs.k8s.io/controller-runtime/pkg/webhook/server.go:216\ngithub.com/elastic/cloud-on-k8s/operators/vendor/sigs.k8s.io/controller-runtime/pkg/manager.(*controllerManager).start.func2\n\t/go/src/github.com/elastic/cloud-on-k8s/operators/vendor/sigs.k8s.io/controller-runtime/pkg/manager/internal.go:257"}
{"level":"error","ts":1558472733.9761424,"logger":"manager","msg":"unable to run the manager","error":"open /tmp/cert/cert.pem: no such file or directory","stacktrace":"github.com/elastic/cloud-on-k8s/operators/vendor/github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/src/github.com/elastic/cloud-on-k8s/operators/vendor/github.com/go-logr/zapr/zapr.go:128\ngithub.com/elastic/cloud-on-k8s/operators/cmd/manager.execute\n\t/go/src/github.com/elastic/cloud-on-k8s/operators/cmd/manager/main.go:232\ngithub.com/elastic/cloud-on-k8s/operators/cmd/manager.glob..func1\n\t/go/src/github.com/elastic/cloud-on-k8s/operators/cmd/manager/main.go:56\ngithub.com/elastic/cloud-on-k8s/operators/vendor/github.com/spf13/cobra.(*Command).execute\n\t/go/src/github.com/elastic/cloud-on-k8s/operators/vendor/github.com/spf13/cobra/command.go:766\ngithub.com/elastic/cloud-on-k8s/operators/vendor/github.com/spf13/cobra.(*Command).ExecuteC\n\t/go/src/github.com/elastic/cloud-on-k8s/operators/vendor/github.com/spf13/cobra/command.go:852\ngithub.com/elastic/cloud-on-k8s/operators/vendor/github.com/spf13/cobra.(*Command).Execute\n\t/go/src/github.com/elastic/cloud-on-k8s/operators/vendor/github.com/spf13/cobra/command.go:800\nmain.main\n\t/go/src/github.com/elastic/cloud-on-k8s/operators/cmd/main.go:28\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:201"} 

Deleting the validating-webhook-configuration solved the issue again.

@barkbay
Contributor

barkbay commented May 22, 2019

Thanks for the update.

I'm investigating an issue that seems to be specific to the Amazon environment.
In the meantime be aware that the webhook is (re)created automatically when the operator is (re)started.

@pebrc
Collaborator

pebrc commented May 22, 2019

The webhook can be disabled permanently by specifying --operator-roles=global,namespace in the operator StatefulSet spec instead of all.
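For illustration, here is a sketch of where that flag would go, assuming the default all-in-one manifest layout (the StatefulSet, namespace, and container names below are assumptions and may differ in your installation):

```yaml
# Sketch only: replace names with those from your actual operator manifest.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: elastic-operator
  namespace: elastic-system
spec:
  template:
    spec:
      containers:
      - name: manager
        args:
        - manager
        # "global,namespace" instead of "all" skips installing the webhook
        - --operator-roles=global,namespace
```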

@thomasriley

I've also seen this issue running on GKE. Removing the ValidatingWebhookConfiguration allowed me to apply the demo Elasticsearch YAML.

Happy to help if you need any diagnostics.

trotro added a commit to trotro/cloud-on-k8s that referenced this issue Jun 21, 2019
Replacing `all` with `global,namespace` in the operator stateful set spec solves the issue elastic#896
@barkbay
Contributor

barkbay commented Jul 1, 2019

Sorry for the late answer.

I did some successful tests on Amazon EKS.
My guess is that a rule is missing in the security group of your nodes. Could you please check that the rule allowing communication from the control plane to port 443 on the nodes is present:

[Screenshot: EKS security group inbound rule allowing HTTPS (port 443) from the control plane security group]

Amazon describes it as "Recommended inbound traffic", and its absence could cause this kind of issue by preventing communication from the control plane to the HTTPS server which implements the validating webhook.
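For reference, such a rule can also be added with the AWS CLI; this is a sketch, where sg-nodes and sg-controlplane are placeholders for your actual node and control-plane security group IDs:

```shell
# Allow the EKS control plane to reach the webhook server on the nodes over HTTPS.
# sg-nodes and sg-controlplane are placeholders; substitute your security group IDs.
aws ec2 authorize-security-group-ingress \
  --group-id sg-nodes \
  --protocol tcp \
  --port 443 \
  --source-group sg-controlplane
```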

@MiLk

MiLk commented Jul 5, 2019

I had the same issue, and opening the port 443 from the control plane to the worker nodes solved it.

sebgl added a commit to sebgl/cloud-on-k8s that referenced this issue Jul 8, 2019
EKS users must explicitly enable communication from the k8s control
plane and nodes port 443 in order for the control plane to reach the
validating webhook.

Should help with elastic#896.
sebgl added a commit that referenced this issue Jul 9, 2019
EKS users must explicitly enable communication from the k8s control
plane and nodes port 443 in order for the control plane to reach the
validating webhook.

Should help with #896.
@thbkrkr
Contributor

thbkrkr commented Jul 12, 2019

Closing this as a solution has been identified. Please reopen if needed.

@thbkrkr thbkrkr closed this as completed Jul 12, 2019
sebgl added a commit that referenced this issue Jul 12, 2019
* Support for APM server configuration (#1181)

* Add a config section to the APM server configuration

* APM: Add support for keystore

* Factorize ElasticsearchAuthSettings

* Update dev setup doc + fix GKE bootstrap script (#1203)

* Update dev setup doc + fix GKE bootstrap script

* Update wording of container registry authentication

* Ensure disks removal after removing cluster in GKE (#1163)

* Update gke-cluster.sh

* Implement cleanup for unused disks in GCP

* Update Makefile

* Update CI jobs to do proper cleanup

* Normalize the raw config when creating canonical configs (#1208)

This aims at counteracting the difference between JSON-centric serialization and the use of YAML as the serialization format in canonical config. Without normalization, numeric values
like 1 will differ when comparing configs, as JSON deserializes integer numbers to float64 and YAML to uint64.

* Homogenize logs (#1168)

* Don't run tests if only docs are changed (#1216)

* Update Jenkinsfile

* Simplify notOnlyDocs()

* Update Jenkinsfile

* Push snapshot ECK release on successful PR build (#1184)

* Update makefile's to support snapshots

* Add snapshot releases to Jenkins pipelines

* Cleanup

* Rename RELEASE to USE_ELASTIC_DOCKER_REGISTRY

* Update Jenkinsfile

* Add a note on EKS inbound traffic & validating webhook (#1211)

EKS users must explicitly enable communication from the k8s control
plane and nodes port 443 in order for the control plane to reach the
validating webhook.

Should help with #896.

* Update PodSpec with Hostname from PVC when re-using (#1204)

* Bind the Debug HTTP server to localhost by default (#1220)

* Run e2e tests against custom Docker image (#1135)

* Add implementation

* Update makefile's

* Update Makefile

* Rename Jenkisnfile

* Fix review comments

* Update e2e-custom.yml

* Update e2e-custom.yml

* Return deploy-all-in-one to normal

* Delete GKE cluster only if changes not in docs (#1223)

* Add operator version to resources (#1224)

* Warn if unsupported distribution (#1228)

The operator only works with the official ES distributions to enable the security
available with the basic (free), gold and platinum licenses in order to ensure that
all clusters launched are secured by default.

A check is done in the prepare-fs script by looking at the existence of the
Elastic License. If not present, the script exits with a custom exit code.

Then the ES reconciliation loop sends an event of type warning if it detects that
a prepare-fs init container terminated with this exit code.

* Document Elasticsearch update strategy change budget & groups (#1210)

Add documentation for the `updateStrategy` section of the Elasticsearch
spec.

It documents how (and why) `changeBudget` and `groups` are used by ECK,
and how both settings can be specified by the user.
@tadgh

tadgh commented Jul 17, 2019

Just chiming in to say I experienced this only when attempting to add automated snapshots, i.e. when adding secureSettings to the Elasticsearch resource. The error was:

Internal error occurred: failed calling admission webhook "validation.elasticsearch.elastic.co": Post https://elastic-webhook-service.elastic-system.svc:443/validate-elasticsearches?timeout=30s: service "elastic-webhook-service" not found

Following the above advice of deleting the validating-webhook-configuration and trying again worked for me. Leaving this here as I'm pretty sure it's the same issue.

@PaulGrandperrin

PaulGrandperrin commented Jul 31, 2019

Hi, I saw that this issue is referenced in the documentation, and I just want to write up how and why I ran into it and how we solved it, to help other people.

  • On elastic-operator 0.8.x, everything was working fine.
  • After upgrading to 0.9.0, all resource creation (kubectl apply -f quickstart.yaml) timed out without any helpful message.
  • After digging through many comments on many old issues, I tried deleting the validatingwebhookconfigurations (by the way, this name sounds generic, but I suspect it is in fact very specific to the elastic-operator, am I right?).
  • Then everything worked perfectly (without resource validation, of course), but this is just a hack, not the solution.
  • After digging deeper, I came to understand why it had worked before 0.9.0:
    • We are using private clusters on GKE, which means the Kubernetes masters cannot communicate with arbitrary ports on the pods. By default, only ports 443 and 10250 (kubelet) are open from the masters to the pods.
    • In 0.8.x, the elastic webhook was listening on port 443, but this was changed to 9443 in 0.9.0 to avoid needing the cap_net_bind_service capability: 7d778e8
    • Port 9443 is not whitelisted by default, so this commit broke all GKE installations on private clusters.
  • We added a firewall rule to allow traffic on 9443; here is our Terraform snippet:
// Certmanager deployment (webhook access)
resource "google_compute_firewall" "elastic_operator_webhook_ingress_cluster_2" {
  name      = "${var.project_suffix}-elastic-operator-webhook-ingress-cluster-2"
  network   = google_compute_network.net.name
  direction = "INGRESS"

  allow {
    protocol = "tcp"
    ports    = ["9443"]
  }

  source_ranges = [
    var.k8s_cluster_2_master_ipv4_cidr_block,
  ]

  target_tags = [
    "${var.k8s_cluster_2_name}-node",
  ]
}
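For those not using Terraform, an equivalent rule can be created with gcloud; this is a sketch, where NETWORK, MASTER_CIDR, and NODE_TAG are placeholders for your cluster's network name, master IPv4 CIDR block, and node network tag:

```shell
# Allow the GKE masters to reach the validating webhook on port 9443.
# NETWORK, MASTER_CIDR, and NODE_TAG are placeholders; substitute your values.
gcloud compute firewall-rules create elastic-operator-webhook \
  --network NETWORK \
  --direction INGRESS \
  --allow tcp:9443 \
  --source-ranges MASTER_CIDR \
  --target-tags NODE_TAG
```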

You can find more info here:

@thbkrkr thbkrkr changed the title timeout when applying elasticsearch resource Timeout when validating admission webhook unreachable Jul 31, 2019
@barkbay barkbay mentioned this issue Sep 2, 2019

8 participants