Remote CA cert is created empty when remote clusters are not configured for an ES cluster #3881

Closed
Navbar opened this issue Oct 28, 2020 · 8 comments · Fixed by #3993
Labels
>bug Something isn't working

Comments

@Navbar

Navbar commented Oct 28, 2020

Upgrading the ECK operator from 1.0.1 to 1.2.1 on K8s ended with an empty transport-remote-certs ca.crt.

I wanted to test the CA rotation feature ahead of the expiration date, so I added these new operator parameters:
- --ca-cert-validity=80h
- --ca-cert-rotate-before=2h
- --cert-validity=80h
- --cert-rotate-before=2h
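
As a rough illustration of what these values imply, the sketch below computes when a certificate would become due for rotation. It assumes rotation is triggered once the remaining lifetime drops below the rotate-before duration; the function name and the issuance time are hypothetical and chosen only for the example.

```python
from datetime import datetime, timedelta, timezone


def rotation_due_at(issued_at: datetime, validity: timedelta,
                    rotate_before: timedelta) -> datetime:
    """Return the moment a certificate becomes due for rotation.

    Assumption: rotation starts at (issued_at + validity - rotate_before).
    """
    if rotate_before >= validity:
        raise ValueError("rotate_before must be shorter than validity")
    return issued_at + validity - rotate_before


# Hypothetical issuance time, used only for illustration.
issued = datetime(2020, 10, 27, 12, 0, tzinfo=timezone.utc)
due = rotation_due_at(issued, timedelta(hours=80), timedelta(hours=2))
print(due.isoformat())
```

With 80h validity and 2h rotate-before, rotation would be expected roughly 78 hours after issuance under this assumption.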

How to reproduce:

  1. Install ECK 1.0.1 (as in the quickstart). The certificate is valid for 1 year, and _ssl/certificates can be queried over the API.
  2. Upgrade the operator to 1.2.1 (apply all the 1.2.1 quickstart YAML files).

Logs show:
"message": "path: /_ssl/certificates, params: {}", "cluster.uuid": "TZIF4kdNTo2cadQReCI_yw", "node.id": "a63gze16SX21ZrqI0DVSbw" ,
"stacktrace": ["java.security.cert.CertificateException: failed to parse any certificates from [/usr/share/elasticsearch/config/transport-remote-certs/ca.crt]",

curl -u "elastic:password" -s -k "https://quickstart-es-http:9200/_ssl/certificates"|jq .
{
  "error": {
    "root_cause": [
      {
        "type": "certificate_exception",
        "reason": "failed to parse any certificates from [/usr/share/elasticsearch/config/transport-remote-certs/ca.crt]"
      }
    ],
    "type": "certificate_exception",
    "reason": "failed to parse any certificates from [/usr/share/elasticsearch/config/transport-remote-certs/ca.crt]"
  },
  "status": 500
}

The cert file is empty:
[root@quickstart-es-master-nodes-0 transport-remote-certs]# cat ca.crt
[root@quickstart-es-master-nodes-0 transport-remote-certs]# pwd
/usr/share/elasticsearch/config/transport-remote-certs
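
A quick way to confirm this condition without reading the file by eye is to count the PEM certificate blocks it contains. The following is a minimal sketch, not part of ECK or Elasticsearch: the helper names are hypothetical, and it only looks for the BEGIN/END markers without validating the certificate payload.

```python
import re

# Matches one PEM certificate block by its markers only; the payload is not validated.
PEM_CERT_RE = re.compile(
    r"-----BEGIN CERTIFICATE-----.+?-----END CERTIFICATE-----",
    re.DOTALL,
)


def count_pem_certificates(pem_text: str) -> int:
    """Count PEM certificate blocks in the given text."""
    return len(PEM_CERT_RE.findall(pem_text))


def count_certificates_in_file(path: str) -> int:
    """Read a CA file and count the certificate blocks it holds."""
    with open(path, "r", encoding="ascii", errors="replace") as fh:
        return count_pem_certificates(fh.read())


# An empty file, like the ca.crt shown above, contains no certificates.
print(count_pem_certificates(""))
```

A count of zero for transport-remote-certs/ca.crt would match the "failed to parse any certificates" error reported by the API.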

Am I doing something wrong?

@botelastic botelastic bot added the triage label Oct 28, 2020
@sebgl
Contributor

sebgl commented Oct 28, 2020

Can you provide the ECK operator logs? I'm wondering if anything particular is logged about those certs.
Please also provide your Elasticsearch manifest.

@Navbar
Author

Navbar commented Oct 28, 2020

kubectl logs elastic-operator-0 -nelastic-system|grep cert
{"log.level":"info","@timestamp":"2020-10-27T12:03:49.152Z","log.logger":"manager","message":"Automatic management of the webhook certificates enabled","service.version":"1.2.1-b5316231","service.type":"eck","ecs.version":"1.4.0"}
{"log.level":"info","@timestamp":"2020-10-27T12:03:49.161Z","log.logger":"manager","message":"Polling for the webhook certificate to be available","service.version":"1.2.1-b5316231","service.type":"eck","ecs.version":"1.4.0","path":"/tmp/k8s-webhook-server/serving-certs/tls.crt"}
{"log.level":"info","@timestamp":"2020-10-27T12:03:49.903Z","log.logger":"controller-runtime.certwatcher","message":"Updated current TLS certificate","service.version":"1.2.1-b5316231","service.type":"eck","ecs.version":"1.4.0"}
{"log.level":"info","@timestamp":"2020-10-27T12:03:49.904Z","log.logger":"controller-runtime.controller","message":"Starting EventSource","service.version":"1.2.1-b5316231","service.type":"eck","ecs.version":"1.4.0","controller":"webhook-certificates-controller","source":"kind source: /, Kind="}
{"log.level":"info","@timestamp":"2020-10-27T12:03:49.913Z","log.logger":"controller-runtime.certwatcher","message":"Starting certificate watcher","service.version":"1.2.1-b5316231","service.type":"eck","ecs.version":"1.4.0"}
{"log.level":"info","@timestamp":"2020-10-27T12:03:50.004Z","log.logger":"controller-runtime.controller","message":"Starting EventSource","service.version":"1.2.1-b5316231","service.type":"eck","ecs.version":"1.4.0","controller":"webhook-certificates-controller","source":"kind source: /, Kind="}
{"log.level":"info","@timestamp":"2020-10-27T12:03:50.105Z","log.logger":"controller-runtime.controller","message":"Starting Controller","service.version":"1.2.1-b5316231","service.type":"eck","ecs.version":"1.4.0","controller":"webhook-certificates-controller"}
{"log.level":"info","@timestamp":"2020-10-27T12:03:50.205Z","log.logger":"controller-runtime.controller","message":"Starting workers","service.version":"1.2.1-b5316231","service.type":"eck","ecs.version":"1.4.0","controller":"webhook-certificates-controller","worker count":1}
{"log.level":"info","@timestamp":"2020-10-27T12:03:50.205Z","log.logger":"webhook-certificates-controller","message":"Starting reconciliation run","service.version":"1.2.1-b5316231","service.type":"eck","ecs.version":"1.4.0","iteration":1,"namespace":"elastic-system","validating_webhook_configuration":"elastic-webhook-server-cert"}
{"log.level":"info","@timestamp":"2020-10-27T12:03:50.225Z","log.logger":"webhook-certificates-controller","message":"Ending reconciliation run","service.version":"1.2.1-b5316231","service.type":"eck","ecs.version":"1.4.0","iteration":1,"namespace":"elastic-system","validating_webhook_configuration":"elastic-webhook-server-cert","took":0.020049627}
{"log.level":"info","@timestamp":"2020-10-27T12:03:50.225Z","log.logger":"webhook-certificates-controller","message":"Starting reconciliation run","service.version":"1.2.1-b5316231","service.type":"eck","ecs.version":"1.4.0","iteration":2,"namespace":"","validating_webhook_configuration":"elastic-webhook.k8s.elastic.co"}
{"log.level":"info","@timestamp":"2020-10-27T12:03:50.244Z","log.logger":"webhook-certificates-controller","message":"Ending reconciliation run","service.version":"1.2.1-b5316231","service.type":"eck","ecs.version":"1.4.0","iteration":2,"namespace":"","validating_webhook_configuration":"elastic-webhook.k8s.elastic.co","took":0.018437817}
{"log.level":"info","@timestamp":"2020-10-27T12:03:50.442Z","log.logger":"transport","message":"Issuing new certificate","service.version":"1.2.1-b5316231","service.type":"eck","ecs.version":"1.4.0","pod_name":"quickstart-es-master-nodes-1"}
{"log.level":"info","@timestamp":"2020-10-27T12:03:50.448Z","log.logger":"transport","message":"Issuing new certificate","service.version":"1.2.1-b5316231","service.type":"eck","ecs.version":"1.4.0","pod_name":"quickstart-es-data-nodes-1"}
{"log.level":"info","@timestamp":"2020-10-27T12:03:50.453Z","log.logger":"transport","message":"Issuing new certificate","service.version":"1.2.1-b5316231","service.type":"eck","ecs.version":"1.4.0","pod_name":"quickstart-es-master-nodes-0"}
{"log.level":"info","@timestamp":"2020-10-27T12:03:50.459Z","log.logger":"transport","message":"Issuing new certificate","service.version":"1.2.1-b5316231","service.type":"eck","ecs.version":"1.4.0","pod_name":"quickstart-es-data-nodes-0"}
{"log.level":"info","@timestamp":"2020-10-27T22:03:50.225Z","log.logger":"webhook-certificates-controller","message":"Starting reconciliation run","service.version":"1.2.1-b5316231","service.type":"eck","ecs.version":"1.4.0","iteration":3,"namespace":"elastic-system","validating_webhook_configuration":"elastic-webhook-server-cert"}
{"log.level":"info","@timestamp":"2020-10-27T22:03:50.235Z","log.logger":"webhook-certificates-controller","message":"Ending reconciliation run","service.version":"1.2.1-b5316231","service.type":"eck","ecs.version":"1.4.0","iteration":3,"namespace":"elastic-system","validating_webhook_configuration":"elastic-webhook-server-cert","took":0.009279857}
{"log.level":"info","@timestamp":"2020-10-27T22:03:50.244Z","log.logger":"webhook-certificates-controller","message":"Starting reconciliation run","service.version":"1.2.1-b5316231","service.type":"eck","ecs.version":"1.4.0","iteration":4,"namespace":"","validating_webhook_configuration":"elastic-webhook.k8s.elastic.co"}
{"log.level":"info","@timestamp":"2020-10-27T22:03:50.255Z","log.logger":"webhook-certificates-controller","message":"Ending reconciliation run","service.version":"1.2.1-b5316231","service.type":"eck","ecs.version":"1.4.0","iteration":4,"namespace":"","validating_webhook_configuration":"elastic-webhook.k8s.elastic.co","took":0.010822867}
{"log.level":"info","@timestamp":"2020-10-28T06:55:15.408Z","log.logger":"webhook-certificates-controller","message":"Starting reconciliation run","service.version":"1.2.1-b5316231","service.type":"eck","ecs.version":"1.4.0","iteration":5,"namespace":"","validating_webhook_configuration":"elastic-webhook.k8s.elastic.co"}
{"log.level":"info","@timestamp":"2020-10-28T06:55:15.418Z","log.logger":"webhook-certificates-controller","message":"Ending reconciliation run","service.version":"1.2.1-b5316231","service.type":"eck","ecs.version":"1.4.0","iteration":5,"namespace":"","validating_webhook_configuration":"elastic-webhook.k8s.elastic.co","took":0.009791256}
{"log.level":"info","@timestamp":"2020-10-28T08:03:50.235Z","log.logger":"webhook-certificates-controller","message":"Starting reconciliation run","service.version":"1.2.1-b5316231","service.type":"eck","ecs.version":"1.4.0","iteration":6,"namespace":"elastic-system","validating_webhook_configuration":"elastic-webhook-server-cert"}
{"log.level":"info","@timestamp":"2020-10-28T08:03:50.245Z","log.logger":"webhook-certificates-controller","message":"Ending reconciliation run","service.version":"1.2.1-b5316231","service.type":"eck","ecs.version":"1.4.0","iteration":6,"namespace":"elastic-system","validating_webhook_configuration":"elastic-webhook-server-cert","took":0.010126062}
{"log.level":"info","@timestamp":"2020-10-28T08:03:50.255Z","log.logger":"webhook-certificates-controller","message":"Starting reconciliation run","service.version":"1.2.1-b5316231","service.type":"eck","ecs.version":"1.4.0","iteration":7,"namespace":"","validating_webhook_configuration":"elastic-webhook.k8s.elastic.co"}
{"log.level":"info","@timestamp":"2020-10-28T08:03:50.280Z","log.logger":"webhook-certificates-controller","message":"Ending reconciliation run","service.version":"1.2.1-b5316231","service.type":"eck","ecs.version":"1.4.0","iteration":7,"namespace":"","validating_webhook_configuration":"elastic-webhook.k8s.elastic.co","took":0.025248856}
{"log.level":"info","@timestamp":"2020-10-28T08:28:57.054Z","log.logger":"webhook-certificates-controller","message":"Starting reconciliation run","service.version":"1.2.1-b5316231","service.type":"eck","ecs.version":"1.4.0","iteration":8,"namespace":"elastic-system","validating_webhook_configuration":"elastic-webhook-server-cert"}
{"log.level":"info","@timestamp":"2020-10-28T08:28:57.110Z","log.logger":"webhook-certificates-controller","message":"Ending reconciliation run","service.version":"1.2.1-b5316231","service.type":"eck","ecs.version":"1.4.0","iteration":8,"namespace":"elastic-system","validating_webhook_configuration":"elastic-webhook-server-cert","took":0.055848646}
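
Because the operator emits one JSON object per line, filtering on the log.logger field can be more precise than grepping for "cert", which also matches the webhook certificate entries. The sketch below is one way to do that; the function name is hypothetical and the sample lines are abbreviated from the output above.

```python
import json
from typing import Iterable, Iterator


def filter_by_logger(lines: Iterable[str], logger: str) -> Iterator[dict]:
    """Yield parsed log entries whose "log.logger" field equals `logger`."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON log line
        if isinstance(entry, dict) and entry.get("log.logger") == logger:
            yield entry


# Abbreviated sample lines based on the operator output above.
sample = [
    '{"log.level":"info","log.logger":"manager","message":"Automatic management of the webhook certificates enabled"}',
    '{"log.level":"info","log.logger":"transport","message":"Issuing new certificate","pod_name":"quickstart-es-master-nodes-1"}',
    'not a JSON line',
]

for entry in filter_by_logger(sample, "transport"):
    print(entry["pod_name"], entry["message"])
```

In practice the lines could come from the output of kubectl logs piped into the script on standard input.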

@Navbar
Author

Navbar commented Oct 28, 2020

@sebgl - what exactly do you mean by Elasticsearch manifest?

@david-kow david-kow added the >bug Something isn't working label Oct 30, 2020
@botelastic botelastic bot removed the triage label Oct 30, 2020
@david-kow
Contributor

Hey @Navbar, by manifest @sebgl meant the Elasticsearch resource manifest, i.e. the file or script you used to create the Elasticsearch resource in K8s. You can also grab it with kubectl get -o yaml es quickstart (assuming quickstart is the name of your ES resource).

@Navbar
Author

Navbar commented Nov 1, 2020

OK, here it is:

apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  annotations:
    common.k8s.elastic.co/controller-version: 1.2.1
    elasticsearch.k8s.elastic.co/cluster-uuid: TZIF4kdNTo2cadQReCI_yw
  creationTimestamp: "2020-10-27T11:28:17Z"
  generation: 5
  name: quickstart
  namespace: default
  resourceVersion: "9385298"
  selfLink: /apis/elasticsearch.k8s.elastic.co/v1/namespaces/default/elasticsearches/quickstart
  uid: 7da1e900-b135-4429-8ab9-45f2d3995a42
spec:
  auth: {}
  http:
    service:
      metadata:
        creationTimestamp: null
      spec: {}
    tls:
      certificate: {}
  nodeSets:
  - config:
      node.data: false
      node.ingest: false
      node.master: true
      node.store.allow_mmap: false
    count: 2
    name: master-nodes
    podTemplate:
      metadata:
        creationTimestamp: null
      spec:
        containers: null
    volumeClaimTemplates:
    - metadata:
        creationTimestamp: null
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 10Gi
        storageClassName: managed-premium
      status: {}
  - config:
      node.data: true
      node.ingest: true
      node.master: false
      node.store.allow_mmap: false
    count: 2
    name: data-nodes
    podTemplate:
      metadata:
        creationTimestamp: null
      spec:
        containers:
        - env:
          - name: ES_JAVA_OPTS
            value: -Xms4g -Xmx4g
          name: elasticsearch
          resources:
            limits:
              cpu: "2"
              memory: 8Gi
            requests:
              cpu: 500m
              memory: 4Gi
    volumeClaimTemplates:
    - metadata:
        creationTimestamp: null
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 660Gi
        storageClassName: managed-premium
      status: {}
  transport:
    service:
      metadata:
        creationTimestamp: null
      spec: {}
  updateStrategy:
    changeBudget: {}
  version: 7.6.0
status:
  availableNodes: 4
  health: green
  phase: Ready
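
For reference, the manifest above declares no remote clusters. A small check like the following can make that explicit once the manifest has been loaded into a dictionary. This is a sketch under the assumption that remote clusters are declared under spec.remoteClusters; the function name and the trimmed-down dictionary are illustrative only.

```python
def has_remote_clusters(manifest: dict) -> bool:
    """Return True if the manifest declares at least one remote cluster.

    Assumption: remote clusters are listed under spec.remoteClusters.
    """
    spec = manifest.get("spec") or {}
    return bool(spec.get("remoteClusters"))


# Trimmed-down version of the manifest above, written as a Python dict.
quickstart = {
    "kind": "Elasticsearch",
    "metadata": {"name": "quickstart", "namespace": "default"},
    "spec": {"version": "7.6.0", "nodeSets": []},
}

print(has_remote_clusters(quickstart))
```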

@Navbar
Author

Navbar commented Nov 8, 2020

Any update on this matter?

@pebrc pebrc added the triage label Nov 24, 2020
@botelastic botelastic bot removed the triage label Nov 24, 2020
@pebrc pebrc removed the >bug Something isn't working label Nov 24, 2020
@botelastic botelastic bot added the triage label Nov 24, 2020
@idanmo idanmo added the >bug Something isn't working label Nov 25, 2020
@botelastic botelastic bot removed the triage label Nov 25, 2020
@idanmo idanmo changed the title from "upgrade eck 1.0.1 to 1.2.1 ended with empty transport ca.crt" to "Remote CA cert is created empty when remote clusters are not configured for an ES cluster" Nov 25, 2020
@idanmo
Collaborator

idanmo commented Nov 25, 2020

Thanks for reporting this issue @Navbar!
This bug was introduced in ECK 1.1.0 and is related to the remote clusters feature. When no remote clusters are configured for the ES cluster, the remote clusters reconciliation logic creates an empty file at /usr/share/elasticsearch/config/transport-remote-certs/ca.crt. As reported, the empty file causes the _ssl/certificates REST API call to fail.
So far I haven't been able to think of a workaround. I'll update here if anything changes.
The issue will be fixed in a future release.

@idanmo
Collaborator

idanmo commented Nov 26, 2020

As for fixing this issue, I see 3 options.

If remote clusters are not set for an Elasticsearch cluster, then instead of creating an empty transport-remote-certs/ca.crt file we could:

  1. Populate the file with the same cert specified in the *-transport-ca-internal secret (need to verify that this does not cause a cert trust issue)
  2. Populate the file with a dummy cert
  3. Not create the transport-remote-certs/ca.crt file and not add it to elasticsearch.yml (this would force an ES restart whenever remote clusters are configured after the initial ES deployment, which breaks the current behavior where setting remote clusters does not require a restart)

While (3) seems like the cleanest solution, we probably want to avoid that restart and go with either (1) or (2).
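
To make option (1) concrete, here is a minimal sketch of the fallback logic. It is illustrative only: it is not the operator's code (ECK is written in Go) and not necessarily the fix that was eventually merged. The function name and parameters are hypothetical.

```python
from typing import List


def remote_ca_bundle(remote_cas: List[bytes], own_transport_ca: bytes) -> bytes:
    """Build the contents of transport-remote-certs/ca.crt.

    Sketch of option (1): when no remote cluster CAs are available, fall back
    to the cluster's own transport CA so the file is never empty.
    """
    certs = [ca.strip() for ca in remote_cas if ca.strip()]
    if not certs:
        fallback = own_transport_ca.strip()
        if not fallback:
            raise ValueError("own transport CA must not be empty")
        certs = [fallback]
    # Concatenate the PEM blocks, one after another, with a trailing newline.
    return b"\n".join(certs) + b"\n"


# Placeholder PEM text; a real call would pass actual certificate bytes.
own_ca = b"-----BEGIN CERTIFICATE-----\nOWN\n-----END CERTIFICATE-----"
print(remote_ca_bundle([], own_ca).decode())
```

Option (2) would have the same shape, with a generated dummy certificate in place of the cluster's own transport CA.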

5 participants