Fix Pebble checks on workers when TLS is enabled #66

Merged: 5 commits, Sep 4, 2024

Conversation


@michaeldmitry michaeldmitry commented Sep 2, 2024

Issue

With the current worker Pebble checks, if TLS is enabled, the worker unit gets stuck in BlockedStatus(node down), and the workload logs show errors indicating that the certificate is valid for [coordinator hostname], not for [worker hostname]. The reason is that we pass the worker a cert requested by the coordinator, whose SANs don't contain the workers' FQDNs.
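To make the failure mode concrete, here is a minimal sketch of the hostname check a TLS client performs against a certificate's SANs. It is deliberately simplified (exact, case-insensitive DNS-name matching only; real verification also handles wildcards and IP SANs), and the function name and FQDNs are hypothetical, chosen only for illustration.

```python
from typing import Iterable


def hostname_matches_sans(hostname: str, sans: Iterable[str]) -> bool:
    """Return True if `hostname` equals one of the DNS SANs.

    Simplified on purpose: exact, case-insensitive comparison, no wildcards.
    """
    wanted = hostname.rstrip(".").lower()
    return any(san.rstrip(".").lower() == wanted for san in sans)


# Hypothetical FQDNs, not taken from a real deployment.
coordinator_fqdn = "tempo-0.tempo-endpoints.test.svc.cluster.local"
worker_fqdn = "tempo-worker-0.tempo-worker-endpoints.test.svc.cluster.local"

# Before the fix: the cert only names the coordinator.
sans_before_fix = [coordinator_fqdn]
# After the fix: the worker's FQDN is included as well.
sans_after_fix = [coordinator_fqdn, worker_fqdn]

print(hostname_matches_sans(worker_fqdn, sans_before_fix))
print(hostname_matches_sans(worker_fqdn, sans_after_fix))
```

A check against the worker's own FQDN can only pass in the second case, which is why the Pebble check kept failing while the cert carried the coordinator's name alone.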

Solution

  • Pass the integrated workers' hostnames (FQDNs) to the cert's SANs and refresh the CSR on cluster_relation_changed_event. This adds or removes a worker's SAN when the worker is added to or removed from the cluster.
  • Run update-ca-certificates in the workloads so that they trust the CA.
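The first point can be sketched roughly as follows. This is not the code from this PR: `build_sans` and `csr_needs_refresh` are hypothetical helpers, and the sketch assumes the coordinator can collect each worker's FQDN from the cluster relation data.

```python
from typing import Iterable, List


def build_sans(coordinator_fqdn: str, worker_fqdns: Iterable[str]) -> List[str]:
    """Combine coordinator and worker FQDNs into a stable, deduplicated SAN list."""
    names = {coordinator_fqdn, *worker_fqdns}
    names.discard("")  # ignore workers that have not published an FQDN yet
    return sorted(names)  # stable order avoids spurious CSR refreshes


def csr_needs_refresh(current_sans: Iterable[str], desired_sans: Iterable[str]) -> bool:
    """A new CSR is only needed when the set of SANs actually changed."""
    return set(current_sans) != set(desired_sans)


# Hypothetical hostnames, for illustration only.
sans = build_sans("coordinator.local", ["worker-1.local", "worker-0.local", "worker-1.local"])
print(sans)
print(csr_needs_refresh(["coordinator.local"], sans))
```

Sorting and deduplicating means a relation-changed event that does not alter the worker set leaves the SANs unchanged, so no new certificate needs to be requested.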

Context

Addition of refresh_events to CertHandler library canonical/observability-libs#108

Testing Instructions

Deploy S3, SSC

curl https://raw.githubusercontent.com/canonical/tempo-coordinator-k8s-operator/main/scripts/deploy_minio.py | MINIO_MODEL=test python3
juju deploy self-signed-certificates ssc --channel edge --trust

Pack & deploy Tempo coordinator with this cos-lib

Pack & deploy Tempo worker with this cos-lib, and pass socket.getfqdn() to readiness_check_endpoint
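As a rough sketch of what the worker change amounts to: the readiness URL is built from the unit's own FQDN rather than localhost, so that the hostname in the check matches a SAN in the cert. The helper names, port 3200, and the /ready path are assumptions for illustration, not taken from this PR; the check dictionary follows the general shape of a Pebble HTTP check.

```python
import socket
from typing import Optional


def readiness_url(
    tls_enabled: bool,
    port: int = 3200,  # assumed Tempo HTTP port
    path: str = "/ready",  # assumed readiness path
    host: Optional[str] = None,
) -> str:
    """Build the readiness URL using this unit's FQDN unless a host is given."""
    scheme = "https" if tls_enabled else "http"
    return f"{scheme}://{host or socket.getfqdn()}:{port}{path}"


def readiness_check(url: str) -> dict:
    """Pebble-style HTTP check definition pointing at the given URL."""
    return {"override": "replace", "threshold": 3, "http": {"url": url}}


# With TLS enabled, the check targets the worker's FQDN over https.
print(readiness_check(readiness_url(tls_enabled=True)))
```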

Integrate

jhack imatrix fill

Verify

  • Wait until the units settle in active/idle state
  • Inspect logs of worker to make sure there are no certificate errors

Scale

juju add-unit tempo-worker

Verify

  • Wait until all units settle in active/idle state
  • Inspect logs of both workers to make sure there are no certificate errors

@michaeldmitry michaeldmitry requested a review from a team as a code owner September 2, 2024 11:01

@PietroPasotti PietroPasotti left a comment

LGTM, good job finding out the issue in the first place!

@PietroPasotti

unittests are fixed, but they depend on canonical/observability-libs#108

@michaeldmitry michaeldmitry merged commit e95bb88 into main Sep 4, 2024
5 checks passed
@michaeldmitry michaeldmitry deleted the fix-tls branch September 4, 2024 10:21