Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

generate client key and certificates #1282

Merged
merged 10 commits into from
Jul 21, 2021

Conversation

stlaz
Copy link
Contributor

@stlaz stlaz commented Jul 14, 2021

This is a PR to wire the creation of client certificates for metrics scraping. It uses library-go idiomatic controller logic which is not used yet in CMO.

Kindly asking for feedback. If the general direction is accepted, I will continue with wiring those certificates to Prometheus.

/cc @openshift/openshift-team-monitoring

@openshift-ci openshift-ci bot requested a review from a team July 14, 2021 09:26
@stlaz
Copy link
Contributor Author

stlaz commented Jul 14, 2021

This PR includes some fixes to the original of #1252, namely:

  1. the client CA is now passed in as a config map, there's no need to have it as a secret
  2. the added RBAC now includes permissions to handle events as that's a new thing for the operator
  3. the CSRs now use the generateName instead of a single name to prevent the name of the CSR being taken and thus preventing us from creating a CSR and progressing further. This fix also adds the label required by the CPO to approve the CSR

@stlaz
Copy link
Contributor Author

stlaz commented Jul 14, 2021

looks like the failures in the kube-rbac-proxy are persisting, I'll try to check what's going on

@stlaz
Copy link
Contributor Author

stlaz commented Jul 15, 2021

The problem was in the client-ca being missing for the kube-rbac-proxy-thanos container in the prometheus-k8s workload. Nevertheless, the kube-rbac-proxy should not panic in such cases but that's something that can be investigated outside of this PR.

@stlaz
Copy link
Contributor Author

stlaz commented Jul 15, 2021

Opened brancz/kube-rbac-proxy#131 to track the kube-rbac-proxy issue

@stlaz
Copy link
Contributor Author

stlaz commented Jul 15, 2021

added fixes for other failing kube-rbac-proxies that did not have the client-ca configured

@stlaz
Copy link
Contributor Author

stlaz commented Jul 15, 2021

mhm, the tech debt of running things sequentially is showing as the operator refuses to progress to the client-ca creation while the prometheus-operator fails to deploy because it's missing the client-ca 😐 In an ideal world, this would be handled by two separate controllers.

I'll see if rearranging the order is possible.

@stlaz
Copy link
Contributor Author

stlaz commented Jul 19, 2021

the related test errors from the "common" test runs seem to have passed and the cluster installation is now successful.

I can see that the operator test still fails. Very likely because of the kube-rbac-proxy bug that's prevents our client-cert-authenticated metrics scraping requests from succeeding. That's a shame. I'll try to see if I can fix that.

@simonpasquier
Copy link
Contributor

Yes it looks like the error is now with the kube-rbac-proxy of the prometheus-operator in the openshift-user-workload-monitoring namespace [1].

Maybe you could only enable client TLS authentication for a few targeted service monitors (e.g. node_exporter, kubelet and kube-state-metrics)?

[1] https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_cluster-monitoring-operator/1282/pull-ci-openshift-cluster-monitoring-operator-master-e2e-agnostic-operator/1417043735262269440/artifacts/e2e-agnostic-operator/gather-extra/artifacts/pods/openshift-user-workload-monitoring_prometheus-operator-6cc4bfb495-w4w9f_kube-rbac-proxy.log

@stlaz
Copy link
Contributor Author

stlaz commented Jul 19, 2021

I believe I've found the issue, we should be green once brancz/kube-rbac-proxy#132 makes it to the release payload

go.mod Outdated Show resolved Hide resolved
@sthaha
Copy link
Contributor

sthaha commented Jul 20, 2021

Would the PR also include tests?

@stlaz
Copy link
Contributor Author

stlaz commented Jul 20, 2021

I'm using the current e2e tests of this repository (quite successfully, actually, issues were found and are being fixed).

For unit tests, please refer to https://github.com/openshift/library-go/blob/8cc52ce9b72b591c2c1946c81c21f823bbc43a1d/pkg/operator/csr/controller_test.go#L33

@stlaz
Copy link
Contributor Author

stlaz commented Jul 20, 2021

fixed the go.mod in the latest revision

@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 20, 2021
@openshift-ci openshift-ci bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 20, 2021
@simonpasquier
Copy link
Contributor

@sthaha I agree with @stlaz here about tests. The purpose of the change is to switch some service monitors to use TLS client auth (instead of bearer tokens) and the CMO e2e tests already ensure that all monitoring components are scraped successfully so we should be good if all tests pass.

s-urbaniak and others added 2 commits July 20, 2021 14:33
…ble to progress

The prometheus-operator itself is using an kube-rbac-proxy and therefore
needs to have the client-ca for its deployment.
@slaskawi
Copy link

/retest

@slaskawi
Copy link

@simonpasquier Just FYI - @stlaz started his PTO today and I'll be looking at this PR when he's off. Please let me know if we need any additional changes.

@sthaha
Copy link
Contributor

sthaha commented Jul 21, 2021

@sthaha I agree with @stlaz here about tests. The purpose of the change is to switch some service monitors to use TLS client auth (instead of bearer tokens) and the CMO e2e tests already ensure that all monitoring components are scraped successfully so we should be good if all tests pass.

Thank you both 🙇, I was wondering how this change is validated :)

@sthaha
Copy link
Contributor

sthaha commented Jul 21, 2021

mhm, the tech debt of running things sequentially is showing as the operator refuses to progress to the client-ca creation while the prometheus-operator fails to deploy because it's missing the client-ca 😐 In an ideal world, this would be handled by two separate controllers.

@simonpasquier Should this go into our backlog?

@stlaz
Copy link
Contributor Author

stlaz commented Jul 21, 2021

simonpasquier Should this go into our backlog?

you may want to consider library-go for your operator, I trust that the folks of #forum-apiserver would be willing to help, me including :)

Copy link
Contributor

@simonpasquier simonpasquier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

The CI is happy with the change so am I :)
Thanks @s-urbaniak @stlaz @slaskawi for the great work!

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jul 21, 2021
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jul 21, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: simonpasquier, stlaz

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 21, 2021
@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

3 similar comments
@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jul 21, 2021

@stlaz: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Rerun command
ci/prow/e2e-aws-single-node 104addb link /test e2e-aws-single-node

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

9 similar comments
@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit df26aec into openshift:master Jul 21, 2021
@slaskawi
Copy link

🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants