Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP][DoNotReview]✨ clusterctl: verify cert-manager did inject the CA certificates into the objects before proceeding #10433

Closed

Conversation

chrischdi
Copy link
Member

What this PR does / why we need it:

Ensures that we do not fail the e2e tests for intermittent x509 errors.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):

Part of #9688

/area e2e-testing

@k8s-ci-robot k8s-ci-robot added area/e2e-testing Issues or PRs related to e2e testing cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Apr 12, 2024
@k8s-ci-robot k8s-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Apr 12, 2024
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from chrischdi. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@chrischdi chrischdi changed the title 🌱 test: do not fail on intermittent cert-manager errors when checking for rollouts [wip] 🌱 test: do not fail on intermittent cert-manager errors when checking for rollouts Apr 12, 2024
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 12, 2024
@chrischdi
Copy link
Member Author

/test help

@k8s-ci-robot
Copy link
Contributor

@chrischdi: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

  • /test pull-cluster-api-build-main
  • /test pull-cluster-api-e2e-blocking-main
  • /test pull-cluster-api-e2e-conformance-ci-latest-main
  • /test pull-cluster-api-e2e-conformance-main
  • /test pull-cluster-api-e2e-dualstack-and-ipv6-main
  • /test pull-cluster-api-e2e-main
  • /test pull-cluster-api-e2e-mink8s-main
  • /test pull-cluster-api-e2e-upgrade-1-29-1-30-main
  • /test pull-cluster-api-test-main
  • /test pull-cluster-api-test-mink8s-main
  • /test pull-cluster-api-verify-main

The following commands are available to trigger optional jobs:

  • /test pull-cluster-api-apidiff-main

Use /test all to run the following jobs that were automatically triggered:

  • pull-cluster-api-apidiff-main
  • pull-cluster-api-build-main
  • pull-cluster-api-e2e-blocking-main
  • pull-cluster-api-test-main
  • pull-cluster-api-verify-main

In response to this:

/test help

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@chrischdi
Copy link
Member Author

/test pull-cluster-api-e2e-main

4 similar comments
@chrischdi
Copy link
Member Author

/test pull-cluster-api-e2e-main

@chrischdi
Copy link
Member Author

/test pull-cluster-api-e2e-main

@chrischdi
Copy link
Member Author

/test pull-cluster-api-e2e-main

@chrischdi
Copy link
Member Author

/test pull-cluster-api-e2e-main

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Apr 15, 2024
@chrischdi
Copy link
Member Author

/test help

@k8s-ci-robot
Copy link
Contributor

@chrischdi: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

  • /test pull-cluster-api-build-main
  • /test pull-cluster-api-e2e-blocking-main
  • /test pull-cluster-api-e2e-conformance-ci-latest-main
  • /test pull-cluster-api-e2e-conformance-main
  • /test pull-cluster-api-e2e-dualstack-and-ipv6-main
  • /test pull-cluster-api-e2e-main
  • /test pull-cluster-api-e2e-mink8s-main
  • /test pull-cluster-api-e2e-upgrade-1-29-1-30-main
  • /test pull-cluster-api-test-main
  • /test pull-cluster-api-test-mink8s-main
  • /test pull-cluster-api-verify-main

The following commands are available to trigger optional jobs:

  • /test pull-cluster-api-apidiff-main

Use /test all to run the following jobs that were automatically triggered:

  • pull-cluster-api-apidiff-main
  • pull-cluster-api-build-main
  • pull-cluster-api-e2e-blocking-main
  • pull-cluster-api-test-main
  • pull-cluster-api-verify-main

In response to this:

/test help

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@chrischdi
Copy link
Member Author

/test pull-cluster-api-e2e-main
/test pull-cluster-api-e2e-dualstack-and-ipv6-main

@chrischdi chrischdi force-pushed the pr-e2e-clusterctl-upgrade-x509 branch from 50b8cc5 to 501da90 Compare April 15, 2024 16:19
@chrischdi
Copy link
Member Author

/test pull-cluster-api-e2e-main
/test pull-cluster-api-e2e-dualstack-and-ipv6-main

@chrischdi chrischdi force-pushed the pr-e2e-clusterctl-upgrade-x509 branch from 501da90 to d577747 Compare April 15, 2024 16:20
@chrischdi
Copy link
Member Author

/test pull-cluster-api-e2e-main
/test pull-cluster-api-e2e-dualstack-and-ipv6-main

@chrischdi
Copy link
Member Author

/test pull-cluster-api-e2e-mink8s-main

@chrischdi
Copy link
Member Author

/test pull-cluster-api-e2e-main
/test pull-cluster-api-e2e-dualstack-and-ipv6-main
/test pull-cluster-api-e2e-mink8s-main

@chrischdi
Copy link
Member Author

chrischdi commented Apr 15, 2024

Unrelated flake:

No Control Plane machines came into existence.

https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/kubernetes-sigs_cluster-api/10433/pull-cluster-api-e2e-mink8s-main/1779937398726070272

/test pull-cluster-api-e2e-main
/test pull-cluster-api-e2e-dualstack-and-ipv6-main
/test pull-cluster-api-e2e-mink8s-main

@chrischdi
Copy link
Member Author

Unrelated flake:

Timed out waiting for 3 ready replicas for MachinePool machine-pool-svfnio/machine-pool-ppat2y-mp-0

https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/kubernetes-sigs_cluster-api/10433/pull-cluster-api-e2e-dualstack-and-ipv6-main/1779987181905907712

/test pull-cluster-api-e2e-main
/test pull-cluster-api-e2e-dualstack-and-ipv6-main
/test pull-cluster-api-e2e-mink8s-main

@chrischdi chrischdi force-pushed the pr-e2e-clusterctl-upgrade-x509 branch from d577747 to 64f6eab Compare April 16, 2024 07:37
@chrischdi chrischdi force-pushed the pr-e2e-clusterctl-upgrade-x509 branch from dcca4c9 to 1b0ac8b Compare April 16, 2024 07:52
@chrischdi
Copy link
Member Author

and some logging
/test pull-cluster-api-e2e-main
/test pull-cluster-api-e2e-dualstack-and-ipv6-main
/test pull-cluster-api-e2e-mink8s-main

@chrischdi chrischdi force-pushed the pr-e2e-clusterctl-upgrade-x509 branch from 1b0ac8b to 9d0017e Compare April 16, 2024 08:17
@chrischdi
Copy link
Member Author

Improved duration printing

/test pull-cluster-api-e2e-main
/test pull-cluster-api-e2e-dualstack-and-ipv6-main
/test pull-cluster-api-e2e-mink8s-main

@chrischdi chrischdi changed the title [wip] 🌱 test: do not fail on intermittent cert-manager errors when checking for rollouts [wip] :feature clusterctl: do not fail on intermittent cert-manager errors when checking for rollouts Apr 16, 2024
@chrischdi chrischdi changed the title [wip] :feature clusterctl: do not fail on intermittent cert-manager errors when checking for rollouts ✨ clusterctl: verify cert-manager did inject the CA certificates into the objects before proceeding Apr 16, 2024
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 16, 2024
@sbueringer
Copy link
Member

I would suggest to wait with merging until after the releases today

@chrischdi
Copy link
Member Author

I would suggest to wait with merging until after the releases today

kind of ack, there won't be any cherry-picking until the releases are out, merging to main is okay but it needs reviews anyway :-)

@chrischdi
Copy link
Member Author

/assign @sbueringer

:-)

@chrischdi
Copy link
Member Author

/cherry-pick release-1.7

@k8s-infra-cherrypick-robot

@chrischdi: once the present PR merges, I will cherry-pick it on top of release-1.7 in a new PR and assign it to you.

In response to this:

/cherry-pick release-1.7

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@chrischdi
Copy link
Member Author

/cherry-pick release-1.6

@k8s-infra-cherrypick-robot

@chrischdi: once the present PR merges, I will cherry-pick it on top of release-1.6 in a new PR and assign it to you.

In response to this:

/cherry-pick release-1.6

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

for _, obj := range components.Objs() {
obj := obj
if value, ok := obj.GetAnnotations()[certManagerCAAnnotation]; ok {
_ = value
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? :)

secretObjKey := client.ObjectKey{Namespace: certificate.GetNamespace(), Name: secretName}
certificateSecret := &corev1.Secret{}
if err := c.Client.Get(ctx, secretObjKey, certificateSecret); err != nil {
return "", errors.Wrapf(err, "getting secret %s", &certificateObjKey)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be secretObjKey? (or maybe mention both)

return client.ObjectKey{Namespace: nameStr[:splitPoint], Name: nameStr[splitPoint+1:]}
}

// checkObject gets the desired CA certificate and compares it relevant field on
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// checkObject gets the desired CA certificate and compares it relevant field on
// checkObject gets the desired CA certificate and compares it with the relevant field on

return err
}

if crd.Spec.Conversion.Webhook == nil || crd.Spec.Conversion.Webhook.ClientConfig == nil || string(crd.Spec.Conversion.Webhook.ClientConfig.CABundle) != ca {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we instead skip the validation if Conversion.Webhook.ClientConfig doesn't exist?

It's a strange config for a CRD to have the annotation and no client config, but I think it's not on us to fail in that case.

I think if the conversion webhook is configured and the caBundle supposed to be used ClientConfig has to be configured before the cert-manger becomes active

}
}
default:
return fmt.Errorf("unknown object type %s", obj.GetKind())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we shouldn't error in this case. This way we stick to best-effort validation vs failing in whatever cases folks might have (e.g. we implemented something very similar for ExtensionConfig, although we picked a different name for the annotation)

cmd/clusterctl/client/cluster/ca_injection.go Show resolved Hide resolved
@@ -103,7 +103,11 @@ func (i *providerInstaller) Install(ctx context.Context, opts InstallOptions) ([
ret = append(ret, components)
}

return ret, waitForProvidersReady(ctx, opts, i.installQueue, i.proxy)
if err := waitForProvidersReady(ctx, opts, i.installQueue, i.proxy); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We intentionally added a flag to make wait for providers opt-in. It feels wrong to add waitForCAInjection for all cases.

Does it make sense to waitForCAInjection when waitForProviders is disabled? (this is the default)

Maybe waitForCAInjection should simplyi be part of waitForProvidersReady

If not, we problably would want to add another flag and disable per default (we can still had-code it to true in the tests / test framework)

@@ -192,3 +193,8 @@ func IsDeploymentWithManager(obj unstructured.Unstructured) bool {
}
return false
}

// IsCertificate return true if obj is a certificate.
func IsCertificate(obj unstructured.Unstructured) bool {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intellij says unused

@@ -116,9 +115,6 @@ func AssertOwnerReferences(namespace, kubeconfigPath string, assertFuncs ...map[
// Sometimes the conversion-webhooks are not ready yet / cert-managers ca-injector
// may not yet have injected the new ca bundle after the upgrade.
// If this is the case we return an error to retry.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's also drop the corresponding comment

@chrischdi
Copy link
Member Author

/tide merge-method-squash

@chrischdi
Copy link
Member Author

/test pull-cluster-api-e2e-main
/test pull-cluster-api-e2e-dualstack-and-ipv6-main
/test pull-cluster-api-e2e-mink8s-main

@k8s-ci-robot
Copy link
Contributor

@chrischdi: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-cluster-api-e2e-dualstack-and-ipv6-main cc1e563 link true /test pull-cluster-api-e2e-dualstack-and-ipv6-main

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@chrischdi
Copy link
Member Author

/hold

This seems to be broken now.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 18, 2024
@sbueringer
Copy link
Member

Discussed with Christian. We have an alternative idea on how to fix this. Please don't review the current PR for now

@chrischdi chrischdi changed the title ✨ clusterctl: verify cert-manager did inject the CA certificates into the objects before proceeding [WIP][DoNotReview]✨ clusterctl: verify cert-manager did inject the CA certificates into the objects before proceeding Apr 19, 2024
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 19, 2024
@chrischdi
Copy link
Member Author

/close

@k8s-ci-robot
Copy link
Contributor

@chrischdi: Closed this PR.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/e2e-testing Issues or PRs related to e2e testing cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants