Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🌱 Add MachinePools to Topology Quickstart E2E Templates #9393

Merged
merged 1 commit into from
Sep 20, 2023

Conversation

willie-yao
Copy link
Contributor

@willie-yao willie-yao commented Sep 11, 2023

What this PR does / why we need it:
This PR adds MachinePools to existing topology templates for the quickstart E2E tests by changing the cluster-with-topology.yaml base, specifically the following templates:

  • cluster-template-topology
  • cluster-template-topology-dualstack-ipv4-primary
  • cluster-template-topology-dualstack-ipv6-primary
  • cluster-template-topology-single-node-cluster

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes part of #5991

/area testing

@k8s-ci-robot k8s-ci-robot added the area/testing Issues or PRs related to testing label Sep 11, 2023
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Sep 11, 2023
@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Sep 11, 2023
@willie-yao willie-yao force-pushed the cc-mp-e2e-quickstart branch 3 times, most recently from ce04f67 to 40f254c Compare September 11, 2023 22:32
@willie-yao
Copy link
Contributor Author

/retest

@sbueringer
Copy link
Member

sbueringer commented Sep 12, 2023

Looking into the test failure:

              error: <*errors.withMessage | 0xc0015f5ea0>{
                  cause: <*errors.errorString | 0xc0018bea10>{
                      s: "wanted [{cluster.x-k8s.io/v1beta1 MachinePool   <nil> <nil>}], actual [{cluster.x-k8s.io/v1beta1 MachinePool quick-start-pysxnr-mp-0-zbnfg f2c874ee-5eb1-4ae5-aad1-a25a366f53a1 0xc00106fce6 0xc00106fce7}]",
                  },
                  msg: "Unexpected ownerReferences for DockerMachinePool/quick-start-pysxnr-mp-0-l22nb",
              },

Based on the YAML, we have the following ownerRef on DockerMachinePool

  - apiVersion: cluster.x-k8s.io/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: MachinePool
    name: quick-start-pysxnr-mp-0-zbnfg
    uid: f2c874ee-5eb1-4ae5-aad1-a25a366f53a1

Based on our test code we check for machinePoolOwner

	dockerMachinePoolKind: func(owners []metav1.OwnerReference) error {
		// DockerMachinePool must be owned by a MachinePool.
		return HasExactOwners(owners, machinePoolOwner)
	},

The reason why this fails should be controller: true (basically we expect machinePoolOwner not machinePoolController).

So I think you have to change it to machinePoolController here:

return HasExactOwners(owners, machinePoolOwner)
.

It looks like we already have the same for kubeadmConfig:

	kubeadmConfigKind: func(owners []metav1.OwnerReference) error {
		// The KubeadmConfig must be owned and controlled by a Machine or MachinePool.
		return HasOneOfExactOwners(owners, []metav1.OwnerReference{machineController}, []metav1.OwnerReference{machinePoolController})
	},

Please also change the docs here:

| InfrastructureMachinePool | MachinePool | unknown | Not tested in e2e |

Basically the InfrastructureMachinePool row should be:

| InfrastructureMachinePool     | MachinePool  | yes        |                                             |

Please also update:

| MachinePool                | Cluster            | no         |                          |

| KubeadmConfig         | MachinePool  | yes        | When created for MachinePool.             |

@willie-yao willie-yao force-pushed the cc-mp-e2e-quickstart branch 2 times, most recently from 670d3b9 to 4f00f79 Compare September 12, 2023 18:03
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Sep 12, 2023
@willie-yao willie-yao changed the title 🌱 WIP: Add MachinePools to ClusterClass E2E Templates 🌱 Add MachinePools to Topology Quickstart E2E Templates Sep 12, 2023
Copy link
Member

@sbueringer sbueringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few nits

@sbueringer
Copy link
Member

/test pull-cluster-api-e2e-full-main

1 similar comment
@willie-yao
Copy link
Contributor Author

/test pull-cluster-api-e2e-full-main

Copy link
Contributor

@killianmuldoon killianmuldoon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/retest

@sbueringer
Copy link
Member

I think there is something broken (even if not 100% of the time). It looks like some controllers sometimes don't come up on self-hosted clusters

@willie-yao
Copy link
Contributor Author

/test pull-cluster-api-e2e-full-main

@sbueringer
Copy link
Member

I think someone has to run the test locally and check why the Pods are not coming up. I think we don't collect that data in Prow (but not entirely sure there)

@willie-yao
Copy link
Contributor Author

I think someone has to run the test locally and check why the Pods are not coming up. I think we don't collect that data in Prow (but not entirely sure there)

Yup I'll try running this locally. I dug through all the prow artifacts and couldn't find anything of note

@sbueringer
Copy link
Member

Could be worth a follow-up issue to explore if we could collect controller logs in this situation. Isn't the first time that this is a bit annoying :)

@willie-yao
Copy link
Contributor Author

willie-yao commented Sep 14, 2023

@sbueringer Looks like the same test passed locally. This does make it tricky to find out why the pods aren't coming up on prow...

With GINKGO_FOCUS="self-hosted"

Ran 4 of 32 Specs in 3327.983 seconds
SUCCESS! -- 4 Passed | 0 Failed | 0 Pending | 28 Skipped
Full build log
```

When testing Cluster API working on self-hosted clusters using ClusterClass [ClusterClass] Should pivot the bootstrap cluster to a self-hosted cluster
/home/willie/go/src/sigs.k8s.io/cluster-api/test/e2e/self_hosted.go:135

Captured StdOut/StdErr Output >>
clusterclass.cluster.x-k8s.io/quick-start created
dockerclustertemplate.infrastructure.cluster.x-k8s.io/quick-start-cluster created
kubeadmcontrolplanetemplate.controlplane.cluster.x-k8s.io/quick-start-control-plane created
dockermachinetemplate.infrastructure.cluster.x-k8s.io/quick-start-control-plane created
dockermachinetemplate.infrastructure.cluster.x-k8s.io/quick-start-default-worker-machinetemplate created
dockermachinepooltemplate.infrastructure.cluster.x-k8s.io/quick-start-default-worker-machinepooltemplate created
kubeadmconfigtemplate.bootstrap.cluster.x-k8s.io/quick-start-md-default-worker-bootstraptemplate created
kubeadmconfigtemplate.bootstrap.cluster.x-k8s.io/quick-start-mp-default-worker-bootstraptemplate created
configmap/cni-self-hosted-lfkh9n-crs-0 created
clusterresourceset.addons.cluster.x-k8s.io/self-hosted-lfkh9n-crs-0 created
cluster.cluster.x-k8s.io/self-hosted-lfkh9n created

I0914 20:28:16.822081 446901 warning_handler.go:65] "KubeAPIWarningLogger: Cluster refers to ClusterClass quick-start but this object which hasn't yet been reconciled. Cluster topology has not been fully validated. "
W0914 20:33:24.475140 446901 reflector.go:458] pkg/mod/k8s.io/client-go@v0.28.1/tools/cache/reflector.go:229: watch of *v1.Event ended with: Internal error occurred: etcdserver: no leader
W0914 20:33:35.744043 446901 reflector.go:535] pkg/mod/k8s.io/client-go@v0.28.1/tools/cache/reflector.go:229: failed to list *v1.Event: etcdserver: request timed out
E0914 20:33:35.744088 446901 reflector.go:147] pkg/mod/k8s.io/client-go@v0.28.1/tools/cache/reflector.go:229: Failed to watch *v1.Event: failed to list *v1.Event: etcdserver: request timed out
W0914 20:33:48.738199 446901 reflector.go:535] pkg/mod/k8s.io/client-go@v0.28.1/tools/cache/reflector.go:229: failed to list *v1.Event: etcdserver: request timed out
I0914 20:33:48.738268 446901 trace.go:236] Trace[517708773]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.28.1/tools/cache/reflector.go:229 (14-Sep-2023 20:33:37.933) (total time: 10804ms):
Trace[517708773]: ---"Objects listed" error:etcdserver: request timed out 10804ms (20:33:48.738)
Trace[517708773]: [10.804841621s] [10.804841621s] END
E0914 20:33:48.738298 446901 reflector.go:147] pkg/mod/k8s.io/client-go@v0.28.1/tools/cache/reflector.go:229: Failed to watch *v1.Event: failed to list *v1.Event: etcdserver: request timed out
I0914 20:41:17.446735 446901 warning_handler.go:65] "KubeAPIWarningLogger: Cluster refers to ClusterClass quick-start but this object which hasn't yet been reconciled. Cluster topology has not been fully validated. "
Failed to get logs for Machine self-hosted-lfkh9n-lb9zt-hzxjf, Cluster self-hosted-pekc0o/self-hosted-lfkh9n: : exited with status: 1, &{%!s(*os.file=&{{{0 0 0} 16 {0} 0 1 true true true} /tmp/self-hosted-lfkh9n-lb9zt-hzxjf877472624 false false false})}
<< Captured StdOut/StdErr Output

Timeline >>
STEP: Creating a namespace for hosting the "self-hosted" test spec @ 09/14/23 20:22:34.376
INFO: Creating namespace self-hosted-pekc0o
INFO: Creating event watcher for namespace "self-hosted-pekc0o"
STEP: Creating a workload cluster @ 09/14/23 20:22:34.44
INFO: Creating the workload cluster with name "self-hosted-lfkh9n" using the "topology" template (Kubernetes v1.27.3, 1 control-plane machines, 1 worker machines)
INFO: Getting the cluster template yaml
INFO: clusterctl config cluster self-hosted-lfkh9n --infrastructure docker --kubernetes-version v1.27.3 --control-plane-machine-count 1 --worker-machine-count 1 --flavor topology
INFO: Creating the workload cluster with name "self-hosted-lfkh9n" form the provided yaml
INFO: Applying the cluster template yaml of cluster self-hosted-pekc0o/self-hosted-lfkh9n
INFO: Waiting for the cluster infrastructure of cluster self-hosted-pekc0o/self-hosted-lfkh9n to be provisioned
STEP: Waiting for cluster to enter the provisioned phase @ 09/14/23 20:22:38.661
INFO: Waiting for control plane of cluster self-hosted-pekc0o/self-hosted-lfkh9n to be initialized
INFO: Waiting for the first control plane machine managed by self-hosted-pekc0o/self-hosted-lfkh9n-lb9zt to be provisioned
STEP: Waiting for one control plane node to exist @ 09/14/23 20:22:48.7
INFO: Waiting for control plane of cluster self-hosted-pekc0o/self-hosted-lfkh9n to be ready
INFO: Waiting for control plane self-hosted-pekc0o/self-hosted-lfkh9n-lb9zt to be ready (implies underlying nodes to be ready as well)
STEP: Waiting for the control plane to be ready @ 09/14/23 20:25:49.425
STEP: Checking all the control plane machines are in the expected failure domains @ 09/14/23 20:25:49.437
INFO: Waiting for the machine deployments of cluster self-hosted-pekc0o/self-hosted-lfkh9n to be provisioned
STEP: Waiting for the workload nodes to exist @ 09/14/23 20:25:49.453
STEP: Checking all the machines controlled by self-hosted-lfkh9n-md-0-p9qp8 are in the "fd4" failure domain @ 09/14/23 20:27:19.646
INFO: Waiting for the machine pools of cluster self-hosted-pekc0o/self-hosted-lfkh9n to be provisioned
STEP: Waiting for the machine pool workload nodes @ 09/14/23 20:27:19.69
STEP: Turning the workload cluster into a management cluster @ 09/14/23 20:27:19.703
STEP: Creating a namespace for hosting the self-hosted test spec @ 09/14/23 20:27:19.737
INFO: Creating namespace self-hosted-pekc0o
INFO: Creating event watcher for namespace "self-hosted-pekc0o"
STEP: Initializing the workload cluster @ 09/14/23 20:27:19.841
INFO: clusterctl init --config /home/willie/go/src/sigs.k8s.io/cluster-api/_artifacts/repository/clusterctl-config.yaml --kubeconfig /tmp/e2e-kubeconfig2007527128 --core cluster-api --bootstrap kubeadm --control-plane kubeadm --infrastructure docker,in-memory --runtime-extension test-extension
INFO: Waiting for provider controllers to be running
STEP: Waiting for deployment capd-system/capd-controller-manager to be available @ 09/14/23 20:27:59.908
INFO: Creating log watcher for controller capd-system/capd-controller-manager, pod capd-controller-manager-84ccc45c4f-ckv8s, container manager
STEP: Waiting for deployment capi-kubeadm-bootstrap-system/capi-kubeadm-bootstrap-controller-manager to be available @ 09/14/23 20:28:00.072
INFO: Creating log watcher for controller capi-kubeadm-bootstrap-system/capi-kubeadm-bootstrap-controller-manager, pod capi-kubeadm-bootstrap-controller-manager-565dfd9659-xdbbw, container manager
STEP: Waiting for deployment capi-kubeadm-control-plane-system/capi-kubeadm-control-plane-controller-manager to be available @ 09/14/23 20:28:00.116
INFO: Creating log watcher for controller capi-kubeadm-control-plane-system/capi-kubeadm-control-plane-controller-manager, pod capi-kubeadm-control-plane-controller-manager-7ff4c556fd-bksbl, container manager
STEP: Waiting for deployment capi-system/capi-controller-manager to be available @ 09/14/23 20:28:00.152
INFO: Creating log watcher for controller capi-system/capi-controller-manager, pod capi-controller-manager-596985d55-xp7rm, container manager
STEP: Waiting for deployment capim-system/capim-controller-manager to be available @ 09/14/23 20:28:00.321
INFO: Creating log watcher for controller capim-system/capim-controller-manager, pod capim-controller-manager-5c8b974bd4-n5mvs, container manager
STEP: Waiting for deployment test-extension-system/test-extension-manager to be available @ 09/14/23 20:28:00.916
INFO: Creating log watcher for controller test-extension-system/test-extension-manager, pod test-extension-manager-6b5dcfff75-55lg4, container manager
STEP: Ensure API servers are stable before doing move @ 09/14/23 20:28:01.517
STEP: Moving the cluster to self hosted @ 09/14/23 20:28:11.534
STEP: Moving workload clusters @ 09/14/23 20:28:11.534
INFO: clusterctl move --from-kubeconfig /tmp/e2e-kind3979012373 --to-kubeconfig /tmp/e2e-kubeconfig2007527128 --namespace self-hosted-pekc0o
INFO: Ensure clusterctl does not take ownership on any fields on the self-hosted cluster
INFO: Waiting for the cluster to be reconciled after moving to self hosted
STEP: Waiting for cluster to enter the provisioned phase @ 09/14/23 20:28:25.859
INFO: Verify there are no unexpected rollouts
INFO: Waiting for control plane to be ready
INFO: Waiting for control plane self-hosted-pekc0o/self-hosted-lfkh9n-lb9zt to be ready (implies underlying nodes to be ready as well)
STEP: Waiting for the control plane to be ready @ 09/14/23 20:31:35.899
STEP: Checking all the control plane machines are in the expected failure domains @ 09/14/23 20:31:35.913
STEP: Upgrading the self-hosted Cluster @ 09/14/23 20:31:35.933
STEP: Upgrading the Cluster topology @ 09/14/23 20:31:35.933
INFO: Patching the new Kubernetes version to Cluster topology
INFO: Waiting for control-plane machines to have the upgraded Kubernetes version
STEP: Ensuring all control-plane machines have upgraded kubernetes version v1.28.0 @ 09/14/23 20:31:36.001
INFO: Error getting pod capim-system/capim-controller-manager-5c8b974bd4-n5mvs, container manager: etcdserver: request timed out
INFO: Error getting pod capi-system/capi-controller-manager-596985d55-xp7rm, container manager: etcdserver: request timed out
INFO: Error getting pod capd-system/capd-controller-manager-84ccc45c4f-ckv8s, container manager: etcdserver: request timed out
INFO: Error getting pod capi-kubeadm-bootstrap-system/capi-kubeadm-bootstrap-controller-manager-565dfd9659-xdbbw, container manager: etcdserver: request timed out
INFO: Error getting pod capi-kubeadm-control-plane-system/capi-kubeadm-control-plane-controller-manager-7ff4c556fd-bksbl, container manager: the server was unable to return a response in the time allotted, but may still be processing the request (get pods capi-kubeadm-control-plane-controller-manager-7ff4c556fd-bksbl)
INFO: Got error while streaming logs for pod test-extension-system/test-extension-manager-6b5dcfff75-55lg4, container manager: http2: server sent GOAWAY and closed the connection; LastStreamID=3007, ErrCode=NO_ERROR, debug=""
INFO: Error starting logs stream for pod test-extension-system/test-extension-manager-6b5dcfff75-55lg4, container manager: Get "https://172.18.0.11:10250/containerLogs/test-extension-system/test-extension-manager-6b5dcfff75-55lg4/manager?follow=true": dial tcp 172.18.0.11:10250: connect: connection refused
INFO: Waiting for kube-proxy to have the upgraded Kubernetes version
STEP: Ensuring kube-proxy has the correct image @ 09/14/23 20:37:18.644
INFO: Creating log watcher for controller capi-kubeadm-bootstrap-system/capi-kubeadm-bootstrap-controller-manager, pod capi-kubeadm-bootstrap-controller-manager-565dfd9659-jkbgj, container manager
INFO: Creating log watcher for controller capi-kubeadm-control-plane-system/capi-kubeadm-control-plane-controller-manager, pod capi-kubeadm-control-plane-controller-manager-7ff4c556fd-5qst2, container manager
INFO: Creating log watcher for controller test-extension-system/test-extension-manager, pod test-extension-manager-6b5dcfff75-nrcf5, container manager
INFO: Waiting for CoreDNS to have the upgraded image tag
STEP: Ensuring CoreDNS has the correct image @ 09/14/23 20:39:49.008
INFO: Waiting for etcd to have the upgraded image tag
INFO: Waiting for Kubernetes versions of machines in MachineDeployment self-hosted-pekc0o/self-hosted-lfkh9n-md-0-p9qp8 to be upgraded to v1.28.0
INFO: Ensuring all MachineDeployment Machines have upgraded kubernetes version v1.28.0
INFO: Creating log watcher for controller capim-system/capim-controller-manager, pod capim-controller-manager-5c8b974bd4-t85k6, container manager
INFO: Creating log watcher for controller capd-system/capd-controller-manager, pod capd-controller-manager-84ccc45c4f-p985c, container manager
INFO: Creating log watcher for controller capi-system/capi-controller-manager, pod capi-controller-manager-596985d55-bf7q6, container manager
STEP: Upgrading the machinepool instances @ 09/14/23 20:40:59.411
INFO: Patching the new Kubernetes version to Machine Pool self-hosted-pekc0o/self-hosted-lfkh9n-mp-0-gwmk7
INFO: Waiting for Kubernetes versions of machines in MachinePool self-hosted-pekc0o/self-hosted-lfkh9n-mp-0-gwmk7 to be upgraded from v1.27.3 to v1.28.0
INFO: Ensuring all MachinePool Instances have upgraded kubernetes version v1.28.0
STEP: Waiting until nodes are ready @ 09/14/23 20:40:59.538
STEP: PASSED! @ 09/14/23 20:40:59.588
INFO: Got error while streaming logs for pod capi-system/capi-controller-manager-596985d55-bf7q6, container manager: context canceled
INFO: Stopped streaming logs for pod capi-system/capi-controller-manager-596985d55-bf7q6, container manager: context canceled
INFO: Got error while streaming logs for pod capd-system/capd-controller-manager-84ccc45c4f-p985c, container manager: context canceled
INFO: Stopped streaming logs for pod capd-system/capd-controller-manager-84ccc45c4f-p985c, container manager: context canceled
INFO: Got error while streaming logs for pod capim-system/capim-controller-manager-5c8b974bd4-t85k6, container manager: context canceled
INFO: Stopped streaming logs for pod capim-system/capim-controller-manager-5c8b974bd4-t85k6, container manager: context canceled
INFO: Got error while streaming logs for pod test-extension-system/test-extension-manager-6b5dcfff75-nrcf5, container manager: context canceled
INFO: Stopped streaming logs for pod test-extension-system/test-extension-manager-6b5dcfff75-nrcf5, container manager: context canceled
INFO: Got error while streaming logs for pod capi-kubeadm-control-plane-system/capi-kubeadm-control-plane-controller-manager-7ff4c556fd-5qst2, container manager: context canceled
INFO: Stopped streaming logs for pod capi-kubeadm-control-plane-system/capi-kubeadm-control-plane-controller-manager-7ff4c556fd-5qst2, container manager: context canceled
INFO: Got error while streaming logs for pod capi-kubeadm-bootstrap-system/capi-kubeadm-bootstrap-controller-manager-565dfd9659-jkbgj, container manager: context canceled
INFO: Stopped streaming logs for pod capi-kubeadm-bootstrap-system/capi-kubeadm-bootstrap-controller-manager-565dfd9659-jkbgj, container manager: context canceled
STEP: Ensure API servers are stable before doing move @ 09/14/23 20:41:00.137
STEP: Moving the cluster back to bootstrap @ 09/14/23 20:41:10.142
STEP: Moving workload clusters @ 09/14/23 20:41:10.142
INFO: clusterctl move --from-kubeconfig /tmp/e2e-kubeconfig2007527128 --to-kubeconfig /tmp/e2e-kind3979012373 --namespace self-hosted-pekc0o
INFO: Waiting for the cluster to be reconciled after moving back to bootstrap
STEP: Waiting for cluster to enter the provisioned phase @ 09/14/23 20:41:33.874
STEP: Dumping logs from the "self-hosted-lfkh9n" workload cluster @ 09/14/23 20:41:43.959
STEP: Dumping all the Cluster API resources in the "self-hosted-pekc0o" namespace @ 09/14/23 20:41:46.522
STEP: Dumping kube-system Pods of Cluster self-hosted-pekc0o/self-hosted-lfkh9n @ 09/14/23 20:41:47.263
<< Timeline

SSSSS

[SynchronizedAfterSuite] PASSED [0.004 seconds]
[SynchronizedAfterSuite]
/home/willie/go/src/sigs.k8s.io/cluster-api/test/e2e/e2e_suite_test.go:176

• [1197.424 seconds]

</details>

@sbueringer
Copy link
Member

Thx!

/lgtm

/test pull-cluster-api-e2e-full-main

Let's confirm by checking the artifacts that DockerMachinePool is created in the self-hosted cased and controllers are coming up

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 19, 2023
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 82e3d4ce1eaea19d55182748d520aa13ea62dc5b

@sbueringer
Copy link
Member

sbueringer commented Sep 19, 2023

I'll look into the ownerRef thing tomorrow

I introduced this in: #9389

To summarize:

DockerMachinePool
* wanted: cluster.x-k8s.io/v1beta1/MachinePool/true
* actual: cluster.x-k8s.io/v1beta1/MachinePool/true cluster.x-k8s.io/v1beta1/Cluster/false

KubeadmConfig
* wanted: cluster.x-k8s.io/v1beta1/MachinePool/true
* actual: cluster.x-k8s.io/v1beta1/MachinePool/true cluster.x-k8s.io/v1beta1/Cluster/false

I think this can be fixed with

	dockerMachinePoolKind: func(owners []metav1.OwnerReference) error {
		// DockerMachinePool must be owned and controlled by a MachinePool.
		return HasExactOwners(owners, machinePoolController, clusterOwner)
	},
	kubeadmConfigKind: func(owners []metav1.OwnerReference) error {
		// The KubeadmConfig must be owned and controlled by a Machine or MachinePool.
		return HasOneOfExactOwners(owners, []metav1.OwnerReference{machineController}, []metav1.OwnerReference{machinePoolController, clusterOwner})
	},

@sbueringer
Copy link
Member

sbueringer commented Sep 19, 2023

^^ @killianmuldoon please double-check if the test change sounds reasonable based on #9389

cc @willie-yao Feel free to make the test change already, squash the commits and trigger e2e-full again afterwards

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 19, 2023
@willie-yao
Copy link
Contributor Author

/test pull-cluster-api-e2e-full-main

@sbueringer
Copy link
Member

sbueringer commented Sep 19, 2023

/lgtm

(approval pending green e2e-full + manual verification via artifacts that the self-hosted case works as expected, i.e. it successfully reconciles a MachinePool)

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 19, 2023
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: c8ce92451ed6f96a697a43f447a0142a9cdd3978

@willie-yao
Copy link
Contributor Author

(approval pending green e2e-full + manual verification via artifacts that the self-hosted case works as expected, i.e. it successfully reconciles a MachinePool)

Looks like the MachinePool was reconciled succesfully according to the status here: https://storage.googleapis.com/kubernetes-jenkins/pr-logs/pull/kubernetes-sigs_cluster-api/9393/pull-cluster-api-e2e-full-main/1704195435809214464/artifacts/clusters/bootstrap/resources/self-hosted-8h76vm/DockerMachinePool/self-hosted-2mrbjd-mp-0-7q9k6.yaml

For my own understanding: Is there a better way to check for whether it is reconciled successfully or not?

@sbueringer
Copy link
Member

For my own understanding: Is there a better way to check for whether it is reconciled successfully or not?

You could also check the MachinePool and the Cluster (the latter for TopologyReconciled: true)

But ideally this should not be necessary because the e2e test should validate it. But I think we'll have other e2e tests that will verify this explicitly once we adjusted all e2e tests.

Independent of that it would be good to create a follow-up issue to consider adding more validation to our e2e tests. It would be good if we at least validate that TopologyReconciled is true.

@sbueringer
Copy link
Member

@willie-yao Thank you very much!!

Also thx @chrischdi @killianmuldoon for the reviews & troubleshooting. Really appreciate it!

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sbueringer

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 20, 2023
@sbueringer sbueringer added area/e2e-testing Issues or PRs related to e2e testing and removed area/testing Issues or PRs related to testing labels Sep 20, 2023
@sbueringer
Copy link
Member

/retest

flake

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/e2e-testing Issues or PRs related to e2e testing cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants