Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🌱 csi: remove cluster-id from config to let csi generate an id instead of hitting length limits #2708

Merged

Conversation

chrischdi
Copy link
Member

@chrischdi chrischdi commented Feb 6, 2024

What this PR does / why we need it:

on our upgrade test pipelines the cluster-id set in the config exceeded a length limit causing CSI to fail.

Example:

{"level":"error","time":"2024-02-06T09:44:49.880325541Z","caller":"service/driver.go:203","msg":"failed to run the driver. Err: +cluster id must not exceed 64 characters","TraceId":"bfe9c60b-8c29-41c3-a951-a60a07046823","stacktrace":"sigs.k8s.io/vsphere-csi-driver/v3/pkg/csi/service.(*vsphereCSIDriver).Run\n\t/build/pkg/csi/service/driver.go:203\nmain.main\n\t/build/cmd/vsphere-csi/main.go:96\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"}
❯ kubectl get secret -n vmware-system-csi vsphere-config-secret -o yaml | yq '.data[]' | base64 -d
[Global]
cluster-id = "k8s-upgrade-and-conformance-tpan08/k8s-upgrade-and-conformance-7w6is8"

...

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Feb 6, 2024
@chrischdi chrischdi changed the title 🌱 csi: remove cluster-id from config to let csi generate an id instead … 🌱 csi: remove cluster-id from config to let csi generate an id instead of hitting length limits Feb 6, 2024
@k8s-ci-robot k8s-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Feb 6, 2024
@chrischdi
Copy link
Member Author

/cherry-pick release-1.9

@k8s-infra-cherrypick-robot

@chrischdi: once the present PR merges, I will cherry-pick it on top of release-1.9 in a new PR and assign it to you.

In response to this:

/cherry-pick release-1.9

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@chrischdi
Copy link
Member Author

/test help

@k8s-ci-robot
Copy link
Contributor

@chrischdi: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

  • /test pull-cluster-api-provider-vsphere-e2e-blocking-main
  • /test pull-cluster-api-provider-vsphere-e2e-conformance-ci-latest-main
  • /test pull-cluster-api-provider-vsphere-e2e-conformance-main
  • /test pull-cluster-api-provider-vsphere-e2e-main
  • /test pull-cluster-api-provider-vsphere-e2e-upgrade-1-29-1-30-main
  • /test pull-cluster-api-provider-vsphere-test-integration-main
  • /test pull-cluster-api-provider-vsphere-test-main
  • /test pull-cluster-api-provider-vsphere-verify-main

The following commands are available to trigger optional jobs:

  • /test pull-cluster-api-provider-vsphere-apidiff-main

Use /test all to run the following jobs that were automatically triggered:

  • pull-cluster-api-provider-vsphere-apidiff-main
  • pull-cluster-api-provider-vsphere-e2e-blocking-main
  • pull-cluster-api-provider-vsphere-test-integration-main
  • pull-cluster-api-provider-vsphere-test-main
  • pull-cluster-api-provider-vsphere-verify-main

In response to this:

/test help

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@chrischdi
Copy link
Member Author

/test pull-cluster-api-provider-vsphere-e2e-upgrade-1-29-1-30-main

Copy link

codecov bot commented Feb 6, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (73b3e90) 64.57% compared to head (9abed23) 64.51%.
Report is 16 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2708      +/-   ##
==========================================
- Coverage   64.57%   64.51%   -0.06%     
==========================================
  Files         119      119              
  Lines        8640     8640              
==========================================
- Hits         5579     5574       -5     
- Misses       2628     2632       +4     
- Partials      433      434       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@chrischdi
Copy link
Member Author

/assign sbueringer

@sbueringer
Copy link
Member

sbueringer commented Feb 6, 2024

/lgtm
/approve

/hold

In case you want to look into artifacts to confirm that the CSI problem is gone (not sure if this is exposed in artifacts today)

@chrischdi Is it possible today to notice this issue in CI? If yes, how complicated would it be to make the test fail if CSI/CPI/ ... doesn't come up? (I think if CPI doesn't come up nothing works, but could be interesting for CSI)

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm "Looks good to me", indicates that a PR is ready to be merged. labels Feb 6, 2024
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 146f37133cf8f5bd27b199a044f6cabd4c806156

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sbueringer

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 6, 2024
@chrischdi
Copy link
Member Author

/lgtm /approve

/hold

In case you want to look into artifacts to confirm that the CSI problem is gone (not sure if this is exposed in artifacts today)

@chrischdi Is it possible today to notice this issue in CI? If yes, how complicated would it be to make the test fail if CSI/CPI/ ... doesn't come up? (I think if CPI doesn't come up nothing works, but could be interesting for CSI)

I don't see a way currently. I think CSI does not really affect anything.

@sbueringer
Copy link
Member

Would it be possible to check if the CSI pods are up?

Or did they come up regardless of this issue?

@chrischdi
Copy link
Member Author

They were all in CrashLoopBackoff (just noticed it manually).

The issue is we can't check in all tests if they are up. There is e.g. no PostMachinesProvisioned in capi_e2e.K8SConformanceSpecInput.

In capi_e2e.QuickStartSpec we could use PostMachinesProvisioned, but this would not have found this issue.

@sbueringer
Copy link
Member

Maybe it would be good to do it at least in one test (e.g. quickstart). Totally fine to just open a help-wanted issue.

Btw. Is CSI now working on this PR? https://storage.googleapis.com/kubernetes-jenkins/pr-logs/pull/kubernetes-sigs_cluster-api-provider-vsphere/2708/pull-cluster-api-provider-vsphere-e2e-upgrade-1-29-1-30-main/1754837754383962112/artifacts/clusters/k8s-upgrade-and-conformance-jyo62e/resources/vmware-system-csi/Pod/vsphere-csi-node-66ncd.yaml

Maybe CSI takes a bit to get healthy and the test is just not waiting for it?

There is e.g. no PostMachinesProvisioned in capi_e2e.K8SConformanceSpecInput.

Maybe there's something there for the generic stuff we wanted to add to all tests. E.g. allow some validations before we delete the test clusters in the respective tests (just a general hint for now :)).

@chrischdi
Copy link
Member Author

/test pull-cluster-api-provider-vsphere-e2e-upgrade-1-29-1-30-main

@chrischdi chrischdi force-pushed the pr-remove-cluster-id-csi branch from d0bb2b2 to 9abed23 Compare February 7, 2024 10:20
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 7, 2024
@chrischdi
Copy link
Member Author

/test pull-cluster-api-provider-vsphere-e2e-upgrade-1-29-1-30-main

@k8s-ci-robot k8s-ci-robot removed the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Feb 7, 2024
@k8s-ci-robot k8s-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Feb 7, 2024
@sbueringer
Copy link
Member

/lgtm
/retest

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 7, 2024
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 64541b9ef43713594fdfb5ef0ad226bf29f5f2c6

@chrischdi
Copy link
Member Author

chrischdi commented Feb 7, 2024

Maybe it would be good to do it at least in one test (e.g. quickstart). Totally fine to just open a help-wanted issue.

Btw. Is CSI now working on this PR? storage.googleapis.com/kubernetes-jenkins/pr-logs/pull/kubernetes-sigs_cluster-api-provider-vsphere/2708/pull-cluster-api-provider-vsphere-e2e-upgrade-1-29-1-30-main/1754837754383962112/artifacts/clusters/k8s-upgrade-and-conformance-jyo62e/resources/vmware-system-csi/Pod/vsphere-csi-node-66ncd.yaml

Maybe CSI takes a bit to get healthy and the test is just not waiting for it?

There is e.g. no PostMachinesProvisioned in capi_e2e.K8SConformanceSpecInput.

Maybe there's something there for the generic stuff we wanted to add to all tests. E.g. allow some validations before we delete the test clusters in the respective tests (just a general hint for now :)).

I'll defer that to a separate PR :-) (taking it on my list)

@sbueringer
Copy link
Member

Sounds good.

Feel free to merge

/hold
In case you want to look into artifacts to confirm that the CSI problem is gone (not sure if this is exposed in artifacts today)

@chrischdi
Copy link
Member Author

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 7, 2024
@k8s-ci-robot k8s-ci-robot merged commit 2a66803 into kubernetes-sigs:main Feb 7, 2024
19 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.10 milestone Feb 7, 2024
@k8s-infra-cherrypick-robot

@chrischdi: #2708 failed to apply on top of branch "release-1.9":

Applying: csi: remove cluster-id from config to let csi generate an id instead of hitting length limits
Using index info to reconstruct a base tree...
M	templates/cluster-template-external-loadbalancer.yaml
M	templates/cluster-template-ignition.yaml
M	templates/cluster-template-node-ipam.yaml
M	templates/cluster-template-topology.yaml
M	templates/cluster-template.yaml
A	test/e2e/data/infrastructure-vsphere/v1.9/base/cluster-template.yaml
A	test/e2e/data/infrastructure-vsphere/v1.9/commons/cluster-resource-set-csi-insecure.yaml
A	test/e2e/data/infrastructure-vsphere/v1.9/topology/cluster-template-topology.yaml
Falling back to patching base and 3-way merge...
CONFLICT (modify/delete): test/e2e/data/infrastructure-vsphere/v1.9/base/cluster-template.yaml deleted in HEAD and modified in csi: remove cluster-id from config to let csi generate an id instead of hitting length limits. Version csi: remove cluster-id from config to let csi generate an id instead of hitting length limits of test/e2e/data/infrastructure-vsphere/v1.9/base/cluster-template.yaml left in tree.
Auto-merging test/e2e/data/infrastructure-vsphere/v1.7/commons/cluster-resource-set-csi-insecure.yaml
Auto-merging test/e2e/data/infrastructure-vsphere/v1.7/base/cluster-template-topology.yaml
CONFLICT (content): Merge conflict in test/e2e/data/infrastructure-vsphere/v1.7/base/cluster-template-topology.yaml
Auto-merging templates/cluster-template.yaml
Auto-merging templates/cluster-template-topology.yaml
Auto-merging templates/cluster-template-node-ipam.yaml
Auto-merging templates/cluster-template-ignition.yaml
Auto-merging templates/cluster-template-external-loadbalancer.yaml
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 csi: remove cluster-id from config to let csi generate an id instead of hitting length limits
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

In response to this:

/cherry-pick release-1.9

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants