Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix volume provisioning when duplicate region or zone tags exist #2814

Merged
merged 2 commits into from
Apr 8, 2024

Conversation

dobsonj
Copy link
Member

@dobsonj dobsonj commented Mar 7, 2024

What this PR does / why we need it:

This PR allows volume provisioning to work in two scenarios:

A) Both the datacenter and the cluster have a (duplicate) region tag

datacenter1: region-A
  cluster1: region-A, zone-C
    host1: 

B) Both the cluster and the host have a (duplicate) zone tag

datacenter1: region-A
  cluster1: zone-C
    host1: zone-C

In both cases, the volume fails to be provisioned with:

Warning ProvisioningFailed 7m42s (x9 over 12m) csi.vsphere.vmware.com_vmware-vsphere-csi-driver-controller-f9bf8757c-7mhdl_4c338b41-cb91-400f-8034-eb1cf1452e34 failed to provision volume with StorageClass "thin-csi": rpc error: code = Internal desc = failed to create volume. Errors encountered: [No compatible datastores found for accessibility requirements [map[topology.csi.vmware.com/openshift-region:region-A topology.csi.vmware.com/openshift-zone:zone-C]] pertaining to vCenter ...

Although this cluster is tagged incorrectly, this scenario did work on previous versions of the driver, which means upgrading the driver could break volume provisioning on existing clusters where this previously worked.

I propose a simple solution: when analyzing the tags, prefer the entity highest in the hierarchy and ignore the duplicate tag lower in the hierarchy. If the datacenter has a region tag, that takes precedent over any cluster underneath it. Likewise, if the cluster has a zone tag, that takes precedent over any host in that cluster.

This can be done with two changes:

  1. Remove duplicate hosts in GetHostsForSegment

findCommonHostsforAllTopologyKeys only returns the host reference if the host is found exactly once in each segment. If the tag exists twice in the hierarchy, it fails and returns an empty slice (no common hosts) even though there is a common host in both region-A and zone-C. See this backtrace as an example. If we remove duplicate host entries before finding common hosts, it correctly returns the only common host for the region and zone tags.

  1. Ignore duplicate tags in GetTopologyLabels

After changing GetHostsForSegment, it is still possible for provisioning to fail during GetTopologyLabels because the CSINodeTopology reports Error: duplicate values detected for category openshift-zone as \\\"zone-C\\\" and \\\"zone-C\\\" in its status. Since GetTopologyLabels iterates up the hierarchy through ancestors starting from the NodeVM, it could update the value instead of throwing an error, which means the higher-level entity's value takes precedent.

Which issue this PR fixes:

fixes #2681

Testing done:

Manually tested volume provisioning with and without this change in the two scenarios listed above.

Special notes for your reviewer:

/cc @divyenpatel @shalini-b @gnufied

Release note:

Fix volume provisioning when duplicate region or zone tags exist

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Mar 7, 2024
@divyenpatel
Copy link
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added the ok-to-test Indicates a non-member PR verified by an org member that is safe to test. label Mar 25, 2024
@dobsonj
Copy link
Member Author

dobsonj commented Apr 4, 2024

Hi @shalini-b @divyenpatel, do you feel comfortable accepting this change? Or do you have any additional feedback?

@shalini-b
Copy link
Collaborator

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dobsonj, shalini-b

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 5, 2024
@jsafrane
Copy link
Contributor

jsafrane commented Apr 8, 2024

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 8, 2024
@k8s-ci-robot k8s-ci-robot merged commit 6e820bb into kubernetes-sigs:master Apr 8, 2024
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

vsphere csi controller pods crashloop without multi-vcenter feature gate
5 participants