Fix volume provisioning when duplicate region or zone tags exist #2814
Conversation
/ok-to-test
Hi @shalini-b @divyenpatel, do you feel comfortable accepting this change? Or do you have any additional feedback?
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: dobsonj, shalini-b. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
/lgtm
What this PR does / why we need it:
This PR allows volume provisioning to work in two scenarios:
A) Both the datacenter and the cluster have a (duplicate) region tag
B) Both the cluster and the host have a (duplicate) zone tag
In both cases, the volume fails to be provisioned with:
Warning ProvisioningFailed 7m42s (x9 over 12m) csi.vsphere.vmware.com_vmware-vsphere-csi-driver-controller-f9bf8757c-7mhdl_4c338b41-cb91-400f-8034-eb1cf1452e34 failed to provision volume with StorageClass "thin-csi": rpc error: code = Internal desc = failed to create volume. Errors encountered: [No compatible datastores found for accessibility requirements [map[topology.csi.vmware.com/openshift-region:region-A topology.csi.vmware.com/openshift-zone:zone-C]] pertaining to vCenter ...
Although such a cluster is tagged incorrectly, these scenarios did work on previous versions of the driver, which means upgrading the driver could break volume provisioning on existing clusters where it previously worked.
I propose a simple solution: when analyzing the tags, prefer the entity highest in the hierarchy and ignore the duplicate tag lower in the hierarchy. If the datacenter has a region tag, it takes precedence over any cluster underneath it. Likewise, if the cluster has a zone tag, it takes precedence over any host in that cluster.
This can be done with two changes:
findCommonHostsforAllTopologyKeys currently returns a host reference only if the host is found exactly once in each segment. If the tag exists twice in the hierarchy, it fails and returns an empty slice (no common hosts) even though there is a common host in both region-A and zone-C. See this backtrace as an example. If we remove duplicate host entries before finding common hosts, it correctly returns the only common host for the region and zone tags.
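As a rough sketch of that deduplication (not the driver's actual code: plain strings stand in for managed object references, and dedupeHosts/commonHosts are hypothetical helper names):

```go
package main

import "fmt"

// dedupeHosts removes duplicate host entries that appear when the same
// region or zone tag is attached at multiple levels of the inventory
// hierarchy (e.g. both the datacenter and the cluster).
func dedupeHosts(hosts []string) []string {
	seen := make(map[string]struct{}, len(hosts))
	out := make([]string, 0, len(hosts))
	for _, h := range hosts {
		if _, ok := seen[h]; ok {
			continue
		}
		seen[h] = struct{}{}
		out = append(out, h)
	}
	return out
}

// commonHosts returns the hosts present in every segment's (deduplicated) host list.
func commonHosts(hostsPerSegment [][]string) []string {
	if len(hostsPerSegment) == 0 {
		return nil
	}
	counts := make(map[string]int)
	for _, hosts := range hostsPerSegment {
		for _, h := range dedupeHosts(hosts) {
			counts[h]++
		}
	}
	var common []string
	for h, c := range counts {
		if c == len(hostsPerSegment) {
			common = append(common, h)
		}
	}
	return common
}

func main() {
	// region-A is tagged on both the datacenter and the cluster, so host-1
	// appears twice in that segment's host list; zone-C lists it once.
	regionHosts := []string{"host-1", "host-1", "host-2"}
	zoneHosts := []string{"host-1"}
	fmt.Println(commonHosts([][]string{regionHosts, zoneHosts})) // [host-1]
}
```

With the duplicate host-1 entry removed from the region segment, the intersection yields the single common host instead of an empty slice.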
After changing GetHostsForSegment, it is still possible for provisioning to fail during GetTopologyLabels, because the CSINodeTopology reports
Error: duplicate values detected for category openshift-zone as "zone-C" and "zone-C"
in its status. Since GetTopologyLabels iterates up the hierarchy through ancestors starting from the NodeVM, it can simply update the value instead of throwing an error, which means the higher-level entity's value takes precedence.
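A minimal sketch of that behavior, assuming a simplified taggedEntity type in place of the real inventory objects and tag-manager lookups: because the ancestors are visited from the NodeVM upward, letting a later match overwrite the label means the higher-level entity's value wins.

```go
package main

import "fmt"

// taggedEntity stands in for an inventory object and the region/zone tags
// attached to it; the real code resolves tags via the vCenter tag manager.
type taggedEntity struct {
	name string
	tags map[string]string // tag category -> value, e.g. "openshift-zone" -> "zone-C"
}

// topologyLabels walks ancestors ordered from the NodeVM's host up to the
// datacenter. A category seen again simply overwrites the earlier value, so
// the entity higher in the hierarchy wins instead of triggering an error.
func topologyLabels(ancestors []taggedEntity) map[string]string {
	labels := make(map[string]string)
	for _, entity := range ancestors {
		for category, value := range entity.tags {
			labels[category] = value
		}
	}
	return labels
}

func main() {
	// Scenario B: zone-C is tagged on both the cluster and the host.
	ancestors := []taggedEntity{
		{name: "host-1", tags: map[string]string{"openshift-zone": "zone-C"}},
		{name: "cluster-1", tags: map[string]string{"openshift-zone": "zone-C"}},
		{name: "datacenter-1", tags: map[string]string{"openshift-region": "region-A"}},
	}
	fmt.Println(topologyLabels(ancestors))
	// map[openshift-region:region-A openshift-zone:zone-C]
}
```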
Which issue this PR fixes:
fixes #2681
Testing done:
Manually tested volume provisioning with and without this change in the two scenarios listed above.
Special notes for your reviewer:
/cc @divyenpatel @shalini-b @gnufied
Release note: