Provisioning V2 / RKEv2 does not work with third party node drivers #37074
Comments
This repository uses an automated workflow to automatically label issues which have not had any activity (commit/comment/label) for 60 days. This helps us manage the community issues better. If the issue is still relevant, please add a comment to the issue so the workflow can remove the label and we know it is still valid. If it is no longer relevant (or possibly fixed in the latest release), the workflow will automatically close the issue in 14 days. Thank you for your contributions. |
still a problem |
still a problem |
we're having the same issue here with our custom node driver. @maxaudron did you end up fixing this issue somehow? |
I work with @maxaudron Unfortunately we didn't fix the issue yet. |
still a problem |
Can we re-open this ticket? It's not possible to provision RKE2 clusters with the Nutanix node driver
Need to fix the v2prov assumption that NodeDriver resources will have a specific name; or fix how 3rd party node drivers are named when installed. |
Hopefully this is still being worked on. It is currently a blocker for me rolling out Rancher across our estate.
This bug is caused by the fact that node drivers were initially designed to be decoupled from the name with RKE1; however, with v2prov (and specifically CAPI), CRDs are required (which was correctly pointed out here: #37074 (comment)). The linked code https://github.com/rancher/rancher/blob/release/v2.6/pkg/controllers/provisioningv2/rke2/machineprovision/args.go#L298 is responsible, but the true underlying culprit comes from here: https://github.com/rancher/rancher/blob/e5cc549591fbdf6aec91915b83384cd78b56f769/pkg/controllers/management/drivers/nodedriver/machine_driver.go#L224C54-L224C54. This piece of code uses the displayName of the node driver object, which is not settable at creation time from the UI. Additionally, there is no validation in place to prevent multiple node drivers from using the same displayName, which will cause the dynamic schema to thrash and potentially cause data loss, or to prevent the displayName from being changed, which would also result in data loss.
Although one can set this displayName manually, this is not a suitable long-term solution. A potential long-term solution would be for the backend to use the k8s metadata name (which corresponds to the norman id); however, the UI is using norman, and I was not able to create a node driver whilst specifying the id in a POST request using curl. This requires input from @rancher/rancher-team-1-neo-dev as to whether or not it is possible from within norman to specify the id in the request. There is no way to remove the Rancher requirement that the names have to be unique, due to the generated CRDs, and validating that all node drivers have a different display name would not be a suitable alternative to just using the name of the corresponding nodedriver CR. cc @gaktive
Workaround
The script below outlines a workaround, assuming one has already encountered the issue when attempting to create a third party driver in the Rancher UI with the correct url. The node driver should be inactive before running this script, as deactivation causes CRs to be cleaned up.
(export NAME="<DRIVER NAME (must be [a-z]*)>" NODEDRIVER="<DRIVER ID (e.g. nd-12345)>"; \
  kubectl get nodedriver "${NODEDRIVER}" -o yaml \
  | yq 'del(.status) | .metadata |= with_entries(select(.key == "annotations")) | .metadata.annotations |= with_entries(select(.key == "publicCredentialFields" or .key == "privateCredentialFields"))' \
  | yq ".metadata.name = strenv(NAME)" \
  | yq ".spec.displayName = .metadata.name")
After this, the original node driver (with prefix
These are the minimum required fields to create a node driver. Once this yaml is retrieved, it can be piped to kubectl apply and the correspondingly generated node driver should be created.
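Since the workaround requires the new driver name to match [a-z]*, it may help to check the value before running the script. The following is a minimal sketch, not part of Rancher: the function name is_valid_driver_name is hypothetical, and rejecting the empty string is an extra assumption on top of the stated pattern.

```shell
#!/bin/sh
# Hypothetical helper (not part of Rancher or kubectl): succeeds only if the
# argument consists solely of ASCII lowercase letters.
# Assumption: the empty string is rejected too, even though [a-z]* matches it.
is_valid_driver_name() (
    # Runs in a subshell; the C locale makes [a-z] mean exactly a..z.
    LC_ALL=C
    case "$1" in
        '' | *[!a-z]*) exit 1 ;;
    esac
    exit 0
)

# Example: check a few candidate names before using one as NAME.
for candidate in "nutanix" "Nutanix" "nd-12345"; do
    if is_valid_driver_name "$candidate"; then
        echo "$candidate: ok"
    else
        echo "$candidate: rejected"
    fi
done
```

A check like this could guard the workaround script so that it stops before calling kubectl when NAME is unusable.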
|
@jakefhyde can you file a ticket in rancher/dashboard and link back here? |
@rancher/docs , FYI moved this to "Release Note" status as we would want to include the workaround #37074 (comment) in the next release notes, not specifically 2023-Q4/2024-Q1 releases. |
@gaktive Holding off on creating dashboard ticket for now, may require some additional work. |
@snasovich do you mean the emergency release we're currently working on, or the one after? |
@martyav , any next release. Won't hurt to put in the out-of-band release you mentioned - but we will want to retain it in Q3/v2.7-Next release notes as well as this won't be fixed in it. |
Removing milestone from this issue as it's unlikely we will get to it soon especially given workaround exists. |
Rancher Server Setup
Information about the Cluster
User Information
Describe the bug
When trying to provision a cluster with a third-party node driver (one that isn't built in), provisioning of an RKE2 cluster fails.
Third-party node drivers added to Rancher get a randomly assigned name as their
Kubernetes resource name:
But RKE2 provisioning assumes a specific name when trying to provision machines, leading to an error:
This error also isn't reported properly in the UI. There, machines are only
described as "Waiting to schedule machine create".
I assume this piece of code is responsible: https://github.com/rancher/rancher/blob/release/v2.6/pkg/controllers/provisioningv2/rke2/machineprovision/args.go#L298
It simply takes the Kind of the machine CRD, in my case NutanixMachine.
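To illustrate the mismatch, here is a rough sketch of how a driver name could be derived from a machine Kind. It is an assumption about the behaviour, not a copy of the linked Go code: the helper kind_to_driver_name is hypothetical, and it assumes the rule is "drop a trailing Machine, then lowercase".

```shell
#!/bin/sh
# Hypothetical helper, assuming the driver name is derived from the machine
# Kind by stripping a trailing "Machine" and lowercasing the rest.
kind_to_driver_name() {
    kind="$1"
    printf '%s\n' "${kind%Machine}" | tr '[:upper:]' '[:lower:]'
}

# For the Kind from this issue:
kind_to_driver_name "NutanixMachine"   # prints "nutanix"
```

Under that assumption, provisioning would look for a nodedriver resource named nutanix, while the resource created through the UI carries a generated id such as nd-12345, so the lookup fails.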
To Reproduce
Result
The cluster is stuck in provisioning and only shows "Waiting to schedule machine create" as the status for machines.
Expected Result
The cluster provisions successfully.
Workaround
Create the nodedriver manually in the backing kubernetes cluster with the correct name
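Before or after applying the workaround, one can check whether a nodedriver with the expected name exists. The sketch below is illustrative only: nodedriver_exists is a hypothetical helper, and it assumes its input is in the `kubectl get ... -o name` format (one `<resource>/<name>` entry per line).

```shell
#!/bin/sh
# Hypothetical helper: reads resource names on stdin (as printed by
# `kubectl get nodedriver -o name`) and succeeds if one equals "$1".
nodedriver_exists() {
    expected="$1"
    while IFS= read -r line; do
        # Drop everything up to the last "/", leaving the bare object name.
        if [ "${line##*/}" = "$expected" ]; then
            return 0
        fi
    done
    return 1
}

# Against a live cluster (not run here):
#   kubectl get nodedriver -o name | nodedriver_exists nutanix

# Offline demonstration with sample input: only a generated id is present.
if printf '%s\n' 'nodedriver.management.cattle.io/nd-12345' | nodedriver_exists nutanix; then
    echo "nodedriver nutanix found"
else
    echo "nodedriver nutanix missing"
fi
```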