Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kube-controller-manager does not start if "vmFolder" is not provided in config #401

Closed
ykakarap opened this issue Jul 2, 2019 · 18 comments · Fixed by #406
Closed

kube-controller-manager does not start if "vmFolder" is not provided in config #401

ykakarap opened this issue Jul 2, 2019 · 18 comments · Fixed by #406
Assignees

Comments

@ykakarap
Copy link

ykakarap commented Jul 2, 2019

"vmFolder" is an optional parameter to be set while generating the manifests. "vmFolder" in VsphereMachineProviderConfig defaults to an empty string while generating the manifests.

Using these manifests with clusterctl results in a management cluster in which kube-controller-manager does not work, resulting in a broken management cluster.

Possible reason:
Spinning up of kube-controller-manager needs to know the correct "vmFolder" which by default is an empty string. The data that is passed to the SetCloudInitUserData method is not modified to fill the values like "vmFolder", "resoursePool" if missing because SetCloudInitUserData is called before we get the correct "vmFolder", "resourcePool", etc using the FolderOrDefault, ResourcePoolOrDefault, etc functions.

Possible fixes:

  • Make the "vmFolder" field required.
    • Advantages: Easy implementation
    • Disadvantages: Might be enforcing a field that should not be enforced.
  • Make code changes to retrieve correct default values before calling SetCloudInitUserData.
    • Advantages: The user-facing VsphereMachineProviderConfig can remain as is.
    • Disadvantages: The argument passed to SetCloudInitUserData is not a struct but an array of bytes. Modifying it to support default values will need changes elsewhere too (possibly the generateUserData function).

/cc @figo

@akutz
Copy link
Contributor

akutz commented Jul 2, 2019

The solution will be number two:

Make code changes to retrieve correct default values before calling SetCloudInitUserData.

This falls under the issue discussed by @andrewsykim and myself regarding moving several cloud provider config settings to the cluster provider spec.

@akutz
Copy link
Contributor

akutz commented Jul 2, 2019

By the way, awesome job catching this!

@figo
Copy link
Contributor

figo commented Jul 2, 2019

The solution will be number two:

Make code changes to retrieve correct default values before calling SetCloudInitUserData.

This falls under the issue discussed by @andrewsykim and myself regarding moving several cloud provider config settings to the cluster provider spec.

agree that fix should be number 2,
i guess moving several cloud provider config settings to the cluster provider spec is a separate effort, as long as VM folder needed in both cloud provider and vm clone, fix is needed.

By the way, awesome job catching this!

good job @ykakarap, there is 1 or few more (not sure yet) capv issues with internal cloud setup.

@andrewsykim
Copy link
Member

I might be missing some context, but if the in-tree vSphere provider requires VM folder to be set, then we should require that here otherwise storage plugins will not work?

@akutz
Copy link
Contributor

akutz commented Jul 2, 2019

With some objects, like pools, folders, datastores, you can do a “GetThisOrDefault”. We do that for those objects, just not ahead of the construction of the cloud config.

/assign @akutz

I’ll fix this with a quick patch.

@figo
Copy link
Contributor

figo commented Jul 2, 2019

I might be missing some context, but if the in-tree vSphere provider requires VM folder to be set, then we should require that here otherwise storage plugins will not work?

yes, folder is required field for in-tree provider vSphere, it up to debate whether it should be required input from user, figure out default at runtime should work ( as long as user has permission to the default folder)

@andrewsykim
Copy link
Member

I would say that CAPV shouldn't try to find defaults here and should require an explicit VM folder. The default behavior should be something the cloud provider does and it should document clearly what happens if it is not set. My experience with vSphere in that area is lacking though so will defer to folks with more experience here.

@figo
Copy link
Contributor

figo commented Jul 2, 2019

With some objects, like pools, folders, datastores, you can do a “GetThisOrDefault”. We do that for those objects, just not ahead of the construction of the cloud config.

/assign @akutz

I’ll fix this with a quick patch.

@ykakarap might has a fix in working, anyway, should be a straightforward fix.

@akutz
Copy link
Contributor

akutz commented Jul 2, 2019

@ykakarap Sounds good, let me know when it’s ready.

@andrewsykim, except a lot of people don’t know the path of the default folder or pool, hence the reason these helpers exist. Trust me, it’s not something to put on users or the UX.

@andrewsykim
Copy link
Member

@andrewsykim, except a lot of people don’t know the path of the default folder or pool, hence the reason these helpers exist. Trust me, it’s not something to put on users or the UX.

Good to know, thanks

@ykakarap
Copy link
Author

ykakarap commented Jul 2, 2019

By the way, awesome job catching this!

Thanks. I got great help from @figo in figuring this out.

@ykakarap Sounds good, let me know when it’s ready.

@akutz Sure. I will need some time to understand the code flow. I will keep you posted when I have an update.

@ykakarap
Copy link
Author

ykakarap commented Jul 2, 2019

there is 1 or few more (not sure yet) capv issues with internal cloud setup.

I will open other issues for them.

@ykakarap
Copy link
Author

ykakarap commented Jul 2, 2019

/assign @ykakarap

@dougm
Copy link
Member

dougm commented Jul 2, 2019

I agree FolderOrDefault should not be used here. Using *OrDefault for Datacenter, ResourcePool, Datastore, types is ok as it is possible there is only a single instance in the environment. In the case of the Folder type, there will be a minimum of 5 instances: the root folder and each Datacenter has at least 4 folders (vm, host, datastore, network). If CAPV were to provide a default folder, it should be "vm", relative to the configured Datacenter. The "vm" folder will always be there, cannot be renamed or removed.

Also note that the value of vmFolder must be in inventory path (relative to Datacenter or absolute), which supports childType VirtualMachine. For example:

% govc find vm -type f -childType VirtualMachine                                                                                      
vm
vm/Testing
vm/Templates
vm/Discovered virtual machine
vm/Testing/K8s
vm/Testing/K8s/capv-mgmt-cluster
% govc find / -type f -childType VirtualMachine
/dc1/vm
/dc1/vm/Testing
/dc1/vm/Templates
/dc1/vm/Discovered virtual machine
/dc1/vm/Testing/K8s
/dc1/vm/Testing/K8s/capv-mgmt-cluster

@ykakarap
Copy link
Author

ykakarap commented Jul 3, 2019

@akutz Leaving the VmFolder option out of the cloudconfig does not create a problem while creating the VM but does create one while spinning up the kube-controller-manager. The kube-controller-manger needs the Folder option (which it is likely reading from the cloudconfig).We need to use the FolderOrDefault function to set the default folder value in the cloudconfig for this reason.

The FolderOrDefault function was returning an empty string because of a bug in the DefaultFolder function. After creating the Folder object its InventoryPath is not being set. therefore, the .Name() function (which returns the InventoryPath) returned an empty string. @dougm I will raise a PR in govmomi for this.

After fixing the bug, the function returns the correct vmFolder and the kube-controller-manager works as expected.

I tested this on v0.3.0 release branch as the master branch uses the latest of cluster-api which currently has an issue : kubernetes-sigs/cluster-api#1104. @pires is working on it. I will sync up with him on that and retest on master.

@akutz
Copy link
Contributor

akutz commented Jul 3, 2019

Hi @ykakarap,

According to @dougm above, if we use FolderOrDefault and the resulting Name returns an empty string, then the folder name in the cloud config may be set to vm.

@ykakarap
Copy link
Author

ykakarap commented Jul 3, 2019

@akutz With this fix the FolderOrDefault will end up returning "vm" as the default folder. I think this is more consistent as "vm" is generally the default folder and it makes sense to return that as the result of FolderOrDefault.

We wont have to add an extra check to see if the default folder is empty and then set it to "vm".

What are your thoughts?

@akutz
Copy link
Contributor

akutz commented Jul 3, 2019

Hi @ykakarap,

I do not disagree. I'm saying there is a fix we can do in CAPV and there is a fix we can do in GoVmomi. Fix it in CAPV as is and file an issue to track the fix in GoVmomi and revisit this later. Heck, just add the empty string check in CAPV, and at some point, when we update the GoVmomi dependency, it won't ever be empty and the CAPV fix code won't be hit.

Either way, this unblocks us in CAPV today.

jayunit100 pushed a commit to jayunit100/cluster-api-provider-vsphere that referenced this issue Feb 26, 2020
Signed-off-by: Vince Prignano <vince@vincepri.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants