vSphere VM is always partially tainted (missing network interface in local state) #6174

Closed
benlangfeld opened this issue Apr 14, 2016 · 30 comments

@benlangfeld

Hi there,

Thank you for opening an issue. Please note that we try to keep the Terraform issue tracker reserved for bug reports and feature requests. For general usage questions, please see: https://www.terraform.io/community.html.

Terraform Version

Terraform v0.6.14

Affected Resource(s)

  • vsphere_virtual_machine

Terraform Configuration Files

provider "vsphere" {
  vsphere_server = "172.18.0.137"
  allow_unverified_ssl = true
  user = "blangfeld"
  password = "*****"
}

# Create a folder
resource "vsphere_folder" "NitroDemoPath" {
  datacenter = "PACV-DC"
  path       = "Nitro/Nitro - Demo10"
}

resource "vsphere_virtual_machine" "nitro-webserver" {
  name         = "demo10-nitro-webserver"
  folder       = "${vsphere_folder.NitroDemoPath.path}"
  datacenter   = "PACV-DC"
  vcpu         = 1
  memory       = 4096
  domain       = "mydatainmotion.com"
  time_zone    = "US/Eastern"
  gateway      = "10.0.5.254"
  dns_servers  = ["10.1.1.36", "10.1.1.37", "172.18.0.40"]
  dns_suffixes = ["powerhrg.com", "mydatainmotion.com"]
  boot_delay   = 300

  network_interface {
    label              = "Demo-VLAN4"
    ipv4_address       = "${cidrhost("10.0.5.32/27", 1)}"
    ipv4_prefix_length = "23"
  }

  disk {
    template  = "Templates/Ubuntu-14.04(10GB)"
    datastore = "gotham_vm"
    type      = "thin"
  }
}

Debug Output

https://gist.github.com/benlangfeld/ee8af09e52820d8b8aa4580102c4b229

Panic Output

N/A

Expected Behavior

Initial apply run should save complete resource state.

Actual Behavior

Initial apply run saved state with zero network interfaces. Later plan run refreshed state to include interfaces. A delay between the two is required; even after apply terminates, an immediate plan run will not correctly refresh state, will see a diff in network interfaces, and will attempt to rebuild the nodes.

Steps to Reproduce

  1. terraform apply
  2. sleep 120
  3. terraform plan

Important Factoids

Nothing I'm currently aware of.

References

None

@chrislovecnm
Contributor

@benlangfeld this may be related to #4283, which needs more testing. Let me know if you want to test a branch that I have.

What is going on with the logs on the vm and the logs / messages in vSphere??

@chrislovecnm
Contributor

This may be fixed by #6293 - any chance you can take master for a test drive??

@benlangfeld
Author

@chrislovecnm Will test this now.

@benlangfeld
Author

Unfortunately the problem persists on master: https://gist.github.com/benlangfeld/83507859a54f39487ba9b4b945d4be23

@chrislovecnm
Contributor

@benlangfeld we may need to move that WaitForIp up higher in the code block. Knee deep in some monitoring code at this minute. Do you code btw??

@benlangfeld
Author

> Do you code btw??

Absolutely. My familiarity with Go is superficial, but the main problem is not knowing what the expected behaviour of the vSphere API is in this case.

@chrislovecnm
Contributor

@benlangfeld we need the provider to wait for the nic(s) to get ip addresses. We have been informed that a code block like this would force the wait.

I think the code block may need to be moved up in the code base, but I have not had a chance to test. I would move the code block to here.

@markpeek and @frapposelli this is another high priority issue. Any ideas?

@benlangfeld
Author

@chrislovecnm Unfortunately that doesn't actually make any difference:

2016/04/22 13:36:42 [DEBUG] terraform-provider-vsphere: 2016/04/22 13:36:42 [DEBUG] ip address: 10.0.2.57
2016/04/22 13:36:42 [DEBUG] terraform-provider-vsphere: 2016/04/22 13:36:42 [INFO] Created virtual machine: Nitro/Nitro - Demo10/demo10-nitro-webserver
2016/04/22 13:36:42 [DEBUG] terraform-provider-vsphere: 2016/04/22 13:36:42 [DEBUG] reading virtual machine: &schema.ResourceData{schema:map[string]*schema.Schema{"domain":(*schema.Schema)(0xc8203fe5b0), "windows_opt_config":(*schema.Schema)(0xc8203fea90), "disk":(*schema.Schema)(0xc8203fec30), "resource_pool":(*schema.Schema)(0xc8203fe340), "linked_clone":(*schema.Schema)(0xc8203fe410), "dns_servers":(*schema.Schema)(0xc8203fe8f0), "folder":(*schema.Schema)(0xc820395ee0), "memory":(*schema.Schema)(0xc8203fe0d0), "datacenter":(*schema.Schema)(0xc8203fe1a0), "cluster":(*schema.Schema)(0xc8203fe270), "gateway":(*schema.Schema)(0xc8203fe4e0), "time_zone":(*schema.Schema)(0xc8203fe680), "dns_suffixes":(*schema.Schema)(0xc8203fe750), "custom_configuration_parameters":(*schema.Schema)(0xc8203fe9c0), "vcpu":(*schema.Schema)(0xc8203fe000), "boot_delay":(*schema.Schema)(0xc8203fedd0), "network_interface":(*schema.Schema)(0xc8203feb60), "cdrom":(*schema.Schema)(0xc8203fed00), "name":(*schema.Schema)(0xc820395d40)}, config:(*terraform.ResourceConfig)(nil), state:(*terraform.InstanceState)(nil), diff:(*terraform.InstanceDiff)(0xc8203a7a00), multiReader:(*schema.MultiLevelFieldReader)(0xc8204a7980), setWriter:(*schema.MapFieldWriter)(0xc8204a78e0), newState:(*terraform.InstanceState)(0xc82032d200), partial:false, partialMap:map[string]struct {}(nil), once:sync.Once{m:sync.Mutex{state:0, sema:0x0}, done:0x1}, isNew:true}
vsphere_virtual_machine.nitro-webserver: Still creating... (2m30s elapsed)
2016/04/22 13:36:44 [DEBUG] terraform-provider-vsphere: 2016/04/22 13:36:44 [DEBUG] &object.Datacenter{Common:object.Common{c:(*vim25.Client)(0xc8201e2f00), r:types.ManagedObjectReference{Type:"Datacenter", Value:"datacenter-2"}}}
2016/04/22 13:36:44 [DEBUG] terraform-provider-vsphere: 2016/04/22 13:36:44 [DEBUG] types.VirtualMachineConfigSummary{DynamicData:types.DynamicData{}, Name:"demo10-nitro-webserver", Template:false, VmPathName:"[gotham_vm] demo10-nitro-webserver/demo10-nitro-webserver.vmx", MemorySizeMB:4096, CpuReservation:0, MemoryReservation:0, NumCpu:1, NumEthernetCards:1, NumVirtualDisks:1, Uuid:"42026077-2e2a-2c2c-a49e-406d9f1f0eb2", InstanceUuid:"5002a649-5ff4-89be-b08e-c5bbd543c1ec", GuestId:"ubuntu64Guest", GuestFullName:"Ubuntu Linux (64-bit)", Annotation:"OS-Handled HOT-ADD enabled for Memory and CPU. Fresh SSH keys will generate upon initial deploy. Fully updated as of 4/27/15. Contains internal repos (Both Trusty and Precise!)", Product:(*types.VAppProductInfo)(0xc820086780), InstallBootRequired:(*bool)(0xc820da322d), FtInfo:types.BaseFaultToleranceConfigInfo(nil), ManagedBy:(*types.ManagedByInfo)(nil)}
2016/04/22 13:36:44 [DEBUG] terraform-provider-vsphere: 2016/04/22 13:36:44 [DEBUG] []types.GuestNicInfo(nil)
2016/04/22 13:36:44 [DEBUG] terraform-provider-vsphere: 2016/04/22 13:36:44 [DEBUG] networkInterfaces: []map[string]interface {}{}

The problem is that WaitForIP only checks the guest.ipAddress property while the list of interfaces is fetched from mo.VirtualMachine.Guest.Net, which apparently has some other implementation in the vSphere API, but I'm struggling to understand where exactly it comes from.
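
The mismatch described above can be illustrated with a small self-contained sketch. The `guestInfo` and `guestNic` types here are simplified, hypothetical stand-ins (not the govmomi types); the snapshot mirrors the debug log, where a primary address is already known but the per-NIC list is still nil.

```go
package main

import "fmt"

// guestNic and guestInfo are hypothetical, simplified stand-ins for the
// guest data the provider reads; they are NOT the real govmomi types.
type guestNic struct {
	MacAddress  string
	IpAddresses []string
}

type guestInfo struct {
	IpAddress string     // primary address (what a primary-IP wait looks at)
	Net       []guestNic // per-NIC details (what the interface list is built from)
}

// hasPrimaryIP is the condition a wait on the primary address is satisfied by.
func hasPrimaryIP(g guestInfo) bool { return g.IpAddress != "" }

// nicsReported counts NICs that already report at least one address.
func nicsReported(g guestInfo) int {
	n := 0
	for _, nic := range g.Net {
		if len(nic.IpAddresses) > 0 {
			n++
		}
	}
	return n
}

func main() {
	// Snapshot resembling the log above: primary IP set, Net still nil.
	g := guestInfo{IpAddress: "10.0.2.57"}
	fmt.Println(hasPrimaryIP(g), nicsReported(g)) // prints: true 0
}
```

Under these assumptions, a wait that only checks the primary address can return while the per-NIC list is still empty, which would match the empty `networkInterfaces` in the log.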

@chrislovecnm
Contributor

Let's ping a couple of the guys who code the API 😄
@dougm @ignatov @pietern

govmomi folks. Let me know if we need to log an issue. We have chatted with @pietern in the past about this, and were recommended to use WaitForIP.

We need to have the provider wait until the VirtualMachine.Guest.Net objects actually have IP addresses. The code is returning before the IP addresses are available through that call.

Any ideas on debugging this? Any recommendations?

@chrislovecnm
Contributor

@benlangfeld yah I don't completely understand how the query:

collector.RetrieveOne(context.TODO(), vm.Reference(), []string{"guest", "summary", "datastore"}, &mvm);

works either.

That is where mvm.Guest.Net is coming back from. How we check that the fields are populated is the question.

@benlangfeld
Author

@chrislovecnm I believe what we need to do is poll the VirtualMachine.Guest.Net property repeatedly until it has the same size as the number of interfaces we declare in the resource, or until some timeout, roughly similar to what WaitForIP does. I'm throwing together a quick diff to approximate this, but I don't claim that it'll be ready to use; I'm not sure I fully understand this module's test suite...

@chrislovecnm
Contributor

@benlangfeld we would need to poll RetrieveOne ... but there "should" be a wait ...

@chrislovecnm
Contributor

@benlangfeld digging around, I think we need to do a WaitForUpdates, but it would be good to have an API peep let us know.

@chrislovecnm
Contributor

I'm thinking that I would try a Wait on mvm.Guest.Net and the elements therein ...

BTW I do not have a test bed right now. Otherwise I would be testing as well ... DOH

Which module's test suites are you referring to?? TF?

@benlangfeld
Author

@chrislovecnm
Contributor

@benlangfeld you want to take the testing discussion offline? clove at cnmconsulting dot net ... It is a google account, so hangouts work, or email. We need a slack account for this darn provider 😄 It took me a bit to get my head around them as well. BTW:

Once all these variables are in place, the tests can be run like this:

make testacc TEST=./builtin/providers/vsphere

BTW - stepping away for a bit ... FOOD

@dougm

dougm commented Apr 25, 2016

Sounds like you're looking for something like this: vmware/govmomi#501 ?

@chrislovecnm
Contributor

chrislovecnm commented Apr 25, 2016

LOL - just writing a ticket - thanks @dougm ... @benlangfeld can you test this?

@benlangfeld
Author

@chrislovecnm Will do as soon as I can, which will probably be late this week or next week.

@knuckolls
Contributor

I fixed this by adding in a loop that waits until the network information is returning properly from the API. It's up on a rough PR that just shows what I did to get unblocked today. You can see the code in #6547.

@chrislovecnm
Contributor

chrislovecnm commented May 9, 2016

@thetuxkeeper where the hell is that wait for all ips call ... I am at the airport and don't have time to dig for it...

@knuckolls documented in one of these darn issues that we have open is an API call to wait for all of the ips.

@thetuxkeeper
Contributor

@chrislovecnm : That's the function (WaitForNetIP): https://github.com/vmware/govmomi/blob/master/object/virtual_machine.go#L249
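
For anyone consuming the result of such a wait, here is a hedged sketch of post-processing it. It assumes, without confirmation from this thread, that the wait yields a map from MAC address to a list of address strings; check the linked source for the real return type. `ipv4ByMAC` is a hypothetical helper that keeps only the IPv4 entries, which matters when a guest reports an IPv6 link-local address first.

```go
package main

import (
	"fmt"
	"net"
)

// ipv4ByMAC keeps only the IPv4 addresses from a MAC -> addresses map.
// The map shape is an assumption about what a "wait for all IPs" call returns.
func ipv4ByMAC(addrs map[string][]string) map[string][]string {
	out := make(map[string][]string)
	for mac, ips := range addrs {
		for _, s := range ips {
			// To4 returns nil for addresses that are not IPv4.
			if ip := net.ParseIP(s); ip != nil && ip.To4() != nil {
				out[mac] = append(out[mac], s)
			}
		}
	}
	return out
}

func main() {
	// Illustrative data only: one NIC with a link-local IPv6 and an IPv4 address.
	reported := map[string][]string{
		"00:50:56:00:00:01": {"fe80::250:56ff:fe00:1", "192.168.105.24"},
	}
	fmt.Println(ipv4ByMAC(reported))
}
```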

@benlangfeld
Author

@chrislovecnm I believe this should be fixed now that #6377 has been merged? I will test ASAP.

@chrislovecnm
Contributor

@benlangfeld you the man!!! Let us know how it goes ;)

@benlangfeld
Author

@chrislovecnm This works great for me. Thank you to everyone involved, particularly @thetuxkeeper and @dougm. I'll be holding my breath until this makes it into a release.

@chrislovecnm
Contributor

@benlangfeld woot!!

@amarruedo

Hi there!

I'm having this exact same issue with Terraform v0.10.2 and v0.10.7

Terraform Configuration Files

# Configure the VMware vSphere Provider. ENV Variables set for Username and Passwd.
provider "vsphere" {
 vsphere_server = "192.168.105.10"
 allow_unverified_ssl = true
}

# Create a folder
resource "vsphere_folder" "TestPath" {
  datacenter = "Datacenter"
  path       = "Test"
}

# Define the VM resource
resource "vsphere_virtual_machine" "example" {
 name   = "node-1"
 folder = "${vsphere_folder.TestPath.path}"
 vcpu   = 2
 memory = 4096
 datacenter = "Datacenter"
 cluster = "Cluster_rnd"
 enable_disk_uuid  = "true"
 skip_customization = "true"

# Define the Networking settings for the VM
 network_interface {
   label = "VM Network"
   ipv4_gateway = "192.168.105.1"
   ipv4_address = "192.168.105.24"
   ipv4_prefix_length = "24"
 }

# Define DNS
 dns_servers = ["8.8.8.8"]

# Define the Disks and resources. The first disk should include the template.
 disk {
   template = "CoreOS"
   datastore = "vol_af01_idvms"
   type ="thin"
 }

}

The apply times out with vsphere_virtual_machine.example: timeout waiting for a routeable interface error.

Here are the logs.

If I do plan again, then it tries to add network interfaces with no luck.

Terraform will perform the following actions:

  ~ vsphere_virtual_machine.example
      network_interface.0.ipv4_address:       "" => "192.168.105.24"
      network_interface.0.ipv4_gateway:       "" => "192.168.105.1"
      network_interface.0.ipv4_prefix_length: "0" => "24"


Plan: 0 to add, 1 to change, 0 to destroy.

If I add wait_for_guest_net = "false" to the configuration file, I get rid of the previous timeout error, but I still don't get a virtual machine with an IPv4 network interface as stated in the configuration (the virtual machine defaults to an IPv6 network interface).

I'm trying to create a CoreOS machine.

Any idea what's going on?

@byahia

byahia commented Nov 1, 2017

I am having the same problem as @amarruedo, except that I'm trying to create a Windows machine.
Could anyone help?

@amarruedo

I found out that for CoreOS Linux, I have to use the Ignition provider and then pass that Ignition definition to the vSphere virtual machine through custom_configuration_parameters:

custom_configuration_parameters {
    guestinfo.coreos.config.data.encoding = "base64"
    guestinfo.coreos.config.data          = "${base64encode(data.ignition_config.rendered)}"
}

Viewing the code of tectonic-installer was helpful to understand this.

@ghost

ghost commented Apr 6, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost ghost locked and limited conversation to collaborators Apr 6, 2020