
Add configurable timeout for VM PowerOn operations #990

Merged: 2 commits into master from poweron_timeout on Mar 18, 2020

Conversation

koikonom
Contributor

There is currently a conflict between the way the vSphere provider
builds VMs and the way vSphere expects VMs that are part of a vApp to
behave, especially around how they are powered on.

A vApp expects its VMs to be built but not powered on, because the vApp
powers them on itself, in a specific order. The provider, on the other
hand, builds a VM and powers it on immediately, regardless of where it
belongs.

The result of this conflict is that vSphere will not allow any VM in a
vApp to power on until all VMs in that vApp are ready to be powered on.
For example, when cloning two VMs into the same vApp, we are only
allowed to power on the first VM once both of them are done cloning. Any
attempt to power on a VM earlier results in an "InvalidState" fault.

This commit introduces a configurable timeout, `poweron_timeout`, which
represents the amount of time we are going to give a VM to power on. The
default (and minimum) is 300 seconds (5 minutes).

The full power-on process goes like this:

  • The timer starts (running for as long as `poweron_timeout` is set to).
  • First we poll the `Disabled Methods` list of the VM object until
    `PowerOnVM_Task` is no longer in it.
  • Next we enter a loop for the remainder of the time, issuing
    `PowerOnVM` requests and checking each response. If a request
    succeeds we return; otherwise we inspect the error. If the error
    says the VM is in an invalid state, we sleep for 500ms and retry;
    any other error is returned immediately.
  • If we hit the timeout without managing to power on the VM, we
    return an error.

@koikonom koikonom requested a review from a team March 10, 2020 21:14
@ghost ghost added the size/m Relative Sizing: Medium label Mar 10, 2020
@bill-rich bill-rich (Contributor) left a comment


Couple minor comments, but looks good!

Comment on lines 521 to 522
//tctx, tcancel := context.WithTimeout(context.Background(), provider.DefaultAPITimeout)
//defer tcancel()
Contributor

Looks like the original lines got left in, but commented out. These can be removed if they aren't needed in the future.

Contributor Author

Yup, will do.

Comment on lines 1350 to 1354
pTimeoutStr := fmt.Sprintf("%ds", d.Get("poweron_timeout").(int))
pTimeout, err := time.ParseDuration(pTimeoutStr)
if err != nil {
	return nil, fmt.Errorf("failed to parse poweron_timeout as a valid duration: %s", err)
}
Contributor

Might be worth adding an IntToString function to the structure helper, since it's been used a couple of times.
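A minimal sketch of the suggested helper — the name comes from the review comment, and its placement in a shared structure-helper package is assumed, not shown in this thread:

```go
package main

import (
	"fmt"
	"strconv"
)

// IntToString converts an int to its decimal string form. This is a
// sketch of the helper suggested above; the provider's real structure
// helpers are not shown in this review.
func IntToString(i int) string {
	return strconv.Itoa(i)
}

func main() {
	// The snippet under review could then build its duration string as:
	pTimeoutStr := IntToString(300) + "s"
	fmt.Println(pTimeoutStr) // prints: 300s
}
```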

Contributor Author

I decided to change the way the timeout is calculated, so this is no longer used.

@rchadalawada

@koikonom Why are you resorting to timeouts? Why not check whether the VM is part of a vApp and skip the power-on? Wouldn't it make sense to treat the vApp as a single entity?

@koikonom
Contributor Author

Hey @rchadalawada, great question!

In Terraform we have two vApp-related resources: `vapp_container` and `vapp_entity`. Unfortunately, neither of them can control powering on the VM. The container just creates the vApp, and `vapp_entity` can be used to associate VMs with the vApp, manage their startup order, and so on. When the provider runs, each resource is given only its own configuration, so neither of those entities is aware of the full picture.

You are correct when you say that I could skip powering on the VM if it's a member of a vApp, but then there is no resource to control the power status of the vApp once everything is built.

To do this right we would have to revisit many basic elements of how VMs are built and how we manage vApps as a whole, and that is a much larger project that needs to be prioritised accordingly.

@aareet aareet added the bug Type: Bug label Mar 17, 2020
@aareet aareet linked an issue Mar 17, 2020 that may be closed by this pull request
@koikonom koikonom requested a review from bill-rich March 18, 2020 16:02
@bill-rich bill-rich (Contributor) left a comment

Looks good!

@koikonom koikonom merged commit 443552c into master Mar 18, 2020
@koikonom koikonom deleted the poweron_timeout branch March 18, 2020 18:53
@ghost

ghost commented Apr 18, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. If you feel I made an error 🤖 🙉 , please reach out to my human friends 👉 hashibot-feedback@hashicorp.com. Thanks!

@ghost ghost locked and limited conversation to collaborators Apr 18, 2020
Labels
bug Type: Bug size/m Relative Sizing: Medium
Development

Successfully merging this pull request may close these issues.

wait before powering on the vm?
4 participants