-
Notifications
You must be signed in to change notification settings - Fork 455
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
r/virtual_machine: Failsafe against disk name mismatch #304
Conversation
There are certain situations that can lead to a disk ultimately losing its name at a specific location without Terraform being able to keep track of it: * Significant alteration of a VM config outside of Terraform: It's feasible that someone could go and completely replace a virtual disk in one location with a different one. * Snapshot heck: Snapshots alter the name of a virtual disk. We walk up the child disk path to ultimately try and confirm if we are dealing with the right virtual disk, but it's not out of the realm of possibility that with enough manipulation, the original name of the disk might be lost. * Linked clones and storage vMotion: This is currently a real possibility and the main reason we are putting in this validation. vMotions of linked clones ultimately assign delta disk-like names to virtual disks, completely removing the original name from the chain. In these situations, the existing state of the disk does not facilitate adding keep_on_remove (unless explicitly there already) as the logic does not believe the disk to be orphaned (it's in state and was found in config by key or device address). This can lead to potentially destructive scenarios. This update guards against that - if the base name of the disk that was in state before (if any) and the disk ultimately read out of the backing do not match, taking snapshots into account, an error is kicked back during the read process and TF stops in its tracks. Any state that is pending from DiskRefreshOperation is not committed, and an error message is supplied prompting the user to manually correct the issues with the VM before (carefully) proceeding again. Further commits will close the hole in the final scenario, preventing storage vMotion on linked clones (ie: the root disk in the snapshot chain does not match the file name of the disk we are managing). This should prevent this error from coming up during normal TF operations.
Does not break existing tests:
|
@vancluever perhaps I'm mis-understanding this - but from what I can tell this will breaks the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code itself looks logically sound (although, it needs an acceptance test to cover this use-case) - however I'm concerned this is going to break the terraform plan
workflow (see the other comment)?
@tombuildsstuff I would normally agree with you, but this is a unique case. In this case, what is happening is that the name of a disk is drifting sufficiently from the original that Terraform cannot track it anymore (see #295). This is mainly happening in storage vMotion scenarios, but can indeed happen during regular reads if someone replaces the disk at a certain address with another. We can track the latter in plan, but not the former, which is the more likely scenario at this time. This is not a permanent solution - this is a stop-gap until we can 1) block storage vMotion under certain scenarios and 2) start tracking disks by UUID instead of name, device address, and key (something I want to map out today). PS: Test will be coming this morning. |
After internal discussion we've decided that turning this into a Closing this in favour of a pending PR which will address this. |
Lots of helper code added in this test to help manage datastore paths and what not - things we are kind of doing in other places manually but I'm getting tired of writing out. Also, a helper for moving a virtual disk. The main test case fails right now which is intentional, it fails correctly which is the good thing. Again, this is testing for the case describe in #304 where if a disk with a different name gets put into a place where TF already is tracking, that disk will get deleted on the next TF run. We are working to fix it so that it simply removes it. Ref: #304.
Lots of helper code added in this test to help manage datastore paths and what not - things we are kind of doing in other places manually but I'm getting tired of writing out. Also, a helper for moving a virtual disk. The main test case fails right now which is intentional, it fails correctly which is the good thing. Again, this is testing for the case describe in #304 where if a disk with a different name gets put into a place where TF already is tracking, that disk will get deleted on the next TF run. We are working to fix it so that it simply removes it. Ref: #304.
There are certain situations that can lead to a disk ultimately losing
its name at a specific location without Terraform being able to keep
track of it:
feasible that someone could go and completely replace a virtual disk in
one location with a different one.
the child disk path to ultimately try and confirm if we are dealing with
the right virtual disk, but it's not out of the realm of possibility
that with enough manipulation, the original name of the disk might be
lost.
possibility and the main reason we are putting in this validation.
vMotions of linked clones ultimately assign delta disk-like names to
virtual disks, completely removing the original name from the chain.
In these situations, the existing state of the disk does not facilitate
adding
keep_on_remove
(unless explicitly there already) as the logicdoes not believe the disk to be orphaned (it's in state and was found in
config by key or device address). This can lead to potentially
destructive scenarios.
This update guards against that - if the base name of the disk that was
in state before (if any) and the disk ultimately read out of the
backing do not match, taking snapshots into account, an error is kicked
back during the read process and TF stops in its tracks. Any state that
is pending from
DiskRefreshOperation
is not committed, and an errormessage is supplied prompting the user to manually correct the issues
with the VM before (carefully) proceeding again.
Further commits will close the hole in the final scenario, preventing
storage vMotion on linked clones (ie: the root disk in the snapshot
chain does not match the file name of the disk we are managing). This
should prevent this error from coming up during normal TF operations.
Ref: #295 - this does not completely correct it (that will be coming later) but helps prevent the consequences of it.