triton_machine: Introduce new Locality stanza #31
Conversation
I'm not sure what's left from your perspective, @cheapRoc, but this looks reasonably close.
triton/resource_machine.go (Outdated)
Description: "UUIDs of other instances to provision alongside", | ||
Optional: true, | ||
Type: schema.TypeList, | ||
Elem: &schema.Schema{ |
Per Terraform provider style, it's okay to collapse this onto a single line (presumably `Elem: &schema.Schema{Type: schema.TypeString}`, since the values are instance UUIDs).
triton/resource_machine.go (Outdated)
"far_from": { | ||
Description: "UUIDs of other instances to not provision alongside", | ||
Optional: true, | ||
Type: schema.TypeList, |
I'm on the fence as to whether this should be `ForceNew: true` or not. If I change this value, I'm expecting to be `far_from` whatever I included; the same applies to `close_to`. If the instance moves, you want to detect that drift. The hint isn't a guarantee, but with `strict = true` I could see this being important. I also don't want to get hung up on this, either.
At the very least, it's food for thought and something to discuss with interested parties.
👍 Actually, since you've brought it up, I think `ForceNew` is reasonable here. Locality information is never returned to the consumer (besides the parent `compute_node` UUID), so Terraform never sees a difference. If the consumer changes this information, causing the difference themselves, they're defining a new contract with the provisioned instance, thus forcing a re-provision under the new locality hints.
I'm second-guessing myself given your notes in chat. It does sound like the dependency graph could potentially cascade into re-provisioning an entire cluster given cross-dependent locality configuration.
My comments about the `compute_node` UUID were to point out that it's the only data we receive about the instituted state of locality after an instance is created. The `compute_node` value returned by CloudAPI is the only "tell" of where the instance ends up hosted. There might be some sort of placement-resolution comparison we could draw using this value (suppress the locality difference if CN != blah blah blah), but that sounds awfully hacky and probably won't work.
Given that this feature isn't replacing cluster scheduling as a whole, and should only exist to assist with initial HA "hints", I think we're cool to back off the `ForceNew` idea for the moment.
For those following along at home, here's my nightmare scenario:
resource "triton_machine" "consul1" {
name = "consul01"
# snip
}
resource "triton_machine" "consul2" {
name = "consul01"
# snip
locality {
strict = true
far_from = ["${triton_machine.consul1.id}"]
}
}
resource "triton_machine" "consul3" {
name = "consul01"
# snip
locality {
strict = true
far_from = ["${triton_machine.consul1.id}", "${triton_machine.consul2.id}"]
}
}
Now suppose consul1 dies. And then someone does a terraform plan && terraform apply. *waves goodbye to Consul cluster*
On one hand, it's possible consul1 could be replaced on either consul2's or consul3's CN (or in the same rack). On the other hand, maybe this isn't so bad, because at least it will tell you something is wrong AND that you need to do a blue/green flip in order to deploy three new nodes that are all spaced apart correctly.
The correct behavior is hard to achieve without performing surgery on the DAG somehow. Ideally there would be some late-binding ID that could be used in the future, or some kind of infrastructure promise that could be acquired before the machines are created. Something like:
resource "triton_machine_contract" "consul_servers" {
rule = "far_from"
}
resource "triton_machine" "consul1" {
name = "consul01"
# snip
locality {
strict = true
placement_contract = "${triton_machine_contract.consul_servers.id}"
}
}
resource "triton_machine" "consul2" {
name = "consul01"
# snip
locality {
strict = true
placement_contract = "${triton_machine_contract.consul_servers.id}"
}
}
resource "triton_machine" "consul3" {
name = "consul01"
# snip
locality {
strict = true
placement_contract = "${triton_machine_contract.consul_servers.id}"
}
}
because then the contract ID wouldn't change, and nodes could die off per entropy and be replaced without risking consul[1-3] being wiped from the face of the earth. Until that time, I'm okay with making it ForceNew: true if strict = true, and ForceNew: false if strict = false. I also don't want to block on this trying to get it perfect.
CC @jen20 for consideration
triton/resource_machine.go (Outdated)
if strictRaw, found := d.GetOk("locality.0.strict"); found {
	createInput.LocalityStrict = strictRaw.(bool)
}
if nearRaw, found := d.GetOk("locality.0.close_to"); found {
Add a newline between each of the if statements here and below.
The behavior is definitely "there". The only thing missing was acceptance tests and a couple more usability tests.
@sean- Everything is squashed down, ready for a merge.
To distill the discussion above, this does not contain any of the `strict` locality behavior.
I've squashed/updated the branch and included changes to docs and code that remove `strict`.
…ffinity rules
* Add locality stanza to triton_machine schema
* Documentation for new locality stanza
* Acceptance tests for new locality stanza

This feature does not include the `strict` argument for locality hints within Triton. That behavior returns errors if locality hints cannot be applied, leading to issues with some architectures using Terraform. A future update to the provider might find a better way to make use of it, but it's not included today.
LGTM!
Ref: #19
WIP: This is a work-in-progress PR, but I wanted to submit it in case anyone wanted to test it. I cranked it out before the weekend and didn't do much testing/refactoring once I got the initial behavior in.
This introduces the locality stanza. You'll only be able to use it on create, as the data is never returned to Terraform after provisioning. However, so far it's been working for me with the following configuration...
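For illustration, a minimal sketch of the kind of configuration this enables (the package and image values below are hypothetical placeholders; the locality arguments are the ones described in this PR):

resource "triton_machine" "db_primary" {
  name    = "db-primary"
  package = "g4-highcpu-1G"     # hypothetical package name
  image   = "${var.image_uuid}" # hypothetical image variable
}

resource "triton_machine" "db_replica" {
  name    = "db-replica"
  package = "g4-highcpu-1G"     # hypothetical package name
  image   = "${var.image_uuid}" # hypothetical image variable

  # Locality hints are only sent at create time and are never read back,
  # so later edits to this block won't show up as a diff.
  locality {
    strict   = true
    far_from = ["${triton_machine.db_primary.id}"]
  }
}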
/cc @sean-