
triton_machine: Introduce new Locality stanza #31

Merged 1 commit into TritonDataCenter:master on Aug 16, 2017

Conversation

@jwreagor (Contributor) commented Aug 11, 2017

Ref: #19

WIP: This is a work-in-progress PR, but I wanted to submit it in case anyone wants to test. I cranked it out before the weekend and didn't do much testing/refactoring once I got the initial behavior in.

This introduces the locality stanza. You'll only be able to use it on create, as the data is never returned to Terraform after provisioning. So far, though, it's been working for me with the following configuration...

provider "triton" {}

data "triton_image" "base" {
  name    = "base-64"
  os      = "smartos"
  type    = "zone-dataset"
  state   = "active"
  version = "17.2.0"
}

data "triton_network" "public" {
  name = "Joyent-SDC-Public"
}

resource "triton_machine" "base0" {
  name = "bench-0"
  package = "g4-highcpu-512M"
  image = "${data.triton_image.base.id}"

  nic {
    network = "${data.triton_network.public.id}"
  }

  tags {
    version = "1.0.0"
  }

}

resource "triton_machine" "base1" {
  name = "bench-1"
  package = "g4-highcpu-512M"
  image = "${data.triton_image.base.id}"

  nic {
    network = "${data.triton_network.public.id}"
  }

  tags {
    version = "1.0.0"
  }

  locality {
    strict = true
    far_from = ["${triton_machine.base0.id}"]
  }

}

/cc @sean-

@sean- (Contributor) left a comment

I'm not sure what's left from your perspective, @cheapRoc, but this looks reasonably close.

Description: "UUIDs of other instances to provision alongside",
Optional: true,
Type: schema.TypeList,
Elem: &schema.Schema{

Per Terraform provider style, it's okay to collapse this onto a single line.
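
For illustration, the collapsed form might look something like this one-liner (the element type isn't visible in the quoted hunk, so schema.TypeString is an assumption):

Elem: &schema.Schema{Type: schema.TypeString},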

"far_from": {
Description: "UUIDs of other instances to not provision alongside",
Optional: true,
Type: schema.TypeList,

I'm on the fence as to whether this should be ForceNew: true. If I change this value, I expect to be far_from whatever I included; the same applies to close_to. If the instance moves, you want to detect that drift. The hint isn't a guarantee, but... with strict = true, I could see this being important. I also don't want to get hung up on this.

At the very least, it's food for thought and something to discuss with interested parties.

@jwreagor (Author) commented Aug 12, 2017

👍 Since you've brought it up, I think ForceNew is reasonable here. Locality information is never returned to the consumer (besides the parent compute_node UUID), so Terraform never sees a difference. If the consumer changes this information, causing the difference themselves, they're defining a new contract with the provisioned instance, which should force a re-provision under the new locality hints.
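
As a rough sketch of what that could look like in the schema (not the final diff from this PR; the exact layout, MaxItems, and ForceNew placement here are assumptions based on the quoted hunks):

"locality": {
  Type:     schema.TypeList,
  Optional: true,
  ForceNew: true, // assumption: changing locality hints forces a re-provision
  MaxItems: 1,
  Elem: &schema.Resource{
    Schema: map[string]*schema.Schema{
      "strict": {
        Type:     schema.TypeBool,
        Optional: true,
        ForceNew: true,
      },
      "close_to": {
        Description: "UUIDs of other instances to provision alongside",
        Type:        schema.TypeList,
        Optional:    true,
        ForceNew:    true,
        Elem:        &schema.Schema{Type: schema.TypeString},
      },
      "far_from": {
        Description: "UUIDs of other instances to not provision alongside",
        Type:        schema.TypeList,
        Optional:    true,
        ForceNew:    true,
        Elem:        &schema.Schema{Type: schema.TypeString},
      },
    },
  },
},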

@jwreagor (Author) commented:

I'm second-guessing myself given your notes in chat. It does sound like the dependency graph could cascade into re-provisioning an entire cluster given cross-dependent locality configuration.

My comments about the compute_node UUID were to note that it's the only data we receive about the resulting state of locality after an instance is created. The compute_node value returned by CloudAPI is the only "tell" of where the instance ends up hosted. There might be some sort of placement resolution or comparison we could draw using this value (suppress the locality diff if the CN != blah blah blah), but that sounds awfully hacky and probably won't work.

Given that this feature isn't replacing cluster scheduling as a whole, and should only exist to assist initial HA "hints", I think we're cool to back off the ForceNew idea for the moment.

@sean- (Contributor) commented:

For those following along at home, here's my nightmare scenario:

resource "triton_machine" "consul1" {
  name = "consul01"
  # snip
}

resource "triton_machine" "consul2" {
  name = "consul01"
  # snip
  locality {
    strict = true
    far_from = ["${triton_machine.consul1.id}"]
  }
}

resource "triton_machine" "consul3" {
  name = "consul01"
  # snip
  locality {
    strict = true
    far_from = ["${triton_machine.consul1.id}", "${triton_machine.consul2.id}"]
  }
}

Now suppose consul1 dies, and then someone does a terraform plan && terraform apply. (waves goodbye to Consul cluster)

On one hand, it's possible consul1 could be replaced on consul2's or consul3's CN (or in the same rack). On the other hand, maybe this isn't so bad, because at least it will tell you something is wrong AND that you need to do a blue/green flip in order to deploy three new nodes that are all spaced apart correctly.

The correct behavior is hard to achieve without performing surgery on the DAG somehow. Ideally there would be some late-binding ID that could be used in the future, or some kind of infrastructure promise that could be acquired before the machines are created. Something like:

resource "triton_machine_contract" "consul_servers" {
  rule = "far_from"
}

resource "triton_machine" "consul1" {
  name = "consul01"
  # snip
  locality {
    strict = true
    placement_contract = "${triton_machine_contract.consul_servers.id}"
  }
}

resource "triton_machine" "consul2" {
  name = "consul01"
  # snip
  locality {
    strict = true
    placement_contract = "${triton_machine_contract.consul_servers.id}"
  }
}

resource "triton_machine" "consul3" {
  name = "consul01"
  # snip
  locality {
    strict = true
    placement_contract = "${triton_machine_contract.consul_servers.id}"
  }
}

because then the contract ID wouldn't change, and nodes could die off per entropy and be replaced without risking consul[1-3] possibly being wiped from the face of the earth. Until that time, I'm okay with ForceNew: true if strict = true, and ForceNew: false if strict = false. I also don't want to block on this trying to get it perfect.

CC @jen20 for consideration

if strictRaw, found := d.GetOk("locality.0.strict"); found {
  createInput.LocalityStrict = strictRaw.(bool)
}
if nearRaw, found := d.GetOk("locality.0.close_to"); found {
@sean- (Contributor) commented:

Add a newline between each of the if statements here and below.
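
Applied to the quoted hunk, the create path might read roughly like this (a sketch only; the LocalityNear/LocalityFar input field names and the list conversion are assumptions, not the exact code from the PR):

if strictRaw, found := d.GetOk("locality.0.strict"); found {
  createInput.LocalityStrict = strictRaw.(bool)
}

if nearRaw, found := d.GetOk("locality.0.close_to"); found {
  nearList := nearRaw.([]interface{})
  near := make([]string, len(nearList))
  for i, v := range nearList {
    near[i] = v.(string)
  }
  createInput.LocalityNear = near // assumed input field name
}

if farRaw, found := d.GetOk("locality.0.far_from"); found {
  farList := farRaw.([]interface{})
  far := make([]string, len(farList))
  for i, v := range farList {
    far[i] = v.(string)
  }
  createInput.LocalityFar = far // assumed input field name
}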

@jwreagor (Author) commented:

> I'm not sure what's left from your perspective

The behavior is definitely "there". The only things missing are acceptance tests and a couple more usability tests.

@jwreagor changed the title from "triton_machine: Introduce new Locality stanza ~ WIP" to "triton_machine: Introduce new Locality stanza" on Aug 16, 2017
@jwreagor (Author) commented:

@sean- Everything is squashed down, ready for a merge.

@jwreagor (Author) commented:

To distill the discussion: this does not contain any of the ForceNew behavior discussed above.

@jwreagor (Author) commented:

I've squashed/updated the branch and included changes to docs and code that remove strict entirely.

…ffinity rules

* Add locality stanza to triton_machine schema
* Documentation for new locality stanza
* Acceptance tests for new locality stanza

This feature does not include the `strict` argument for locality hints within
Triton. That behavior returns an error if locality hints cannot be applied,
which causes issues with some architectures using Terraform. A future update to
the provider might find a more beneficial way to make use of it, but it's not
included today.
@sean- (Contributor) left a comment

LGTM!

@sean- merged commit 00d45fe into TritonDataCenter:master on Aug 16, 2017.
@sean- added a commit that referenced this pull request on Aug 16, 2017.