
triton_machine: Introduce new affinity rules feature #42

Merged · 2 commits · Oct 4, 2017

Conversation

@jwreagor (Contributor) commented Sep 27, 2017

Ref: #32 (comment) as well as internal ticket PUBAPI-1428.

The Triton team is deprecating `locality` in favor of the affinity rules feature ported from the Docker API into CloudAPI's CreateMachine endpoint. That change is still incoming, but we're prepping for the release by integrating with the provider now (coming to a CloudAPI near you).

We've introduced an `affinity` argument to `triton_machine`, like so...

resource "triton_machine" "db" {
    count = 2

    name = "${format("db-%02d", count.index + 1)}"
    package = "sample-128M"
    image = "${data.triton_image.base.id}"

    affinity = ["role!=database"]

    tags {
        role = "database"
    }
}

The syntax used for rule definitions is publicly documented here.

In order to achieve the desired behavior throughout a DC, we've serialized resource creation when affinity rules are present (ht @sean-). If `affinity` is defined on a set of `triton_machine` resources, then each resource is created one after the other. If a resource has a `count` > 1, each instance in that group is also created serially.
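For the curious, here's a minimal Go sketch of how that serialization can work, assuming a provider-level client struct; the `affinityLock` name mirrors the diff discussed later in this thread, but the surrounding scaffolding is illustrative, not the provider's actual code:

package provider

import "sync"

// client is an illustrative provider client; only affinityLock mirrors
// the actual change.
type client struct {
    affinityLock sync.Mutex
}

// createMachine holds the lock for the entire create call whenever
// affinity rules are present, so each CreateMachine observes the
// placement of every machine created before it.
func (c *client) createMachine(affinity []string, create func() error) error {
    if len(affinity) > 0 {
        c.affinityLock.Lock()
        defer c.affinityLock.Unlock()
    }
    return create()
}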

Here's a configuration that demonstrates some of the interplay between resource definitions and various options for affinity rules.

provider "triton" {}

data "triton_image" "base" {
    name = "base-64-lts"
    most_recent = true
}

resource "triton_machine" "db" {
    count = 2

    name = "${format("db-%02d", count.index + 1)}"
    package = "sample-128M"
    image = "${data.triton_image.base.id}"
    networks = ["${data.triton_network.public.id}"]

    affinity = ["role!=database"]

    tags {
        role = "database"
    }
}

resource "triton_machine" "web" {
    depends_on = ["triton_machine.db"]

    count = 3

    name = "${format("web-%02d", count.index + 1)}"
    package = "sample-128M"
    image = "${data.triton_image.base.id}"
    networks = ["${data.triton_network.public.id}"]

    affinity = ["instance==~db*", "role!=~web"]

    tags {
        role = "web"
    }
}

resource "triton_machine" "sidecar" {
    count = 3

    name = "${format("sidecar-%02d", count.index + 1)}"
    package = "sample-128M"
    image = "${data.triton_image.base.id}"
    networks = ["${data.triton_network.public.id}"]

    tags {
        role = "sidecar"
    }
}

In the example above, we want the following placement...

  • We want db instances to never be placed next to other db ("hard" affinity, !=).
  • If possible, we want web placed next to db ("soft" affinity, ==~) and, where possible, not on a CN alongside another web instance ("soft" anti-affinity, !=~).
  • web has a depends_on to ensure it is explicitly dependent upon db being created first.
  • We have a sidecar process that can provision anywhere (no affinity).

Given this setup and the serial nature of provisioning with affinity, both sidecar and db resources will be created first, followed by web. The following list of compute nodes shows the finalized placement (ordered by creation and CN).

sidecar-03  0fe605b7-f781-48e8-9056-15f037910d2d
sidecar-02  0fe605b7-f781-48e8-9056-15f037910d2d
db-01       0fe605b7-f781-48e8-9056-15f037910d2d
web-02      0fe605b7-f781-48e8-9056-15f037910d2d

db-02       3ffe48b6-ebd7-4c08-b355-ea2afbe85a37
web-03      3ffe48b6-ebd7-4c08-b355-ea2afbe85a37

sidecar-01  b928db39-b60b-44a7-89c6-c30baf3adaac
web-01      b928db39-b60b-44a7-89c6-c30baf3adaac

ATM I'm still fine-tuning documentation, linking to the rule syntax, and acceptance tests, but I wanted to get this out early.

* Enforce serialized resource creation given affinity rules
* Update website documentation
@jwreagor (Contributor Author) commented:
I've updated/verified what documentation and tests I already had. Tests are passing on my test host. Looking for more feedback around documentation.

Also, we'll probably want to wait for this functionality to hit JPC, no?

/cc @sean- @jen20 @stack72

@jwreagor (Contributor Author) commented:
More here... https://apidocs.joyent.com/cloudapi/#830

@sean- (Contributor) commented Sep 28, 2017

I’m going to submit a follow-up PR to move us off of govendor and over to dep(1).

}

if len(affinity) > 0 {
    client.affinityLock.Lock()
Review comment (Contributor) on this hunk:

The following will require some exploration, but it would be good to know when the placement metadata is present in Triton. If we put a sufficiently long stutter in here to pace the rate at which we execute CreateMachine calls, we could actually allow for parallelism while side-stepping the raciness of affinity within Triton.

Being explicit, if we find that the metadata for the evaluation of affinity is present within Triton within 1s, then we could stagger the CreateMachine calls to be paced at a rate of one call per second. This would allow the provider to have concurrent CreateMachine calls in-flight.
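Here's a rough Go sketch of that pacing idea (illustrative only, not the provider's actual code): a shared ticker staggers CreateMachine calls at a fixed interval instead of fully serializing them, so several can be in flight at once.

package provider

import "time"

// createPacer releases one CreateMachine call per interval. If placement
// metadata settles inside Triton within ~1s, a one-second interval would
// let calls overlap without racing affinity evaluation.
type createPacer struct {
    ticker *time.Ticker
}

func newCreatePacer(interval time.Duration) *createPacer {
    return &createPacer{ticker: time.NewTicker(interval)}
}

// wait blocks until the next creation slot opens.
func (p *createPacer) wait() {
    <-p.ticker.C
}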

Review comment (Contributor):

Also: this is a suggestion for an optimization in a follow-up PR.

  * `locality` - (map of Locality hints, Optional)
-   A mapping of [Locality](https://apidocs.joyent.com/cloudapi/#CreateMachine) attributes to apply to the machine that assist in datacenter placement. NOTE: Locality hints are only used at the time of machine creation and not referenced after.
+   A mapping of [Locality](https://apidocs.joyent.com/cloudapi/#CreateMachine) attributes to apply to the machine that assist in data center placement. NOTE: Locality hints are only used at the time of machine creation and not referenced after.
Review comment (Contributor):

Also, are we a `datacenter` company or a `data center` company? Whichever we are, let's be consistent with the rest of our docs.

@jwreagor (Contributor Author) replied:

This was an effort to be more consistent in the doc since we were already using "data center". I chose to run with the space.

@jwreagor (Contributor Author) commented Sep 29, 2017

This can't be merged just yet because CloudAPI is only updated on us-east-3b.

I'm working out how to handle CloudAPI endpoints that don't support affinity, whether we need some sort of version detection in triton-go, and/or whether to point HashiCorp's nightly testing rig at us-east-3b so our tests at least pass.
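As a hypothetical sketch (not triton-go's actual API), one way to detect support would be to read CloudAPI's version from the Server header that the documented /--ping endpoint returns (e.g. "cloudapi/8.3.0", as in the ping output further down) and gate affinity on it:

package provider

import (
    "fmt"
    "net/http"
    "strings"
)

// cloudAPIVersion fetches <endpoint>/--ping and parses the Server header
// (e.g. "cloudapi/8.3.0"). Hypothetical helper, not part of triton-go.
func cloudAPIVersion(endpoint string) (string, error) {
    resp, err := http.Get(strings.TrimRight(endpoint, "/") + "/--ping")
    if err != nil {
        return "", err
    }
    defer resp.Body.Close()

    server := resp.Header.Get("Server")
    if !strings.HasPrefix(server, "cloudapi/") {
        return "", fmt.Errorf("unexpected Server header: %q", server)
    }
    return strings.TrimPrefix(server, "cloudapi/"), nil
}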

@jwreagor (Contributor Author) commented Oct 4, 2017

It sounds like CloudAPI was updated across JPC with the functionality this feature depends on (much faster than I anticipated). I confirmed using the following command, which shows cloudapi/8.3.0 across all regions.

$ triton profile ls -o name | grep 'us-' | xargs -I % triton -p \% cloudapi /--ping -i 2>&1 | grep server | uniq
server: cloudapi/8.3.0

I'll test again tomorrow and get this PR merged.

@jwreagor merged commit b991ded into TritonDataCenter:master on Oct 4, 2017
@jwreagor (Contributor Author) commented Oct 4, 2017

👍
