This repository has been archived by the owner on Jun 29, 2022. It is now read-only.

2 worker pools failing to create cluster in AWS #839

Closed
knrt10 opened this issue Aug 24, 2020 · 23 comments
Labels
bug Something isn't working platform/aws AWS-related priority/P0 Urgent priority

Comments

@knrt10
Member

knrt10 commented Aug 24, 2020

Use the following config for the worker pools:

worker_pool "pool-1" {
    count         = var.workers_count
    instance_type = var.workers_type
    ssh_pubkeys   = var.ssh_public_keys
    labels        = "foo=bar"
    taints	  = "nodeType=storage:NoSchedule"
  }

  worker_pool "pool-2" {
    count         = var.workers_count
    instance_type = var.workers_type
    ssh_pubkeys   = var.ssh_public_keys
    labels        = "lolum=ipsum"
    taints	  = "nodeType=storage:NoSchedule"
  }

Bootkube starts but then it fails with the following error.

[Screenshot: error output from cluster creation, 2020-08-24 7:13 PM]

@knrt10 knrt10 added bug Something isn't working platform/aws AWS-related labels Aug 24, 2020
@invidian
Member

CC @johananl

@knrt10 knrt10 added the priority/P0 Urgent priority label Aug 24, 2020
@knrt10
Member Author

knrt10 commented Aug 24, 2020

For more context, see pr-aws.

@invidian invidian added this to the v0.4.0 milestone Aug 24, 2020
@johananl
Member

johananl commented Aug 24, 2020

If we want to route HTTP traffic to multiple worker pools, I think we have the following options:

  • Use one NLB per pool.
  • Use a single NLB with multiple listeners (which means some pools won't be able to use ports 80 and 443); see the sketch after this list.
  • Use an ALB and do HTTP-based routing.
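For illustration, a minimal Terraform sketch of the second option (a single NLB with multiple listeners). The resource names are hypothetical, and it assumes the NLB and the per-pool target groups already exist elsewhere:

resource "aws_lb_listener" "pool1_http" {
  load_balancer_arn = aws_lb.nlb.arn
  protocol          = "TCP"
  port              = 80

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.pool1_http.arn
  }
}

resource "aws_lb_listener" "pool2_http" {
  load_balancer_arn = aws_lb.nlb.arn
  protocol          = "TCP"
  # Listener ports on a single NLB must be unique, so pool-2 cannot also claim 80.
  port              = 8080

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.pool2_http.arn
  }
}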

I think this is more of a design question than a bug. We should think about the functionality we want to have and then build things accordingly.

@johananl
Member

BTW, the failure isn't related to Bootkube. bootkube-start actually completed successfully in the output above.

@knrt10
Member Author

knrt10 commented Aug 24, 2020

BTW, the failure isn't related to Bootkube. bootkube-start actually completed successfully in the output above

Yes, so if we want users to work with the present config, we should at least document it. But what if a user creates 3 pools? Then they have to set lb_http_port and lb_https_port differently for each pool.
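For illustration, a hedged sketch of such a config, assuming lb_http_port and lb_https_port are the per-pool knobs discussed here and default to 80/443; the chosen ports are illustrative:

worker_pool "pool-1" {
  count         = var.workers_count
  instance_type = var.workers_type
  ssh_pubkeys   = var.ssh_public_keys
  # Uses the default listener ports: lb_http_port = 80, lb_https_port = 443.
}

worker_pool "pool-2" {
  count         = var.workers_count
  instance_type = var.workers_type
  ssh_pubkeys   = var.ssh_public_keys
  # Must not collide with pool-1's listener ports on the shared NLB.
  lb_http_port  = 8080
  lb_https_port = 8443
}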

@invidian
Member

If we want to route HTTP traffic to multiple worker pools, I think we have the following options:

We can also use the CCM to set up the load balancing for us.

I think this is more of a design question than a bug.

The bug is that we don't validate whether the 2 pools have colliding ports, and we don't document this (implied) requirement.

@invidian
Member

From a quick glance at the Terraform code, I have a feeling that, with the current implementation, it would make sense to move the aws_lb_listener resources into the controller module and collect the aws_lb_target_group.workers_http.arn from all worker pools there by default into the forward action block. This way, by default you get an LB on ports 80 and 443 and all worker pools participate in it.

Then, if lb_http_port or lb_https_port is set for a worker pool, create separate listeners for this worker pool and don't include the worker pool in the 80/443 listeners.

This way, we fix the collision between the 2 worker pools and don't modify existing behavior.

Alternatively, we could just reject such a configuration for now and document the limitation. We should also mention that if only one worker pool handles the ingress traffic, a node selector and labels should be used on this pool and on the contour component, so the contour (or envoy?) pods only run on this worker pool.

Finally, I think ideally we should just be using the CCM to set up the Load Balancers for us and remove all this code from Terraform, as IMO it doesn't really "fit" into a Kubernetes setup.

@johananl
Member

From a quick glance at the Terraform code, I have a feeling that, with the current implementation, it would make sense to move the aws_lb_listener resources into the controller module and collect the aws_lb_target_group.workers_http.arn from all worker pools there by default into the forward action block.

I think it makes more sense to expose the load balancer's ID as an output in the controllers module and have each worker pool create its own target group and attach itself to the LB (rough sketch after this list). My rationale:

  • Code that's related to a specific worker pool should reside in the workers module. This way we don't have to re-apply the controllers module when modifying/removing a worker pool.
  • Making the controllers module depend on the workers module can be a source of trouble, given that the workers module depends on the controllers module.
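A rough Terraform sketch of this alternative, with hypothetical resource and variable names; the controllers module only exports the NLB, and each instance of the workers module wires itself up:

# controllers module
output "nlb_arn" {
  value = aws_lb.nlb.arn
}

# workers module (instantiated once per worker pool)
resource "aws_lb_target_group" "pool_http" {
  vpc_id   = var.vpc_id
  protocol = "TCP"
  port     = var.lb_http_port
}

resource "aws_lb_listener" "pool_http" {
  # The NLB ARN is passed into the workers module from the controllers module output.
  load_balancer_arn = var.nlb_arn
  protocol          = "TCP"
  port              = var.lb_http_port

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.pool_http.arn
  }
}

# The pool's instances (or its autoscaling group) would additionally need to be
# registered with aws_lb_target_group.pool_http.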

@johananl
Member

Finally, I think ideally we should just be using the CCM to set up the Load Balancers for us and remove all this code from Terraform, as IMO it doesn't really "fit" into a Kubernetes setup.

One advantage of the current setup is that it allows the user to utilize a single NLB for both the API server and user applications. I like relying on the CCM for LBs in general; however, we would still need an LB for the API server.

@invidian
Member

Thanks for chiming in @johananl. I think what you're saying makes sense. @knrt10, would you like to look into it so maybe we could ship it with v0.4.0?

@knrt10
Member Author

knrt10 commented Aug 31, 2020

Sure, I can take a look into this

@johananl
Member

johananl commented Sep 1, 2020

Following an offline discussion with @knrt10, here is what we're proposing:

We deploy an NLB for the API server (let's call it the "API NLB" here). By default, we don't provision LB resources for worker pools.

If they so choose, the user can enable LB provisioning at the worker pool level:

cluster "foo" {
  ...
  worker_pool "one" {
    # Provision LB resources for this pool.
    load_balancing {
      mapping {
        protocol = "TCP"
        listener_port = 80
        target_port = 30080
      }

      mapping {
        protocol = "TCP"
        listener_port = 443
        target_port = 30443
      }
    }
  }

  worker_pool "two" {
    # Provision LB resources for this pool.
    load_balancing {
      mapping {
        protocol = "TCP"
        listener_port = 7777
        target_port = 8888
      }

      mapping {
        protocol = "UDP"
        listener_port = 12001
        target_port = 12001
      }
    }
  }

  worker_pool "three" {
    # Don't provision LB resources for this pool.
    ...
  }
}

When the load_balancing block is specified, we do the following:

  • Create a listener (of type TCP!) on the API NLB for every mapping block using the specified listener_port.
  • Create a target group for every mapping block using the specified target_port.
  • Create routing rules on the API NLB to "wire" the listeners to the correct target groups.

In the docs we instruct the user to create a NodePort service with the same node ports as specified in the target port knobs.
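As an illustration, a hedged sketch of such a NodePort Service, written with the Terraform kubernetes provider to stay in one language; the name, namespace, selector and container port are made up, and only node_port is tied to the 80/30080 mapping above:

resource "kubernetes_service" "ingress_http" {
  metadata {
    name      = "ingress-http"
    namespace = "projectcontour"
  }

  spec {
    type = "NodePort"

    selector = {
      app = "envoy"
    }

    port {
      name        = "http"
      protocol    = "TCP"
      port        = 80
      target_port = 8080
      # Must equal the target_port of the corresponding mapping block.
      node_port   = 30080
    }
  }
}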

When the load_balancing block isn't specified, we don't create any LB resources on behalf of the user.

In addition to the above, the user is free to deploy services of type LoadBalancer (optionally with annotations to control the type and config of the created LB) and we ensure a CCM is present to provision load balancers on behalf of the user.

Caveats which should be documented:

  • The Lokomotive-provisioned LB only supports TCP and UDP routing. If the user wants to terminate TLS on the AWS LB they have to manage the LB themselves.
  • All listener_port values must be unique.

Any feedback is welcome.

@johananl
Member

johananl commented Sep 1, 2020

@rata @surajssd - I'd love to know what you guys think about the proposal in my previous comment.

@invidian
Member

invidian commented Sep 1, 2020

@johananl thanks for the proposal. I think it makes sense, though it is very similar to what we have right now. But it seems to me that the implementation might be too complex to include in v0.4.0.

Maybe we just implement a validation rule to reject conflicting ports for now (including the defaults) and schedule the rest for later?

@johananl
Member

johananl commented Sep 1, 2020

I have no opinion regarding which release to put this in. I'm fine with either rejecting the "bad" config for now or even just documenting the constraints.

@knrt10
Member Author

knrt10 commented Sep 1, 2020

I think if we want to do this, we should do it properly, and for 0.4.0 we should just document the constraint of using different lb_http_port and lb_https_port values. If everyone agrees, I can create a PR for that.

@invidian
Member

invidian commented Sep 1, 2020

I'd implement a validation rule for it. It's not difficult to do so and will prevent confusing behavior (users tend not to read the documentation 😄)

@surajssd
Member

surajssd commented Sep 2, 2020

Finally, I think ideally we should just be using the CCM to set up the Load Balancers for us and remove all this code from Terraform, as IMO it doesn't really "fit" into a Kubernetes setup.

I like this idea. I don't think there is harm in having two LBs for HA reasons. If we really want to use one LB for both app traffic and control plane traffic, do we have knobs in CCM where it can do the magic for us?

@johananl
Member

johananl commented Sep 2, 2020

Finally, I think ideally we should just be using the CCM to set up the Load Balancers for us and remove all this code from Terraform, as IMO it doesn't really "fit" into a Kubernetes setup.

I like this idea. I don't think there is harm in having two LBs for HA reasons. If we really want to use one LB for both app traffic and control plane traffic, do we have knobs in CCM where it can do the magic for us?

I don't think you can modify an existing LB using the CCM. AFAIK the AWS CCM either owns an LB or doesn't. AFAICT if we want to allow using a single NLB for both the API server and application traffic, the CCM can't help there.

@rata
Member

rata commented Sep 2, 2020

I'm just coming back from holidays, but I think:

  • Creating load balancers for apps is, IMHO, out of scope for Lokomotive. The only valid reason, IMHO, is to create them when configuring the Lokomotive ingress component. For the ingress component, though, it can be done via the CCM too (is that already merged and working?). I think that is a valid solution (not the only one).

However, I might be missing some context; in particular, why options like the following were discarded. I can see some problems and some workarounds with such an approach:

  • Move the aws_lb_listener resources that are here to the controller module.
  • The workers just add/remove target group attachments to the NLB when a worker pool is created/deleted, using an output from the controller module.
  • I guess one complication would be that the listener's default_action is required (https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/lb_listener) in the controller module before any target group is created. We could do something like use the "fixed-response" type and return something like "no target group configured/healthy". IIUC, that path will only be taken if no target groups are found and healthy.

Wouldn't something like this work and be simpler than the other proposals? Is there a downside to this that I'm not aware of?

Assuming such an option was discarded, the next proposal seems to have several downsides IMHO:

  • It creates the mapping block, which is just confusing and probably only relevant for AWS, something we can avoid. Furthermore, the user needs to go too deep into the details when they just want a load balancer that forwards traffic. There is really no need for the user to specify the mappings with "random" ports and keep all the mappings in sync. That is complexity that can be avoided, IIUC.
  • We need an extra step, as the comment says: "In the docs we instruct the user to create a NodePort service with the same node ports as specified in the target port knobs." This seems like a very fragile UI and very bad UX, IMHO, especially if we can avoid it.
  • Is there a reasonable way to achieve what is described in the comments without breaking existing users? It is not crystal clear to me.

But, IIUC, there is a simpler way to do this if the CCM provisions load balancers for us: just use a Service of type LoadBalancer with the annotations documented in the Contour docs. IOW, do what is explained here: https://projectcontour.io/guides/deploy-aws-nlb/ (sketched below).
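For reference, a hedged sketch of what that guide describes, again expressed with the Terraform kubernetes provider; only the aws-load-balancer-type annotation comes from the Contour guide, while the name, namespace, selector and container ports are illustrative:

resource "kubernetes_service" "envoy" {
  metadata {
    name      = "envoy"
    namespace = "projectcontour"

    annotations = {
      # Ask the AWS cloud provider / CCM for an NLB instead of a classic ELB.
      "service.beta.kubernetes.io/aws-load-balancer-type" = "nlb"
    }
  }

  spec {
    type = "LoadBalancer"

    selector = {
      app = "envoy"
    }

    port {
      name        = "http"
      protocol    = "TCP"
      port        = 80
      target_port = 8080
    }

    port {
      name        = "https"
      protocol    = "TCP"
      port        = 443
      target_port = 8443
    }
  }
}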

I don't know why the first option was discarded, but it seems simpler and has better UX, IMHO. The second option (relying on the CCM), if that is working, seems to be the best way (and if the CCM is not yet working for this, it might be the best way once it is).

In any case, if either of the two options I propose was discarded for good reasons (I really lack the information to know :-D) or there is no consensus in the short term, I'm totally fine with having a validation rule for now, so that the config in the original bug report does not fail at the Terraform layer but instead produces a clear error message in lokoctl.

Other questions due to lack of context:

  • The user should be able to access the API servers. I'm not sure why some comments in this issue discuss an NLB for the API server. AFAIK (maybe it changed in the last days I was away? :)) we are not using an NLB, just DNS records (with 3 IPs if there are 3 controllers, etc.). Why do we need/want a load balancer for the API server? I'm lacking some context regarding why this was even mentioned. I think we are fine without an NLB for the API server. Am I missing something?

Thanks!

@knrt10 knrt10 removed this from the v0.4.0 milestone Sep 2, 2020
@johananl
Member

johananl commented Sep 2, 2020

The user should be able to access the API servers. I'm not sure why some comments in this issue discuss an NLB for the API server. AFAIK (maybe it changed in the last days I was away? :)) we are not using an NLB, just DNS records (with 3 IPs if there are 3 controllers, etc.). Why do we need/want a load balancer for the API server? I'm lacking some context regarding why this was even mentioned. I think we are fine without an NLB for the API server. Am I missing something?

@rata we are using an NLB for the API server on AWS. This is Typhoon heritage.

@invidian
Member

Given that #889 is merged and we no longer fail to create 2 worker pools, I would like to close this issue.

However, I think the information about LBs on AWS here is valuable. @johananl @surajssd @knrt10, could one of you capture the essence of the discussion here and create a new enhancement issue to track this?

@invidian
Member

invidian commented Dec 7, 2020

I'll close this then, and we can continue the discussion in #1248.
