This repository has been archived by the owner on Jun 29, 2022. It is now read-only.

2 worker pools failing to create cluster in AWS #839

Closed
knrt10 opened this issue Aug 24, 2020 · 23 comments
Labels
bug Something isn't working platform/aws AWS-related priority/P0 Urgent priority

Comments

@knrt10
Member

knrt10 commented Aug 24, 2020

Use the following config for the worker pools:

worker_pool "pool-1" {
    count         = var.workers_count
    instance_type = var.workers_type
    ssh_pubkeys   = var.ssh_public_keys
    labels        = "foo=bar"
    taints	  = "nodeType=storage:NoSchedule"
  }

  worker_pool "pool-2" {
    count         = var.workers_count
    instance_type = var.workers_type
    ssh_pubkeys   = var.ssh_public_keys
    labels        = "lolum=ipsum"
    taints	  = "nodeType=storage:NoSchedule"
  }

Bootkube starts but then it fails with the following error.

[Screenshot: error output from cluster creation, 2020-08-24 7:13 PM]

@knrt10 knrt10 added bug Something isn't working platform/aws AWS-related labels Aug 24, 2020
@invidian
Member

CC @johananl

@knrt10 knrt10 added the priority/P0 Urgent priority label Aug 24, 2020
@knrt10
Member Author

knrt10 commented Aug 24, 2020

For more context, see pr-aws.

@invidian invidian added this to the v0.4.0 milestone Aug 24, 2020
@johananl
Member

johananl commented Aug 24, 2020

If we want to route HTTP traffic to multiple worker pools, I think we have the following options:

  • Use one NLB per pool.
  • Use a single NLB with multiple listeners (which means some pools won't be able to use ports 80 and 443); see the sketch after this list.
  • Use an ALB and do HTTP-based routing.
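For illustration, a minimal Terraform sketch of the second option (a single NLB with multiple listeners). The resource names are hypothetical, and it assumes the NLB and the per-pool target groups already exist elsewhere:

resource "aws_lb_listener" "pool1_http" {
  load_balancer_arn = aws_lb.nlb.arn
  protocol          = "TCP"
  port              = 80

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.pool1_http.arn
  }
}

resource "aws_lb_listener" "pool2_http" {
  load_balancer_arn = aws_lb.nlb.arn
  protocol          = "TCP"
  # Listener ports on a single NLB must be unique, so pool-2 cannot also claim 80.
  port              = 8080

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.pool2_http.arn
  }
}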

I think this is more of a design question than a bug. We should think about the functionality we want to have and then build things accordingly.

@johananl
Member

BTW, the failure isn't related to Bootkube. bootkube-start actually completed successfully in the output above.

@knrt10
Member Author

knrt10 commented Aug 24, 2020

BTW, the failure isn't related to Bootkube. bootkube-start actually completed successfully in the output above

Yes, so if we want users to work with the present config, we should at least document it. But what if a user creates 3 pools? Then they have to set lb_http_port and lb_https_port differently for each pool.
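For illustration, a hedged sketch of such a config, assuming lb_http_port and lb_https_port are the per-pool knobs discussed here and default to 80/443; the chosen ports are illustrative:

worker_pool "pool-1" {
  count         = var.workers_count
  instance_type = var.workers_type
  ssh_pubkeys   = var.ssh_public_keys
  # Uses the default listener ports: lb_http_port = 80, lb_https_port = 443.
}

worker_pool "pool-2" {
  count         = var.workers_count
  instance_type = var.workers_type
  ssh_pubkeys   = var.ssh_public_keys
  # Must not collide with pool-1's listener ports on the shared NLB.
  lb_http_port  = 8080
  lb_https_port = 8443
}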

@invidian
Member

If we want to route HTTP traffic to multiple worker pools, I think we have the following options:

We can also use the CCM to set up the load balancing for us.

I think this is more of a design question than a bug.

The bug is that we don't validate whether the 2 pools have colliding ports, and we don't document this (implied) requirement.

@invidian
Member

From a quick glance at the Terraform code, I have a feeling that, with the current implementation, it would make sense to move the aws_lb_listener resources into the controller module and collect the aws_lb_target_group.workers_http.arn from all worker pools there by default into the forward action block. This way, by default you get an LB on ports 80 and 443 and all worker pools participate in it.

Then, if lb_http_port or lb_https_port is set for a worker pool, create separate listeners for this worker pool and don't include the worker pool in the 80/443 listeners.

This way, we fix the collision between the 2 worker pools and don't modify existing behavior.

Alternatively, we could just reject such a configuration for now and document the limitation. We should also mention that if only one worker pool handles the ingress traffic, a node selector and labels should be used on this pool and on the contour component, so the contour (or envoy?) pods only run on this worker pool.

Finally, I think ideally we should just be using the CCM to set up the Load Balancers for us and remove all this code from Terraform, as IMO it doesn't really "fit" into a Kubernetes setup.

@johananl
Member

From a quick glance at the Terraform code, I have a feeling that, with the current implementation, it would make sense to move the aws_lb_listener resources into the controller module and collect the aws_lb_target_group.workers_http.arn from all worker pools there by default into the forward action block.

I think it makes more sense to expose the load balancer's ID as an output in the controllers module and have each worker pool create its own target group and attach itself to the LB (rough sketch after this list). My rationale:

  • Code that's related to a specific worker pool should reside in the workers module. This way we don't have to re-apply the controllers module when modifying/removing a worker pool.
  • Making the controllers module depend on the workers module can be a source of trouble, given that the workers module depends on the controllers module.
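A rough Terraform sketch of this alternative, with hypothetical resource and variable names; the controllers module only exports the NLB, and each instance of the workers module wires itself up:

# controllers module
output "nlb_arn" {
  value = aws_lb.nlb.arn
}

# workers module (instantiated once per worker pool)
resource "aws_lb_target_group" "pool_http" {
  vpc_id   = var.vpc_id
  protocol = "TCP"
  port     = var.lb_http_port
}

resource "aws_lb_listener" "pool_http" {
  # The NLB ARN is passed into the workers module from the controllers module output.
  load_balancer_arn = var.nlb_arn
  protocol          = "TCP"
  port              = var.lb_http_port

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.pool_http.arn
  }
}

# The pool's instances (or its autoscaling group) would additionally need to be
# registered with aws_lb_target_group.pool_http.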

@johananl
Member

Finally, I think ideally we should just be using the CCM to set up the Load Balancers for us and remove all this code from Terraform, as IMO it doesn't really "fit" into a Kubernetes setup.

One advantage of the current setup is that it allows the user to utilize a single NLB for both the API server and user applications. I like relying on the CCM for LBs in general; however, we would still need an LB for the API server.

@invidian
Member

Thanks for chiming in @johananl. I think what you're saying makes sense. @knrt10, would you like to look into it so maybe we could ship it with v0.4.0?

@knrt10
Member Author

knrt10 commented Aug 31, 2020

Sure, I can take a look into this

@johananl
Member

johananl commented Sep 1, 2020

Following an offline discussion with @knrt10, here is what we're proposing:

We deploy an NLB for the API server (let's call it the "API NLB" here). By default, we don't provision LB resources for worker pools.

If they so choose, the user can enable LB provisioning at the worker pool level:

cluster "foo" {
  ...
  worker_pool "one" {
    # Provision LB resources for this pool.
    load_balancing {
      mapping {
        protocol = "TCP"
        listener_port = 80
        target_port = 30080
      }

      mapping {
        protocol = "TCP"
        listener_port = 443
        target_port = 30443
      }
    }
  }

  worker_pool "two" {
    # Provision LB resources for this pool.
    load_balancing {
      mapping {
        protocol = "TCP"
        listener_port = 7777
        target_port = 8888
      }

      mapping {
        protocol = "UDP"
        listener_port = 12001
        target_port = 12001
      }
    }
  }

  worker_pool "three" {
    # Don't provision LB resources for this pool.
    ...
  }
}

When the load_balancing block is specified, we do the following:

  • Create a listener (of type TCP!) on the API NLB for every mapping block using the specified listener_port.
  • Create a target group for every mapping block using the specified target_port.
  • Create routing rules on the API NLB to "wire" the listeners to the correct target groups.

In the docs we instruct the user to create a NodePort service with the same node ports as specified in the target port knobs.
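As an illustration, a hedged sketch of such a NodePort Service, written with the Terraform kubernetes provider to stay in one language; the name, namespace, selector and container port are made up, and only node_port is tied to the 80/30080 mapping above:

resource "kubernetes_service" "ingress_http" {
  metadata {
    name      = "ingress-http"
    namespace = "projectcontour"
  }

  spec {
    type = "NodePort"

    selector = {
      app = "envoy"
    }

    port {
      name        = "http"
      protocol    = "TCP"
      port        = 80
      target_port = 8080
      # Must equal the target_port of the corresponding mapping block.
      node_port   = 30080
    }
  }
}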

When the load_balancing block isn't specified, we don't create any LB resources on behalf of the user.

In addition to the above, the user is free to deploy services of type LoadBalancer (optionally with annotations to control the type and config of the created LB) and we ensure a CCM is present to provision load balancers on behalf of the user.

Caveats which should be documented:

  • The Lokomotive-provisioned LB only supports TCP and UDP routing. If the user wants to terminate TLS on the AWS LB they have to manage the LB themselves.
  • All listener_port values must be unique.

Any feedback is welcome.

@johananl
Member

johananl commented Sep 1, 2020

@rata @surajssd - I'd love to know what you guys think about the proposal in my previous comment.

@invidian
Member

invidian commented Sep 1, 2020

@johananl thanks for the proposal. I think it makes sense, though it is very similar to what we have right now. But it seems to me that the implementation might be too complex to include in v0.4.0.

Maybe we just implement a validation rule to reject conflicting ports for now (including the defaults) and schedule the rest for later?

@johananl
Member

johananl commented Sep 1, 2020

I have no opinion regarding which release to put this in. I'm fine with either rejecting the "bad" config for now or even just documenting the constraints.

@knrt10
Member Author

knrt10 commented Sep 1, 2020

I think if we want to do this, we should do it properly, and for 0.4.0 we should just document the constraint of using different lb_http_port and lb_https_port values. If everyone agrees, I can create a PR for that.

@invidian
Member

invidian commented Sep 1, 2020

I'd implement a validation rule for it. It's not difficult to do so and will prevent confusing behavior (users tend not to read the documentation 😄)

@surajssd
Member

surajssd commented Sep 2, 2020

Finally, I think ideally we should just be using the CCM to set up the Load Balancers for us and remove all this code from Terraform, as IMO it doesn't really "fit" into a Kubernetes setup.

I like this idea. I don't think there is harm in having two LBs for HA reasons. If we really want to use one LB for both app traffic and control plane traffic, do we have knobs in CCM where it can do the magic for us?

@johananl
Member

johananl commented Sep 2, 2020

Finally, I think ideally we should just be using the CCM to set up the Load Balancers for us and remove all this code from Terraform, as IMO it doesn't really "fit" into a Kubernetes setup.

I like this idea. I don't think there is harm in having two LBs for HA reasons. If we really want to use one LB for both app traffic and control plane traffic, do we have knobs in CCM where it can do the magic for us?

I don't think you can modify an existing LB using the CCM. AFAIK the AWS CCM either owns an LB or doesn't. AFAICT if we want to allow using a single NLB for both the API server and application traffic, the CCM can't help there.

@rata
Member

rata commented Sep 2, 2020

I'm just coming back from holidays, but I think:

  • Creating load balancers for apps is, IMHO, out of scope for Lokomotive. The only valid reason, IMHO, is to create them when configuring the Lokomotive ingress component. For the ingress component, though, it can be done via the CCM too (is that already merged and working?). I think that is a valid solution (not the only one).

However, I might be missing some context; in particular, why options like the following were discarded. I can see some problems and some workarounds with such an approach:

  • Move the aws_lb_listener resources that are here to the controller module.
  • The workers just add/remove target group attachments to the NLB when a worker pool is created/deleted, using an output from the controller module.
  • I guess one complication would be that the listener's default_action is required (https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/lb_listener) in the controller module before any target group is created. We could do something like use the "fixed-response" type and return something like "no target group configured/healthy". IIUC, that path will only be taken if no target groups are found and healthy.

Wouldn't something like this work and be simpler than the other proposals? Is there a downside to this that I'm not aware of?

Assuming such an option was discarded, the next proposal seems to have several downsides IMHO:

  • It creates the mapping block, which is just confusing and probably only relevant for AWS, something we can avoid. Furthermore, the user needs to go too deep into the details when they just want a load balancer that forwards traffic. There is really no need for the user to specify the mappings with "random" ports and keep all the mappings in sync. That is complexity that can be avoided, IIUC.
  • We need an extra step, as the comment says: "In the docs we instruct the user to create a NodePort service with the same node ports as specified in the target port knobs." This seems like a very fragile UI and very bad UX, IMHO, especially if we can avoid it.
  • Is there a reasonable way to achieve what is described in the comments without breaking existing users? It is not crystal clear to me.

But, IIUC, there is a simpler way to do this if the CCM provisions load balancers for us: just use a Service of type LoadBalancer with the annotations documented in the Contour docs. IOW, do what is explained here: https://projectcontour.io/guides/deploy-aws-nlb/ (sketched below).
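For reference, a hedged sketch of what that guide describes, again expressed with the Terraform kubernetes provider; only the aws-load-balancer-type annotation comes from the Contour guide, while the name, namespace, selector and container ports are illustrative:

resource "kubernetes_service" "envoy" {
  metadata {
    name      = "envoy"
    namespace = "projectcontour"

    annotations = {
      # Ask the AWS cloud provider / CCM for an NLB instead of a classic ELB.
      "service.beta.kubernetes.io/aws-load-balancer-type" = "nlb"
    }
  }

  spec {
    type = "LoadBalancer"

    selector = {
      app = "envoy"
    }

    port {
      name        = "http"
      protocol    = "TCP"
      port        = 80
      target_port = 8080
    }

    port {
      name        = "https"
      protocol    = "TCP"
      port        = 443
      target_port = 8443
    }
  }
}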

I don't know why the first option was discarded, but it seems simpler and has better UX, IMHO. The second option (relying on the CCM), if that is working, seems to be the best way (and if the CCM is not yet working for this, it might be the best way once it is).

In any case, if either of the two options I propose was discarded for good reasons (I really lack the information to know :-D) or there is no consensus in the short term, I'm totally fine with having a validation rule for now, so that the config in the original bug report does not fail at the Terraform layer but instead produces a clear error message in lokoctl.

Other questions due to lack of context:

  • The user should be able to access the API servers. I'm not sure why some comments in this issue discuss an NLB for the API server. AFAIK (maybe it changed in the last days I was away? :)) we are not using an NLB, just DNS records (with 3 IPs if there are 3 controllers, etc.). Why do we need/want a load balancer for the API server? I'm lacking some context regarding why this was even mentioned. I think we are fine without an NLB for the API server. Am I missing something?

Thanks!

@knrt10 knrt10 removed this from the v0.4.0 milestone Sep 2, 2020
@johananl
Member

johananl commented Sep 2, 2020

The user should be able to access the API servers. I'm not sure why some comments in this issue discuss an NLB for the API server. AFAIK (maybe it changed in the last days I was away? :)) we are not using an NLB, just DNS records (with 3 IPs if there are 3 controllers, etc.). Why do we need/want a load balancer for the API server? I'm lacking some context regarding why this was even mentioned. I think we are fine without an NLB for the API server. Am I missing something?

@rata we are using an NLB for the API server on AWS. This is Typhoon heritage.

@invidian
Member

Given that #889 is merged and we no longer fail to create 2 worker pools, I would like to close this issue.

However, I think the information about LBs on AWS here is valuable. @johananl @surajssd @knrt10, could one of you capture the essence of the discussion here and create a new enhancement issue to track this?

@invidian
Member

invidian commented Dec 7, 2020

I'll close this then, and we can continue the discussion in #1248.
