Document Self Service TLS Certificate Approaches #763

Closed
robscott opened this issue Aug 11, 2021 · 10 comments · Fixed by #2721
Labels
kind/documentation Categorizes issue or PR as related to documentation. kind/feature Categorizes issue or PR as related to a new feature. priority/backlog Higher priority than priority/awaiting-more-evidence.

Comments

@robscott
Member

What would you like to be added:
As a follow-up to the discussion in #749 (comment), it would be good to clarify exactly how the API can be used to allow app owners to provide their own TLS certs.

Why this is needed:
This is a common request, and the existing guidance in GEP #746 is pretty broad and may not scale as much as needed.

@robscott
Member Author

Following up from a thread with @tsaarni and @youngnick here.

This is something that we discussed in more detail in community meetings as well as #749 (comment). Although we are interested in providing a self service approach here, we believed that the method provided in v1alpha1 was not the right one, and introduced far too many potential conflicts and points of confusion.

What we have added in v1alpha2 is the ability for Gateways to reference TLS certs in different namespaces. This is important as it does allow for an app developer to provide a cert, just with some level of coordination with the Gateway admin. I think the best self-service story involves taking this one step further and building some kind of controller/automation around attaching certs from different namespaces to a Gateway. After all, that's all that the cert ref on HTTPRoute did - it was a shortcut for attaching a cert ref to a listener. That shortcut was likely not the best one, but a controller could likely provide a better experience here as far as automating this attachment.
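To make the "some level of coordination" concrete, here is a minimal sketch of the check a controller or reviewer might apply before attaching a cert from another namespace. It assumes the cross-namespace reference is authorized by a grant object living in the Secret's namespace (ReferenceGrant in current releases) with `from` and `to` lists; the namespaces, object names, and the `ref_is_permitted` helper are hypothetical and not part of the API.

```python
GATEWAY_GROUP = "gateway.networking.k8s.io"


def ref_is_permitted(gateway_namespace, cert_ref, grants):
    """Return True if a Gateway in gateway_namespace may use cert_ref."""
    target_ns = cert_ref.get("namespace") or gateway_namespace
    if target_ns == gateway_namespace:
        return True  # same-namespace references need no grant
    for grant in grants:
        if grant["metadata"]["namespace"] != target_ns:
            continue  # a grant only covers the namespace it lives in
        spec = grant["spec"]
        from_ok = any(
            f.get("group") == GATEWAY_GROUP
            and f.get("kind") == "Gateway"
            and f.get("namespace") == gateway_namespace
            for f in spec.get("from", [])
        )
        to_ok = any(
            t.get("group", "") == ""  # core API group, where Secret lives
            and t.get("kind") == "Secret"
            and t.get("name") in (None, cert_ref["name"])  # no name = any Secret
            for t in spec.get("to", [])
        )
        if from_ok and to_ok:
            return True
    return False


# Hypothetical objects: the app team in "store" allows Gateways in "infra"
# to reference its Secret "store-cert".
grant = {
    "metadata": {"name": "allow-infra-gateways", "namespace": "store"},
    "spec": {
        "from": [{"group": GATEWAY_GROUP, "kind": "Gateway", "namespace": "infra"}],
        "to": [{"group": "", "kind": "Secret", "name": "store-cert"}],
    },
}
cert_ref = {"kind": "Secret", "name": "store-cert", "namespace": "store"}

print(ref_is_permitted("infra", cert_ref, [grant]))
```

The app owner controls the grant and the Secret, while the Gateway admin (or an automation acting for them) controls the listener's cert ref, so both sides still have to agree before a cert is served.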

@tsaarni

tsaarni commented Sep 8, 2021

For ease of access, I'm copying the above-mentioned thread between myself and @youngnick from the linked slow-loading PR to here as well.

tsaarni wrote:
I asked the following on Slack but wanted to add it here too: Moving TLS certificate configuration from HTTPRoute to Gateway seems to disallow the self-service use case, where the application developer manages the TLS certificates. This use case existed in v1alpha1 and also exists in Ingress. It would imply that defining a secured virtualhost becomes a split responsibility between the application developer and the cluster operator.

Over on Slack, we discussed that there could be a controller that automatically constructs listeners with TLS secret references from the application developer, but I guess the self-service use case would then no longer be part of the Gateway API?

youngnick wrote:
At the end of the day, including TLS details in HTTPRoute was always just a shortcut way of bringing the TLS config to the Listener anyway, and it introduced a lot of complexity into every implementation of the API. We discussed this on the community meeting and agreed that it's better to separate the complexity of ensuring that the right secrets are bound to the right hostnames on the right ports into a separate bundle of code (the controller you mentioned).

This means that yes, there will not be full support for the self-service use case in the base API. But we really think that doing it with a separate controller allows for:

  • keeping complexity separate and only for people who need it
  • the security concerns about dynamically mapping certificates to a host can be managed in that controller as well (and there are quite a few).

Does this mean that it is not and will never be part of the API? No. If there's enough demand for this, could the dynamic controller be developed here? Yes. Do I think that's likely? No.

My personal opinion (not that of the project) is that the self-service use case is too magic, and a problem waiting to happen, just like it is in Ingress.

tsaarni wrote:
Thanks youngnick, I can understand it is cleaner like this and it matches the reality of implementations. But operationally, I would have expected the self-service use case to be a common one for multi-tenant clusters and imagined that involving a cluster-admin (to secure virtualhost provisioning) could be a problem. I'd be curious to know how certificate provisioning would be handled in e.g. OpenShift if it were to adopt Gateway API for a multi-tenant scenario; maybe danehans has insights?

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 7, 2021
@hbagdi
Contributor

hbagdi commented Dec 8, 2021

/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Dec 8, 2021
@shaneutt
Member

shaneutt commented Aug 5, 2022

Despite this issue being quite old, we the maintainers are still pretty convinced that we want to have this functionality in a future release. We are marking this help wanted as we're looking for contributors with strong use cases to help champion and drive this forward.

@shaneutt shaneutt added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Aug 5, 2022
@mikemorris
Contributor

#1430 may provide a path forward for implementing this functionality.

@shaneutt shaneutt added the priority/backlog Higher priority than priority/awaiting-more-evidence. label Mar 8, 2023
@shaneutt shaneutt modified the milestone: v1.0.0 Mar 8, 2023
@shaneutt shaneutt added triage/needs-information Indicates an issue needs more information in order to work on it. and removed help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. priority/backlog Higher priority than priority/awaiting-more-evidence. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. labels Mar 16, 2023
@shaneutt shaneutt added priority/backlog Higher priority than priority/awaiting-more-evidence. and removed triage/needs-information Indicates an issue needs more information in order to work on it. labels Apr 5, 2023
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 4, 2023
@robscott
Member Author

This topic has come up again more recently, and I think we need to revisit it. As we've shown in #2668, the API validation actually accidentally made it impossible to achieve this goal. We're working to revert that strict validation, but without clear examples of how this could work, we risk doing the same thing again.

I think this issue should really be expanded to cover "TLS Certs that are not configured directly by the Gateway Owner." That would include both of the following:

1. Automatically generated TLS certs that are added after the fact
In this example, a controller would watch Gateways and HTTPRoutes, generate TLS certs, and attach them to the Gateway. Depending on the implementation details, Gateway owners may need to configure something at the Gateway or Listener level to explicitly opt-in to this feature. For example, let's say someone created acme-cert-generator to generate certs following this pattern. That generator may choose to only generate and populate certs on Gateway Listeners with acme.io/cert-generator set in tls.options or a similar annotation set for the entire Gateway.

Note that this is actually fairly similar to how Cert Manager works today, but that requires Gateway owners to reference a Kubernetes Secret that it will then populate. If we didn't require CertRefs or TLS to be populated, this would not be necessary.
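As a rough illustration of the opt-in described for this first pattern, a generator could select only the listeners that carry the option and still lack certs. This is a sketch under assumptions: the `acme.io/cert-generator` key comes from the example above, but treating its mere presence as the opt-in, the manifest contents, and the `listeners_needing_certs` helper are all hypothetical.

```python
OPT_IN_KEY = "acme.io/cert-generator"  # key from the example; presence = opt-in (assumed)


def listeners_needing_certs(gateway):
    """Return names of opted-in HTTPS listeners that have no certificateRefs yet."""
    names = []
    for listener in gateway["spec"].get("listeners", []):
        tls = listener.get("tls") or {}
        opted_in = OPT_IN_KEY in (tls.get("options") or {})
        has_certs = bool(tls.get("certificateRefs"))
        if listener.get("protocol") == "HTTPS" and opted_in and not has_certs:
            names.append(listener["name"])
    return names


# Hypothetical Gateway, reduced to the fields the sketch reads.
gateway = {
    "spec": {
        "listeners": [
            {"name": "http", "protocol": "HTTP"},
            {"name": "https-auto", "protocol": "HTTPS",
             "tls": {"options": {OPT_IN_KEY: "true"}}},
            {"name": "https-manual", "protocol": "HTTPS",
             "tls": {"certificateRefs": [{"name": "manual-cert"}]}},
            {"name": "https-done", "protocol": "HTTPS",
             "tls": {"options": {OPT_IN_KEY: "true"},
                     "certificateRefs": [{"name": "generated-cert"}]}},
        ]
    }
}

print(listeners_needing_certs(gateway))
```

Note that the "https-auto" listener only works as input if the API allows a TLS listener without certificateRefs, which is exactly the validation question raised above.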

2. Certs that are specified by some other persona, likely an app owner
In this hypothetical example, a new controller and CRD would be created. This CRD would link hostnames to user-provided certs, and then the controller would populate the certs specified by that CRD on Gateway listeners that matched those hostnames. This would also likely benefit from a Listener or Gateway-level opt-in for the behavior.
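A minimal sketch of the matching step for this second pattern follows. It assumes a hypothetical binding object with `hostname` and `certificateRef` fields (no such CRD exists in the API), matches listener hostnames exactly and case-insensitively, and leaves out wildcard hostnames and the opt-in check for brevity.

```python
def plan_cert_attachments(gateway, bindings):
    """Map listener name -> certificateRefs for listeners whose hostname has a binding.

    Listeners that already have certificateRefs are skipped, so certs set
    directly by the Gateway owner are never overridden.
    """
    by_host = {b["hostname"].lower(): b["certificateRef"] for b in bindings}
    plan = {}
    for listener in gateway["spec"].get("listeners", []):
        tls = listener.get("tls") or {}
        if tls.get("certificateRefs"):
            continue
        host = (listener.get("hostname") or "").lower()
        ref = by_host.get(host)
        if ref is not None:
            plan[listener["name"]] = [ref]
    return plan


# Hypothetical inputs: one binding created by an app owner in "store".
bindings = [
    {"hostname": "store.example.com",
     "certificateRef": {"kind": "Secret", "name": "store-cert", "namespace": "store"}},
]
gateway = {
    "spec": {
        "listeners": [
            {"name": "store", "protocol": "HTTPS", "hostname": "Store.Example.com"},
            {"name": "blog", "protocol": "HTTPS", "hostname": "blog.example.com"},
            {"name": "pinned", "protocol": "HTTPS", "hostname": "store.example.com",
             "tls": {"certificateRefs": [{"name": "owner-cert"}]}},
        ]
    }
}

print(plan_cert_attachments(gateway, bindings))
```

Returning a plan instead of mutating the Gateway keeps the decision separate from the write, which makes it easier to add the opt-in and conflict checks discussed in this thread before anything is applied.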

Both of these share the same characteristics:

  • Gateway owners do not directly configure TLS certs for some or all Listeners
  • Depending on the implementation, Gateway owners may need to opt-in to this functionality at a Gateway or Listener level
  • We want to enable Gateway owners to leave TLS config unspecified until some other source can populate it

If these general patterns make sense, it would be worth documenting them somewhere so we don't lose track of them.

/kind documentation
/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot added kind/documentation Categorizes issue or PR as related to documentation. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 11, 2024
@EItanya
Contributor

EItanya commented Jan 11, 2024

I think both of these are very interesting use-cases. I personally have more experience with the latter. I do think there are some important questions to be asked in the case of both of these approaches though, specifically when it comes to statuses and how the user will understand the state of the system with the addition of new intermediate states.

Both of these can take advantage of the tls.options field on GatewayTLSConfig. However that comes along with its own issues, namely no uniformity and most likely lack of officially supported statuses.

Automatically generated TLS certs that are added after the fact

This one seems a bit easier because missing the secret would render the listener invalid without invalidating the whole Gateway, so I think the existing statuses would cover this use case.

Certs that are specified by some other persona, likely an app owner

This one is a bit more interesting, and is an option that is available today in gloo-edge. However it's very important to note that it is much easier in that API because hostnames MUST be unique and therefore certs can be specified per host without fear of overlap.

This has been a popular feature for gloo-edge users and we would love to be able to offer it with our new Gateway API option.

I completely agree with @robscott that "This would also likely benefit from a Listener or Gateway-level opt-in for the behavior." I think it will be very important to get that part right or it may stray too far from other aspects of the Gateway resource/semantics.

In the short-term I think this can be accomplished with an option on the Listener: k8s.gateway.io/self-service-cert: <namespace>. Or something functionally like that. (To be clear that is just an example). Then the App Team could create a new resource which could be implementation specific to specify the TLS config. It could even mirror the GatewayTLSConfig for the purpose of simplicity.
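To illustrate how such a Listener option could gate who may supply the cert, here is a small sketch. The key is the example name from this comment (not a real, agreed-upon option), and the `may_provide_cert` helper and sample listener are hypothetical.

```python
SELF_SERVICE_KEY = "k8s.gateway.io/self-service-cert"  # example key only, not a real option


def may_provide_cert(listener, app_namespace):
    """Return True if the listener names app_namespace as its self-service cert source."""
    options = (listener.get("tls") or {}).get("options") or {}
    allowed = options.get(SELF_SERVICE_KEY)
    return allowed is not None and allowed == app_namespace


# Hypothetical listener that delegates cert configuration to the "store" namespace.
listener = {
    "name": "store",
    "protocol": "HTTPS",
    "hostname": "store.example.com",
    "tls": {"options": {SELF_SERVICE_KEY: "store"}},
}

print(may_provide_cert(listener, "store"))
print(may_provide_cert(listener, "blog"))
```

Because the value names a single namespace, the Gateway owner stays in control of which team can attach TLS config to which listener.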

@youngnick
Contributor

I'm really worried about using tls.options so heavily, because it's a map[string]string, and annotations have shown us that bare string key-value pairs are sticky (that is, hard to change or remove), easy to make incompatible, and often end up requiring specific parsing.

If we're going to do this, I think we need to mandate some things like (not sure if this can be done in CEL or not, but it should be in the spec regardless):

  • a maximum length for both key and value, which should be restrictive. No base64 certs encoded inline in options!
  • the value must not contain newlines (which should help to stop use cases that involve including encoded certs, embedded JSON, or embedded YAML).
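The two constraints above could be checked along these lines. This is only a sketch: the numeric limits are placeholders chosen for illustration (the comment does not propose specific values), and `validate_tls_options` is a hypothetical helper, not an existing validation in the API.

```python
MAX_KEY_LEN = 253   # assumed limit, for illustration only
MAX_VALUE_LEN = 64  # assumed limit, for illustration only


def validate_tls_options(options):
    """Return a list of human-readable problems found in a tls.options map."""
    errors = []
    for key, value in options.items():
        label = key[:40]  # truncate so an oversized key does not flood the message
        if len(key) > MAX_KEY_LEN:
            errors.append(f"{label}: key exceeds {MAX_KEY_LEN} characters")
        if len(value) > MAX_VALUE_LEN:
            errors.append(f"{label}: value exceeds {MAX_VALUE_LEN} characters")
        if "\n" in value or "\r" in value:
            errors.append(f"{label}: value must not contain newlines")
    return errors


# Hypothetical inputs: a short flag passes, an inline PEM blob does not.
good = {"acme.io/cert-generator": "true"}
bad = {"example.com/inline-cert": "-----BEGIN CERTIFICATE-----\n" + "A" * 200}

print(validate_tls_options(good))
print(validate_tls_options(bad))
```

A restrictive value length together with the newline rule is what rules out inline certs, embedded JSON, and embedded YAML while still allowing short flags and names.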


9 participants