Document Self Service TLS Certificate Approaches #763

robscott · 2021-08-11T02:39:13Z

What would you like to be added:
As a follow up to the discussion in #749 (comment), it would be good to clarify exactly how the API can be used to allow app owners to provide their own TLS certs.

Why this is needed:
This is a common request, and the existing guidance in GEP #746 is pretty broad and may not scale as much as needed.

robscott · 2021-08-26T16:19:20Z

Following up from a thread with @tsaarni and @youngnick here.

This is something that we discussed in more detail in community meetings as well as #749 (comment). Although we are interested in providing a self service approach here, we believed that the method provided in v1alpha1 was not the right one, and introduced far too many potential conflicts and points of confusion.

What we have added in v1alpha2 is the ability for Gateways to reference TLS certs in different namespaces. This is important as it does allow for an app developer to provide a cert, just with some level of coordination with the Gateway admin. I think the best self-service story involves taking this one step further and building some kind of controller/automation around attaching certs from different namespaces to a Gateway. After all, that's all that the cert ref on HTTPRoute did - it was a shortcut for attaching a cert ref to a listener. That shortcut was likely not the best one, but a controller could likely provide a better experience here as far as automating this attachment.
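To make the cross-namespace flow concrete, here is a minimal sketch of the check a Gateway implementation performs before honoring such a reference. It assumes the ReferenceGrant resource (named ReferencePolicy in early v1alpha2) as the coordination mechanism. The manifests are plain dicts, and every name in them (`shared-gateway`, `infra`, `team-a`, `app-cert`) and the helper `ref_is_permitted` are hypothetical; real implementations differ in detail.

```python
# Hypothetical manifests written as dicts so the sketch needs no cluster.
gateway = {
    "kind": "Gateway",
    "metadata": {"name": "shared-gateway", "namespace": "infra"},
    "spec": {"listeners": [{
        "name": "https",
        "hostname": "app.example.com",
        "port": 443,
        "protocol": "HTTPS",
        "tls": {
            "mode": "Terminate",
            # The cert lives in the app team's namespace, not the Gateway's.
            "certificateRefs": [
                {"group": "", "kind": "Secret", "name": "app-cert", "namespace": "team-a"}
            ],
        },
    }]},
}

# Created by the app team in *their* namespace to permit the reference.
reference_grant = {
    "kind": "ReferenceGrant",
    "metadata": {"name": "allow-infra-gateways", "namespace": "team-a"},
    "spec": {
        "from": [{"group": "gateway.networking.k8s.io", "kind": "Gateway", "namespace": "infra"}],
        "to": [{"group": "", "kind": "Secret", "name": "app-cert"}],
    },
}


def ref_is_permitted(gateway, cert_ref, grants):
    """Return True if the Gateway may use cert_ref (simplified illustration)."""
    gw_ns = gateway["metadata"]["namespace"]
    target_ns = cert_ref.get("namespace", gw_ns)
    if target_ns == gw_ns:
        return True  # same-namespace references need no grant
    for grant in grants:
        if grant["metadata"]["namespace"] != target_ns:
            continue  # a grant only applies to its own namespace
        from_ok = any(
            f["group"] == "gateway.networking.k8s.io"
            and f["kind"] == "Gateway"
            and f["namespace"] == gw_ns
            for f in grant["spec"]["from"]
        )
        to_ok = any(
            t["group"] == cert_ref.get("group", "")
            and t["kind"] == cert_ref.get("kind", "Secret")
            # An omitted name in "to" matches every resource of that kind.
            and t.get("name", cert_ref["name"]) == cert_ref["name"]
            for t in grant["spec"]["to"]
        )
        if from_ok and to_ok:
            return True
    return False
```

The app owner controls the grant and the Secret, while the Gateway owner still has to add the reference to the listener. That remaining step is what a separate controller could automate.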

tsaarni · 2021-09-08T05:45:06Z

For ease of access to the above-mentioned thread between myself and @youngnick, I'm copying it here from that linked, slow-loading PR as well.

tsaarni wrote:
I asked the following on Slack but wanted to add it here too: Moving TLS certificate configuration from HTTPRoute to Gateway seems to disallow the self-service use case, where the application developer manages the TLS certificates. This use case existed in v1alpha1 and also exists in Ingress. The change would imply that defining a secured virtualhost becomes a split responsibility between the application developer and the cluster operator.

Over on Slack, we discussed that there could be a controller that automatically constructs listeners with TLS secret references from the application developer, but I guess the self-service use case would then no longer be part of the Gateway API?

youngnick wrote:
At the end of the day, including TLS details in HTTPRoute was always just a shortcut way of bringing the TLS config to the Listener anyway, and it introduced a lot of complexity into every implementation of the API. We discussed this in the community meeting and agreed that it's better to separate the complexity of ensuring that the right secrets are bound to the right hostnames on the right ports into a separate bundle of code (the controller you mentioned).

This means that yes, there will not be full support for the self-service use case in the base API. But we really think that doing it with a separate controller allows for:

keeping complexity separate and only for people who need it
managing the security concerns about dynamically mapping certificates to a host in that controller as well (and there are quite a few)

Does this mean that it is not and will never be part of the API? No. If there's enough demand for this, could the dynamic controller be developed here? Yes. Do I think that's likely? No.

My personal opinion (not that of the project) is that the self-service use case is too magic, and a problem waiting to happen, just like it is in Ingress.

tsaarni wrote:
Thanks youngnick, I can understand it is cleaner like this and it matches the reality of implementations. But operationally, I would have expected the self-service use case to be a common one for multi-tenant clusters, and imagined that involving the cluster admin (to secure virtualhost provisioning) could be a problem. I'd be curious to know how certificate provisioning would be handled in e.g. OpenShift if it were to adopt Gateway API for a multi-tenant scenario; maybe danehans has insights?

k8s-triage-robot · 2021-12-07T06:35:54Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue or PR as fresh with /remove-lifecycle stale
Mark this issue or PR as rotten with /lifecycle rotten
Close this issue or PR with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

hbagdi · 2021-12-08T19:07:57Z

/lifecycle frozen

shaneutt · 2022-08-05T18:36:10Z

Despite this issue being quite old, we the maintainers are still pretty convinced that we want to have this functionality in a future release. We are marking this help wanted as we're looking for contributors with strong use cases to help champion and drive this forward.

mikemorris · 2022-10-04T19:17:20Z

#1430 may provide a path forward for implementing this functionality.

k8s-triage-robot · 2023-07-04T23:23:34Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle stale
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

robscott · 2024-01-11T20:15:19Z

This topic has come up again more recently, and I think we need to revisit it. As we've shown in #2668, the API validation actually accidentally made it impossible to achieve this goal. We're working to revert that strict validation, but without clear examples of how this could work, we could risk doing the same thing again.

I think this issue should really be expanded to cover "TLS Certs that are not configured directly by the Gateway Owner." That would include both of the following:

1. Automatically generated TLS certs that are added after the fact
In this example, a controller would watch Gateways and HTTPRoutes, generate TLS certs, and attach them to the Gateway. Depending on the implementation details, Gateway owners may need to configure something at the Gateway or Listener level to explicitly opt in to this feature. For example, let's say someone created acme-cert-generator to generate certs following this pattern. That generator may choose to only generate and populate certs on Gateway Listeners with acme.io/cert-generator set in tls.options, or with a similar annotation set for the entire Gateway.

Note that this is actually fairly similar to how Cert Manager works today, but that requires Gateway owners to reference a Kubernetes Secret that it will then populate. If we didn't require CertRefs or TLS to be populated, this would not be necessary.
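As a rough illustration of the opt-in described for the hypothetical acme-cert-generator, the sketch below picks the listeners such a controller would act on. The key `acme.io/cert-generator` comes from the example above; the function name, the dict layout, and the rule that listeners with existing `certificateRefs` are left alone are assumptions made for this sketch, not behavior of any real controller.

```python
OPT_IN_KEY = "acme.io/cert-generator"  # example key from the text, not a real API


def listeners_needing_certs(gateway):
    """Return names of listeners a cert generator would populate (sketch)."""
    annotations = gateway.get("metadata", {}).get("annotations", {})
    gateway_opt_in = OPT_IN_KEY in annotations  # Gateway-wide opt-in

    selected = []
    for listener in gateway["spec"]["listeners"]:
        if listener.get("protocol") not in ("HTTPS", "TLS"):
            continue  # plain HTTP/TCP listeners carry no certs
        tls = listener.get("tls") or {}
        if tls.get("mode", "Terminate") == "Passthrough":
            continue  # nothing is terminated at the Gateway
        if tls.get("certificateRefs"):
            continue  # assumed: never override certs set by the owner
        if gateway_opt_in or OPT_IN_KEY in tls.get("options", {}):
            selected.append(listener["name"])
    return selected


example_gateway = {
    "metadata": {"name": "shared-gateway", "namespace": "infra"},
    "spec": {"listeners": [
        {"name": "http", "protocol": "HTTP"},
        {"name": "https-auto", "protocol": "HTTPS",
         "tls": {"options": {OPT_IN_KEY: "enabled"}}},
        {"name": "https-manual", "protocol": "HTTPS",
         "tls": {"certificateRefs": [{"kind": "Secret", "name": "manual-cert"}]}},
        {"name": "https-plain", "protocol": "HTTPS", "tls": {"mode": "Terminate"}},
    ]},
}

print(listeners_needing_certs(example_gateway))
```

Only the listener that opted in and has no certs of its own is selected. Adding the same key as a Gateway annotation would also select `https-plain`.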

2. Certs that are specified by some other persona, likely an app owner
In this hypothetical example, a new controller and CRD would be created. This CRD would link hostnames to user-provided certs, and then the controller would populate the certs specified by that CRD on Gateway listeners that matched those hostnames. This would also likely benefit from a Listener or Gateway-level opt-in for the behavior.
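A minimal sketch of that second pattern, under stated assumptions: `CertBinding` stands in for the hypothetical CRD, the opt-in key `example.com/self-service-certs` is invented for illustration, hostnames are matched exactly (no wildcard handling), and when two bindings claim the same hostname the first one wins. A real controller would need a deliberate conflict policy and status reporting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CertBinding:
    """Stand-in for a hypothetical CRD linking a hostname to a user cert."""
    hostname: str
    secret_name: str
    namespace: str


def plan_certificate_refs(listeners, bindings,
                          opt_in_key="example.com/self-service-certs"):
    """Map listener name -> certificateRefs the controller would set (sketch)."""
    by_host = {}
    for binding in bindings:
        # Assumed conflict rule: the first binding for a hostname wins.
        by_host.setdefault(binding.hostname, binding)

    plan = {}
    for listener in listeners:
        tls = listener.get("tls") or {}
        if opt_in_key not in tls.get("options", {}):
            continue  # the Gateway owner has not opted this listener in
        binding = by_host.get(listener.get("hostname"))
        if binding is None:
            continue  # no app owner has provided a cert for this hostname
        plan[listener["name"]] = [{
            "kind": "Secret",
            "name": binding.secret_name,
            "namespace": binding.namespace,
        }]
    return plan


listeners = [
    {"name": "a", "hostname": "a.example.com",
     "tls": {"options": {"example.com/self-service-certs": "true"}}},
    {"name": "b", "hostname": "b.example.com", "tls": {}},
    {"name": "c", "hostname": "c.example.com",
     "tls": {"options": {"example.com/self-service-certs": "true"}}},
]
bindings = [
    CertBinding("a.example.com", "a-cert", "team-a"),
    CertBinding("a.example.com", "other-cert", "team-x"),  # conflicting claim
    CertBinding("b.example.com", "b-cert", "team-b"),
]
print(plan_certificate_refs(listeners, bindings))
```

Listener `b` is skipped because it never opted in, and `c` is skipped because nobody supplied a cert for its hostname.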

Both of these share the same characteristics:

Gateway owners do not directly configure TLS certs for some or all Listeners
Depending on the implementation, Gateway owners may need to opt in to this functionality at a Gateway or Listener level
We want to enable Gateway owners to leave TLS config unspecified until some other source can populate it

If these general patterns make sense, it would be worth documenting them somewhere so we don't lose track of them.

/kind documentation
/remove-lifecycle stale

EItanya · 2024-01-11T21:45:17Z

I think both of these are very interesting use-cases. I personally have more experience with the latter. I do think there are some important questions to be asked in the case of both of these approaches though, specifically when it comes to statuses and how the user will understand the state of the system with the addition of new intermediate states.

Both of these can take advantage of the tls.options field on GatewayTLSConfig. However, that comes with its own issues, namely a lack of uniformity and most likely a lack of officially supported statuses.

Automatically generated TLS certs that are added after the fact

This one seems a bit easier because missing the secret would render the listener invalid without invalidating the whole Gateway, so I think the existing statuses would cover this use case.

Certs that are specified by some other persona, likely an app owner

This one is a bit more interesting, and is an option that is available today in gloo-edge. However it's very important to note that it is much easier in that API because hostnames MUST be unique and therefore certs can be specified per host without fear of overlap.

This has been a popular feature for gloo-edge users and we would love to be able to offer it with our new Gateway API option.

I completely agree with @robscott that "This would also likely benefit from a Listener or Gateway-level opt-in for the behavior." I think it will be very important to get that part right, or it may stray too far from other aspects of the Gateway resource/semantics.

In the short term, I think this can be accomplished with an option on the Listener: k8s.gateway.io/self-service-cert: <namespace>, or something functionally like that. (To be clear, that is just an example.) Then the App Team could create a new, possibly implementation-specific resource to specify the TLS config. It could even mirror GatewayTLSConfig for the sake of simplicity.
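One way to read that namespace-scoped option, sketched under assumptions: the key is the illustrative one from this comment (not a real API), its value names the single namespace allowed to supply TLS config for the listener, and `admits_cert_from` is a hypothetical helper.

```python
SELF_SERVICE_KEY = "k8s.gateway.io/self-service-cert"  # illustrative key only


def admits_cert_from(listener, source_namespace):
    """True if the listener opted in to certs from source_namespace (sketch)."""
    options = (listener.get("tls") or {}).get("options", {})
    return options.get(SELF_SERVICE_KEY) == source_namespace


listener = {
    "name": "https",
    "hostname": "app.example.com",
    "tls": {"options": {SELF_SERVICE_KEY: "team-a"}},
}
print(admits_cert_from(listener, "team-a"), admits_cert_from(listener, "team-b"))
```

Scoping the opt-in to a namespace keeps the Gateway owner in control of who may attach certs, at the cost of one more string-typed option.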

youngnick · 2024-01-15T05:59:19Z

I'm really worried about using tls.options so heavily, because it's a map[string]string, and annotations have shown us that bare string key-value pairs are sticky (that is, hard to change or remove), easy to make incompatible, and often end up requiring specific parsing.

If we're going to do this, I think we need to mandate some things like (not sure if this can be done in CEL or not, but it should be in the spec regardless):

a maximum length for both key and value, which should be restrictive. No base64 certs encoded inline in options!
the value must not contain newlines (which should help to stop use cases that involve including encoded certs, embedded JSON, or embedded YAML).
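The two constraints above can be sketched as a small validation function. The specific limits (`MAX_KEY_LEN`, `MAX_VALUE_LEN`) are placeholders chosen for illustration, since the comment only asks that they be restrictive, and `validate_tls_options` is a hypothetical name rather than part of the API.

```python
MAX_KEY_LEN = 253   # placeholder limit, assumed for this sketch
MAX_VALUE_LEN = 64  # placeholder limit, assumed for this sketch


def validate_tls_options(options):
    """Return a list of human-readable violations for a tls.options map."""
    errors = []
    for key, value in options.items():
        if len(key) > MAX_KEY_LEN:
            errors.append(f"key {key[:20]!r}... exceeds {MAX_KEY_LEN} characters")
        if len(value) > MAX_VALUE_LEN:
            errors.append(f"value for {key!r} exceeds {MAX_VALUE_LEN} characters")
        if "\n" in value or "\r" in value:
            # Blocks inline certs, embedded JSON, or embedded YAML.
            errors.append(f"value for {key!r} must not contain newlines")
    return errors


print(validate_tls_options({"acme.io/cert-generator": "enabled"}))
print(validate_tls_options({"example.com/inline": "line one\nline two"}))
```

A short flag passes, while a multi-line value is rejected regardless of its length.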

robscott added the kind/feature Categorizes issue or PR as related to a new feature. label Aug 11, 2021

robscott mentioned this issue Aug 11, 2021

Adding GEP 746: Replace Cert Refs on HTTPRoute with Cross Namespace Refs from Gateway #749

Merged

robscott mentioned this issue Aug 26, 2021

SIG-NETWORK v1alpha2 Review #780

Closed

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 7, 2021

k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Dec 8, 2021

shaneutt added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Aug 5, 2022

mikemorris mentioned this issue Oct 4, 2022

Allow application developer to specify TLS client certificate #622

Closed

tsaarni mentioned this issue Oct 13, 2022

GEP-1282 Backend Properties - Update implementation #1430

Closed

shaneutt added this to Gateway API: The Road to GA Mar 8, 2023

shaneutt added the priority/backlog Higher priority than priority/awaiting-more-evidence. label Mar 8, 2023

shaneutt moved this to Triage in Gateway API: The Road to GA Mar 8, 2023

shaneutt modified the milestone: v1.0.0 Mar 8, 2023

shaneutt added priority/backlog Higher priority than priority/awaiting-more-evidence. and removed triage/needs-information Indicates an issue needs more information in order to work on it. labels Apr 5, 2023

shaneutt removed this from Gateway API: The Road to GA Apr 5, 2023

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 4, 2023

k8s-ci-robot added kind/documentation Categorizes issue or PR as related to documentation. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 11, 2024

robscott mentioned this issue Jan 11, 2024

TLS Config Should be Optional #2713

Closed

robscott mentioned this issue Jan 18, 2024

Loosening TLS validation to enable indirect TLS config #2721

Merged

k8s-ci-robot closed this as completed in #2721 Mar 1, 2024
