-
Notifications
You must be signed in to change notification settings - Fork 339
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
docs(kuma-cp) permissive mtls proposal #2550
Conversation
Signed-off-by: Ilya Lobkov <lobkovilya@yandex.ru>
Inbound listener should be configured with [TLS Inspector](https://www.envoyproxy.io/docs/envoy/latest/configuration/listeners/listener_filters/tls_inspector) | ||
to detect type of the traffic (`tls` or `plaintext`). Using [FilterChainMatch](https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/listener/v3/listener_components.proto#config-listener-v3-filterchainmatch) | ||
we can distinguish traffic by `transport_protocol`, it could be either `raw_buffer` or `tls`. If the type is `tls` then FilterChain stays the same, | ||
if the type is `raw_buffer` then filter chain is the same but without `ServerSideMTLS` section. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds good!
One thing though:
How can I know as a user whether I can switch to strict mode? Will we be able to provide statistics of how many requests were done with TLS and how many did not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point, I think we can create a new dashboard based on:
envoy_tls_inspector_tls_found{} 3
envoy_tls_inspector_tls_not_found{} 6
Codecov Report
@@ Coverage Diff @@
## master #2550 +/- ##
==========================================
+ Coverage 52.68% 52.73% +0.04%
==========================================
Files 867 867
Lines 48270 48270
==========================================
+ Hits 25433 25456 +23
+ Misses 20783 20758 -25
- Partials 2054 2056 +2
Continue to review full report at Codecov.
|
backends: | ||
- name: ca-1 | ||
type: builtin | ||
mode: strict|permissive |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we instead just flag it as permissive: true
? As the default would already be correct and we'd keep mode
for the day it makes sense to have a mtls mode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point, I don't know what's better, to be honest, @jakubdyszkiewicz @jpeach do you have any preferences?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we going to be supporting more than one mode? In the future do we foresee having something like zone-permissive
where only select zones are permissive (but not all of them)?
If we plant to expand to other modes, then we need a more flexible way of configuring those modes. If not, permissive: true
may work.
IMHO, we should optimize for flexibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for flexibility. I'd rather have an enum even if we see only two modes right now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I understand the motivation of flexibility. Wouldn't any kind of extra permissiveness actually require more information so that an enum by itself wouldn't be sufficient?
In the example of a zone-permissive
you'd probably want to provide a list of zone which are permissive
or not. So in the end it's permissive: true
and some extra config.
Another option you might want to add is permissive-mesh
where you'd want to specify meshes that are allowed to talk to us which would also require extra config.
In any case I'm just suggesting and if you think an enum is better I'm good with this too.
it is not practical to have mTLS enabled for everything or nothing, but we need to provide a more gradual way to do it. | ||
Therefore, a "permissive" mTLS mode (as opposed to the "strict" one we use today) would allow a team to enable mTLS yet allow | ||
traffic that is not part of the zero-trust infrastructure to still be able to make requests without a strict validation of the | ||
certificates on incoming requests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we get a more concrete definition of what "permissive" means? Normally in TLS conversations, it means you have a TLS session that opts out of certain certificate verifications. In a mTLS context, I am expecting it to mean that TLS is being used but we either the server of the client is not being verified.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Permissive mTLS is a mode applied globally for mesh. Outbounds encrypt traffic the same way it happens in strict mode, but inbounds accept both TLS and plaintext traffic. This allows applications residing in the mesh to accept requests from outside of the mesh.
@jpeach does it make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeh that's great.
|
||
## Configuration | ||
|
||
Add a new field `mode` under the `mtls.backends` section. The type of the field is enum with 2 values: `strict` and `permissive`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How can an operator do a progressive rollout with a global toggle?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As been discussed we can provide a Grafana dashboard to show how many requests are happening over TLS or plaintext. So as an operator you switch the global toggle to permissive
mode, and gradually add applications to the mesh. When you see all traffic is TLS, then mode could be switched to strict
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this model, the rollout is all or nothing, but there's metrics that help you figure out whether it is safe. Is there a use case for progressive rollout, where I would enable enforcement on an increasingly large subset of services, until I eventually reach the whole mesh?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I'm not sure I completely understand your idea. Do you mean having some selector
field that'd allow to pick up a subset of proxies/services with strict
mtls?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we had this idea at some point
type: TrafficMTLS
mesh: default
name: backend
selector:
kuma.io/service: backend
spec:
backend: ca-1
mode: permissive
but we figured that it's kind of useless to have two different backends for some DPs in the mesh or that some DPs have mTLS and some do not. With permissive mode, it may make sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeh, that's the sort of thing I was wondering about. Use cases could be
- you have one important service that should be forces to mTLS even though other things in the cluster are not ready
- everything is ready for mTLS except one service and you want to let it straggle
|
||
Inbound listener should be configured with [TLS Inspector](https://www.envoyproxy.io/docs/envoy/latest/configuration/listeners/listener_filters/tls_inspector) | ||
to detect type of the traffic (`tls` or `plaintext`). Using [FilterChainMatch](https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/listener/v3/listener_components.proto#config-listener-v3-filterchainmatch) | ||
we can distinguish traffic by `transport_protocol`, it could be either `raw_buffer` or `tls`. If the type is `tls` then FilterChain stays the same, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this work if the application is doing it's own TLS? Is that something we need to worry about here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a very good point. We need to have a story about how to support such migration from TLS in apps to mTLS in the Mesh.
@lobkovilya should this be merged? |
Signed-off-by: Ilya Lobkov <lobkovilya@yandex.ru>
Signed-off-by: Ilya Lobkov <lobkovilya@yandex.ru>
Summary
Proposal for permissive mTLS mode
Full changelog
permissive-mtls.md
Issues resolved
N/A
Documentation
N/A
Testing
Backwards compatibility
backport-to-stable
label if the code is backwards compatible. Otherwise, list breaking changes.