Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs(kuma-cp) permissive mtls proposal #2550

Merged
merged 1 commit into from
Sep 22, 2021

Conversation

lobkovilya
Copy link
Contributor

Summary

Proposal for permissive mTLS mode

Full changelog

  • add permissive-mtls.md

Issues resolved

N/A

Documentation

N/A

Testing

  • Unit tests
  • E2E tests
  • Manual testing on Universal
  • Manual testing on Kubernetes

Backwards compatibility

  • Add backport-to-stable label if the code is backwards compatible. Otherwise, list breaking changes.

Signed-off-by: Ilya Lobkov <lobkovilya@yandex.ru>
@lobkovilya lobkovilya requested a review from a team as a code owner August 11, 2021 14:17
Inbound listener should be configured with [TLS Inspector](https://www.envoyproxy.io/docs/envoy/latest/configuration/listeners/listener_filters/tls_inspector)
to detect type of the traffic (`tls` or `plaintext`). Using [FilterChainMatch](https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/listener/v3/listener_components.proto#config-listener-v3-filterchainmatch)
we can distinguish traffic by `transport_protocol`, it could be either `raw_buffer` or `tls`. If the type is `tls` then FilterChain stays the same,
if the type is `raw_buffer` then filter chain is the same but without `ServerSideMTLS` section.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds good!

One thing though:
How can I know as a user whether I can switch to strict mode? Will we be able to provide statistics of how many requests were done with TLS and how many did not?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point, I think we can create a new dashboard based on:

envoy_tls_inspector_tls_found{} 3
envoy_tls_inspector_tls_not_found{} 6

@codecov-commenter
Copy link

Codecov Report

Merging #2550 (985ce79) into master (fbd0831) will increase coverage by 0.04%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2550      +/-   ##
==========================================
+ Coverage   52.68%   52.73%   +0.04%     
==========================================
  Files         867      867              
  Lines       48270    48270              
==========================================
+ Hits        25433    25456      +23     
+ Misses      20783    20758      -25     
- Partials     2054     2056       +2     
Impacted Files Coverage Δ
pkg/core/resources/manager/cache.go 84.41% <0.00%> (+2.59%) ⬆️
pkg/plugins/resources/memory/store.go 82.06% <0.00%> (+4.34%) ⬆️
pkg/kds/client/sink.go 52.27% <0.00%> (+4.54%) ⬆️
pkg/core/bootstrap/autoconfig.go 54.46% <0.00%> (+8.03%) ⬆️
pkg/core/resources/store/customizable_store.go 88.88% <0.00%> (+11.11%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fbd0831...985ce79. Read the comment docs.

backends:
- name: ca-1
type: builtin
mode: strict|permissive
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we instead just flag it as permissive: true? As the default would already be correct and we'd keep mode for the day it makes sense to have a mtls mode.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point, I don't know what's better, to be honest, @jakubdyszkiewicz @jpeach do you have any preferences?

Copy link
Contributor

@subnetmarco subnetmarco Aug 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we going to be supporting more than one mode? In the future do we foresee having something like zone-permissive where only select zones are permissive (but not all of them)?

If we plant to expand to other modes, then we need a more flexible way of configuring those modes. If not, permissive: true may work.

IMHO, we should optimize for flexibility.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for flexibility. I'd rather have an enum even if we see only two modes right now

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I understand the motivation of flexibility. Wouldn't any kind of extra permissiveness actually require more information so that an enum by itself wouldn't be sufficient?
In the example of a zone-permissive you'd probably want to provide a list of zone which are permissive or not. So in the end it's permissive: true and some extra config.
Another option you might want to add is permissive-mesh where you'd want to specify meshes that are allowed to talk to us which would also require extra config.

In any case I'm just suggesting and if you think an enum is better I'm good with this too.

it is not practical to have mTLS enabled for everything or nothing, but we need to provide a more gradual way to do it.
Therefore, a "permissive" mTLS mode (as opposed to the "strict" one we use today) would allow a team to enable mTLS yet allow
traffic that is not part of the zero-trust infrastructure to still be able to make requests without a strict validation of the
certificates on incoming requests.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we get a more concrete definition of what "permissive" means? Normally in TLS conversations, it means you have a TLS session that opts out of certain certificate verifications. In a mTLS context, I am expecting it to mean that TLS is being used but we either the server of the client is not being verified.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Permissive mTLS is a mode applied globally for mesh. Outbounds encrypt traffic the same way it happens in strict mode, but inbounds accept both TLS and plaintext traffic. This allows applications residing in the mesh to accept requests from outside of the mesh.

@jpeach does it make sense?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeh that's great.


## Configuration

Add a new field `mode` under the `mtls.backends` section. The type of the field is enum with 2 values: `strict` and `permissive`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can an operator do a progressive rollout with a global toggle?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As been discussed we can provide a Grafana dashboard to show how many requests are happening over TLS or plaintext. So as an operator you switch the global toggle to permissive mode, and gradually add applications to the mesh. When you see all traffic is TLS, then mode could be switched to strict.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this model, the rollout is all or nothing, but there's metrics that help you figure out whether it is safe. Is there a use case for progressive rollout, where I would enable enforcement on an increasingly large subset of services, until I eventually reach the whole mesh?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I'm not sure I completely understand your idea. Do you mean having some selector field that'd allow to pick up a subset of proxies/services with strict mtls?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we had this idea at some point

type: TrafficMTLS
mesh: default
name: backend
selector:
  kuma.io/service: backend
spec:
  backend: ca-1
  mode: permissive

but we figured that it's kind of useless to have two different backends for some DPs in the mesh or that some DPs have mTLS and some do not. With permissive mode, it may make sense.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeh, that's the sort of thing I was wondering about. Use cases could be

  • you have one important service that should be forces to mTLS even though other things in the cluster are not ready
  • everything is ready for mTLS except one service and you want to let it straggle


Inbound listener should be configured with [TLS Inspector](https://www.envoyproxy.io/docs/envoy/latest/configuration/listeners/listener_filters/tls_inspector)
to detect type of the traffic (`tls` or `plaintext`). Using [FilterChainMatch](https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/listener/v3/listener_components.proto#config-listener-v3-filterchainmatch)
we can distinguish traffic by `transport_protocol`, it could be either `raw_buffer` or `tls`. If the type is `tls` then FilterChain stays the same,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this work if the application is doing it's own TLS? Is that something we need to worry about here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a very good point. We need to have a story about how to support such migration from TLS in apps to mTLS in the Mesh.

@lahabana
Copy link
Contributor

lahabana commented Sep 2, 2021

@lobkovilya should this be merged?

@lobkovilya lobkovilya merged commit 1333e7c into master Sep 22, 2021
@lobkovilya lobkovilya deleted the docs/permissive-mtls-proposal branch September 22, 2021 08:56
nikita15p pushed a commit to nikita15p/kuma that referenced this pull request Sep 28, 2021
Signed-off-by: Ilya Lobkov <lobkovilya@yandex.ru>
nikita15p pushed a commit to nikita15p/kuma that referenced this pull request Sep 28, 2021
Signed-off-by: Ilya Lobkov <lobkovilya@yandex.ru>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants