Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Secret Discovery Service for v2 APIs #1194

Closed
angelachin opened this issue Jun 30, 2017 · 37 comments
Closed

Secret Discovery Service for v2 APIs #1194

angelachin opened this issue Jun 30, 2017 · 37 comments
Labels
api/v2 enhancement Feature requests. Not bugs or questions. help wanted Needs help!

Comments

@angelachin
Copy link

Hi,

We want to be able to rotate Envoy certs/keys without having to restart the Envoy. We're particularly interested in the client cert/key for connecting to upstream clusters. We recognize the CDS allows us to dynamically change clusters, but that doesn't let us rotate the cert used to connect to the CDS itself.

We see prior discussion about supporting cert rotation (#891) which was answered by pointing people towards the in-development LDS (#315). However, we don't see how the LDS solves the problem of rotating certs used to connect to the CDS.

What work would be required for support cert/key rotation for CDS without having to restart Envoy? This issue is a blocker for us using Envoy (we're looking at integrating it, and perhaps Istio, into Cloud Foundry).

We might be able to contribute the feature, if we had some guidance.

Cheers,
Angela and @rosenhouse

@mattklein123 mattklein123 added the enhancement Feature requests. Not bugs or questions. label Jun 30, 2017
@mattklein123
Copy link
Member

We don't have a good story/plan for this right now. However I recognize that it's an important use case. For now is it possible to hot restart the entire Envoy process until we figure out the right approach/resources for solving this?

@angelachin
Copy link
Author

Thanks. Our process monitoring currently does not support hot restarting the Envoy, but we can probably find a work-around.

@mattklein123
Copy link
Member

@htuch @dnoe @myidpt ^ for xDS TLS bootstrapping issue.

@htuch
Copy link
Member

htuch commented Jun 30, 2017

So, if we add KDS, we could have Envoy's client cert (presumably specified in bootstrap.proto) can either be on disk or via KDS.

@mattklein123
Copy link
Member

But that doesn't implicitly handle rotation. If we want rotation KDS must be polling and able to update the context inline.

@mattklein123
Copy link
Member

(Which I guess is where we were going anyway)

@htuch
Copy link
Member

htuch commented Jun 30, 2017

Yeah, it should support server initiated updates like the other APIs. I'm not if I fully grok the implementation complexity of this - can't we have a map somewhere from cert name to certificate and just have KDS map this essentially, with LDS consuming (for server certs) and also the xDS APIs consuming (for Envoy management server client certs).

@mattklein123
Copy link
Member

Basically the question is where the KDS polling lives and how it actually physically does the update in the code if there is a key material change. For example, there are a few concrete implementation options:

  1. KDS is able to reach into an existing listener or cluster, and modify the ssl_context with new key material (we should not limit KDS to xDS APIs, it should just be part of the generic cluster definition).
  2. Listener/Clusters continue to be effectively immutable, and a KDS update logically regenerates the listener/cluster with new key material and forces an update of either the listener or cluster via LDS/CDS internal API.

I think either could work. Whoever does the actual implementation of this will need to do some design work.

@myidpt
Copy link
Contributor

myidpt commented Jun 30, 2017

@wattli

Just to make it a little clear to me, in terms of triggering runtime cert rotation, I think there are two cases (please correct me if I misspoke):

  1. Only change the file content of the files in ssl_context w/o changing the file paths. This requires recreating Ssl::ContextImpl:
    https://github.com/lyft/envoy/blob/7503ce005f346b04c762506c1becbfddea9a86ce/source/common/ssl/context_impl.cc#L22
    which is created at initiation of ClusterInfoImpl:
    https://github.com/lyft/envoy/blob/252c361a3a3c4b8383cce8c70ce1c2bbd173523d/source/common/upstream/upstream_impl.cc#L84
    I think the only instantiation is at start / when a new CDS update is pulled. But the CDS does not necessarily change. So we need to either 1) dynamically check the files for updates or 2) rely on KDS to recreate Ssl::ContextImpl.
  2. Change the file paths in ssl_context to reflect a rotation. This requires reloading new configuration into memory. It is also triggered by CDS. This could be considered as a workround but not ideal.

Another question is (just thinking aloud), if the certificate is changed (SAN changed) in the middle of a connection, how to re-authenticate? This is like a corner case but I think we may need to cover it some time in the future.

@rosenhouse
Copy link
Contributor

if the certificate is changed (SAN changed) in the middle of a connection, how to re-authenticate

Would it be possible to continue using the old cert until the connection can be drained, and only use the new cert for new connections?

@mattklein123
Copy link
Member

Would it be possible to continue using the old cert until the connection can be drained, and only use the new cert for new connections?

Yes this is how it will work.

@myidpt
Copy link
Contributor

myidpt commented Jun 30, 2017

I doubt that is a security issue. Suppose a client has a certificate with a specific SAN for only 10 min, if he uses that cert to establish a one-day connection, is that a problem?

@andraxylia
Copy link

We could use this feature in Istio (istio.io).

@htuch
Copy link
Member

htuch commented Aug 7, 2017

I feel that #1357 subsumes this. We should probably close this one out as a dupe as a result. #1357 is sparser but tracks a concrete action item.

@mattklein123
Copy link
Member

I've had multiple requests for an additional API to fetch key data since key data might want to come from a different management server. Probably worth keeping this open to track?

@htuch
Copy link
Member

htuch commented Aug 7, 2017

OK, I will retitle to Key Discovery Service then.

@htuch htuch changed the title Cert/Key rotation for CDS without restart Key Discovery Service for v2 APIs Aug 7, 2017
@rosenhouse
Copy link
Contributor

rosenhouse commented Aug 7, 2017

As an Envoy user, will I need to implement an KDS API in order to support key-rotation without restart? Or would there be a way to get new cert & key from the filesystem directly?

@htuch
Copy link
Member

htuch commented Aug 7, 2017

We have concrete plans today to allow dynamic updates via the filesystem if the listeners are specified in a file on the filesystem (e.g. listeners.proto) and the keys/certs are inline with the listener definitions in this file. This works via inotify on Linux. This doesn't work with an LDS server.

When we add KDS, this would probably work the same way, except the restriction this time would be no KDS server.

@mattklein123
Copy link
Member

I have a feeling we may need to add key-rotation directly via filesystem for Lyft use so it will come one way or the other.

@htuch htuch added the help wanted Needs help! label Aug 10, 2017
@rosenhouse
Copy link
Contributor

This issue came up again in the Istio community call today. Would you be open to a contribution of this feature?

@mattklein123
Copy link
Member

Yes. I would like to make progress on this. I've been discussing this off and on with various people. cc @PiotrSikora @myidpt

@PiotrSikora What is currently plan for key delivery on your end? Ultimately I think we want to support dynamic reload via KDS API and also probably via watching FS file.

@rosenhouse
Copy link
Contributor

Yes, the FS watch is specifically what we were interested in.

@mattklein123
Copy link
Member

FS watch is unlikely to be part of KDS API. I think it would be a direct addition to Listener/Cluster TLS context. @PiotrSikora can advise.

@PiotrSikora
Copy link
Contributor

PiotrSikora commented Sep 21, 2017

I've been thinking about KDS (and cert rotation in general) this week. I'll write up a doc and share it either tomorrow or early next week, but it basically boils down to KDS pushing named TlsCertificate (data-plane-api/api/tls_context.proto#L36), and LDS/CDS being able to reference TlsCertificate by "name", which is most likely going to be the SPKI of the certificate by default, perhaps with a special "node-identity" name used by Istio to refer to a node provided identity, that's going to be used not only for communication with xDS (i.e. Pilot), but also for all outgoing connections.

FS watch, while nice in theory, is not only much more limited than KDS, but also prone to race conditions, since updates to the certificate chain and key files won't be done in a single transaction.

cc @myidpt @wattli

@rosenhouse
Copy link
Contributor

@PiotrSikora I see something above (link) about

concrete plans today to allow dynamic updates via the filesystem if the listeners are specified in a file on the filesystem (e.g. listeners.proto) and the keys/certs are inline with the listener definitions in this file.

That should address the race-condition concern, no?

@mattklein123 mattklein123 changed the title Key Discovery Service for v2 APIs Secret Discovery Service for v2 APIs Nov 1, 2017
@mangchiandjjoe
Copy link
Contributor

I started working on the proof of concept implementation.

@vadimeisenbergibm
Copy link
Contributor

vadimeisenbergibm commented Feb 27, 2018

@PiotrSikora, do you have any estimation when the support in envoy of the Secret Discovery Service will be implemented?

@PiotrSikora
Copy link
Contributor

@vadimeisenbergibm @mangchiandjjoe just started working on it.

@kyessenov
Copy link
Contributor

kyessenov commented Feb 28, 2018 via email

@vadimeisenbergibm
Copy link
Contributor

Thanks, @kyessenov, I see.

This was referenced Jun 21, 2018
This was referenced Aug 16, 2018
htuch pushed a commit that referenced this issue Aug 24, 2018
Implement SDS API and dummy socket, and they are not in use. This is split from PR #4176.

Risk Level: Low
Testing: Unit tests
Docs Changes: None

Fixes #1194

Signed-off-by: JimmyCYJ <jimmychen.0102@gmail.com>
lizan pushed a commit that referenced this issue Aug 31, 2018
Description: Use SDS api that fetches secrets from remote SDS server. Secrets are stored in Secret Provider. Listeners and Clusters are updated when secrets are received.
Risk Level: Low
Testing: Unit tests and integration tests
Fixes #1194

Signed-off-by: Jimmy Chen jimmychen.0102@gmail.com
htuch pushed a commit that referenced this issue Sep 17, 2018
 Refactor SdsApi to support dynamic certificate validation context, and support Envoy to fetch certificate validation context from remote server via SDS API.
Risk Level: Low
Testing: Unit tests and integration tests.
Fixes #1194

Signed-off-by: JimmyCYJ <jimmychen.0102@gmail.com>
@taion809
Copy link
Contributor

What is the status of this? I see this specific ticket is closed but there doesn't appear to be any published documentation regarding this feature.

@qiwzhang
Copy link
Contributor

It has been fully implemented.

Documentation is kind of hiding here
https://www.envoyproxy.io/docs/envoy/latest/api-v2/api/v2/auth/cert.proto#envoy-api-msg-auth-sdssecretconfig

But it may be easier to read integration test on how to use it.
https://github.com/envoyproxy/envoy/blob/master/test/integration/sds_dynamic_integration_test.cc

@mattklein123
Copy link
Member

@qiwzhang it was on my list, but I forget before I went on leave, but we should have arch documentation on this feature. I'm going to open a doc issue on that.

@taion809
Copy link
Contributor

@qiwzhang that's awesome thanks.

jpsim pushed a commit that referenced this issue Nov 28, 2022
Description: Custom yaml will be treated as a configuration template, rather than a fixed set of configuration. It's still possible to pass fixed configuration, simply by not including any tokens for interpolation. This usage was already present and appeared supported by the API (e.g. see HttpBridgeTests), but simply wouldn't work, without any indication that it wouldn't.
Risk Level: Low
Testing: Local & CI

Signed-off-by: Mike Schore <mike.schore@gmail.com>
Signed-off-by: JP Simard <jp@jpsim.com>
jpsim pushed a commit that referenced this issue Nov 29, 2022
Description: Custom yaml will be treated as a configuration template, rather than a fixed set of configuration. It's still possible to pass fixed configuration, simply by not including any tokens for interpolation. This usage was already present and appeared supported by the API (e.g. see HttpBridgeTests), but simply wouldn't work, without any indication that it wouldn't.
Risk Level: Low
Testing: Local & CI

Signed-off-by: Mike Schore <mike.schore@gmail.com>
Signed-off-by: JP Simard <jp@jpsim.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api/v2 enhancement Feature requests. Not bugs or questions. help wanted Needs help!
Projects
None yet
Development

No branches or pull requests