Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SecretDiscoveryService server implementation #667

Merged
merged 4 commits into from
Jan 23, 2019
Merged

Conversation

azdagron
Copy link
Member

@azdagron azdagron commented Jan 9, 2019

This PR adds an implementation of the Envoy Secret Discovery Service to the SPIRE agent. The SDS server is off by default and must be enabled with the enable_sds config value.

Specifically, the SDS server is hosted on the same UDS socket as the Workload API. It performs workload attestation to determine which secrets to make available to the caller. Resources currently map as follows:

  • SVIDs map to TLSCertificate resources with a resource name matching the SPIFFE ID of the SVID.
  • Bundles map to ValidationContext resources with a resource name matching the trust domain ID of the bundle.

At some point, we may consider symbolic names like "default_tls_certificate" or default_validation_context" that map to certain resources, but that is out of scope for the initial
implementation.

This commit adds an implementation of the Envoy Secret Discovery Service
to the SPIRE agent. The SDS server is off by default and must be enabled
with the `enable_sds` config value.

Specifically, the SDS server is hosted on the same UDS socket as the
Workload API. It performs workload attestation to determine which
secrets to make available to the caller. Resources currently map as
follows:
- SVIDs map to TLSCertificate resources with a resource name matching
the SPIFFE ID of the SVID.
- Bundles map to ValidationContext resources with a resource name
matching the trust domain ID of the bundle.

At some point, we may consider symbolic names like
"default_tls_certificate" or "default_validation_context" that map to
certain resources, but that is out of scope for the initial
implementation.

Signed-off-by: Andrew Harding <azdagron@gmail.com>
Copy link
Member

@evan2645 evan2645 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks great, thanks @azdagron

I reviewed this 1) while sick, and 2) without intimate knowledge of expected xDS API behavior (though I did reference the docs)... so, apologies in advance for any strange questions or comments :). Hopefully I'll be better by the time this is ready for another pass

}
defer done()

metrics = metrics
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

metrics is unused

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

for {
select {
case newReq := <-reqch:
h.c.Log.Debugf("stream-secrets: recv: res=%q ver=%q typ=%q nonce=%q", newReq.ResourceNames, newReq.VersionInfo, newReq.TypeUrl, newReq.ResponseNonce)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like a good opportunity to take advantage of the structured logger fields

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good call!

// match the last sent nonce, otherwise the request should be
// ignored.
if ((lastNonce == "") != (newReq.ResponseNonce == "")) || lastNonce != newReq.ResponseNonce {
h.c.Log.Warnf("unexpected nonce %q (expected %q); ignoring request", newReq.ResponseNonce, lastNonce)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: log message should be capitalized. A little more clarity would be good too e.g. Received unexpected nonce %q (expected %q); ignoring request. The subsystem_name field should be enough to indicate that it's related to the SDS API

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

// The nonce should be empty (if we've never sent a response) or
// match the last sent nonce, otherwise the request should be
// ignored.
if ((lastNonce == "") != (newReq.ResponseNonce == "")) || lastNonce != newReq.ResponseNonce {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this different than if lastNonce != newReq.ResponseNonce {? Seems to me that if we have lastNonce set to none, we should also expect newReq to have no nonce set?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems to me you are correct

if req.ErrorDetail != nil {
// The caller has failed to apply the secrets. Wait until the
// next update to send down new info.
h.c.Log.Errorf("caller failed to apply secrets: %q", req.ErrorDetail.Message)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: clearer message and capitalization would be good (we know its x509 etc) e.g. Envoy instance failed to install an SVID: %q

Is it possible to know a SPIFFE ID or something here to identify which instance?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the "secret" could be a validation context :) i will add some details though, like the resource names.

return err
}

h.c.Log.Debugf("stream-secrets: send: ver=%q typ=%q nonce=%q count=%d", resp.VersionInfo, resp.TypeUrl, resp.Nonce, len(resp.Resources))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment here on logging

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

}
if req.ErrorDetail != nil {
// The caller has failed to apply the secrets. Wait until the
// next update to send down new info.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What sort of conditions can cause this to fail? Waiting for the next update might be too little too late...?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the update is malformed, or empty for example. There may be transient error conditions, but my interpretation of https://github.com/envoyproxy/data-plane-api/blob/master/XDS_PROTOCOL.md#when-to-send-an-update is that I shouldn't re-send the update unless there is some change from the last DiscoveryResponse that was sent. I assume there is no corrective action I can take on the payload to get envoy to suddenly accept it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh... ok yes, I see that now. I misinterpreted one of the error diagrams in that doc.

I guess I figured that there are probably transient error conditions on envoy side, but maybe there are not...? My main concern here was a transient envoy error preventing an SVID update, then subsequently falling offline because presumably that SVID would expire before we try to create a new one (which would really be the third one).

I am definitely no expert here... I guess we can take it as is and see if we run into anything

}

func (h *Handler) FetchSecrets(ctx context.Context, req *api_v2.DiscoveryRequest) (*api_v2.DiscoveryResponse, error) {
h.c.Log.Debugf("fetch-secrets: res=%q ver=%q typ=%q", req.ResourceNames, req.VersionInfo, req.TypeUrl)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment here on logging

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

}
defer done()

metrics = metrics
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

metrics is unused

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

func (h *Handler) callerPID(ctx context.Context) (pid int32, err error) {
info, ok := auth.CallerFromContext(ctx)
if !ok {
return 0, errors.New("Unable to fetch credentials from context")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: these errors should be lowercased

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Signed-off-by: Andrew Harding <azdagron@gmail.com>
Signed-off-by: Andrew Harding <azdagron@gmail.com>
Copy link
Member

@evan2645 evan2645 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wooooo 🔵 🐙

errch <- err
return
}
reqch <- req
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The xDS docs say that the management server should respond to the latest request received by envoy, though it seems like there is room for a second request to race the first response. It's not super clear to me how we can avoid that gracefully, but thought I'd point it out... maybe I am missing something

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure it is possible to avoid the race no matter what you do, at least the way the protocol is constructed 🤔

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea I thought the same... 🤷

@azdagron azdagron merged commit aeec64a into spiffe:master Jan 23, 2019
@azdagron azdagron deleted the sds branch January 23, 2019 02:21
@evan2645 evan2645 mentioned this pull request Feb 4, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants