Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CSDS: add new config status ACKED #13121

Merged
merged 10 commits into from
Sep 18, 2020
Merged

Conversation

lidizheng
Copy link
Contributor

@lidizheng lidizheng commented Sep 15, 2020

Signed-off-by: Lidi Zheng lidiz@google.com

Signed-off-by: Lidi Zheng lidiz@google.com

Commit Message: Add new config status for CSDS: ACKED.
Additional Description: As the CSDS service definition described, it has the potential to be used to expose xDS config from a client or proxy. gRPC wants to utilize this service to improve its debuggability. But the ConfigStatus is designed from the control plane point of view. Especially, the client cannot predict if there is new config on its way, so it can't accurately claim any xDS config status as SYNCED. We need another config status to indicate the status that the client received the status and sent out ACK. The actual wording is up to debate.
Risk Level: Low
Testing: Not needed
Docs Changes: Inline with proto definition
Release Notes: New config status for CSDS: ACKED.

CC @fuqianggao @markdroth @menghanl

Signed-off-by: Lidi Zheng <lidiz@google.com>

Signed-off-by: Lidi Zheng <lidiz@google.com>
@repokitteh-read-only
Copy link

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to (api/envoy[\w/]*/(v1alpha\d?|v1|v2alpha\d?|v2))|(api/envoy/type/(matcher/)?\w+.proto).
CC @envoyproxy/api-shepherds: Your approval is needed for changes made to api/envoy/.
CC @envoyproxy/api-watchers: FYI only for changes made to api/envoy/.

🐱

Caused by: #13121 was opened by lidizheng.

see: more, trace.

@htuch htuch self-assigned this Sep 16, 2020
ERROR = 4;

// The client has received the config and repied with ACK.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm almost thinking we want separate status enum messages or very explicit field naming to help make clear which one (client vs. server status) we're talking about.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, seeing these comments, I guess I do think that having a separate set of enums would be clearer. We can call them something like CLIENT_REQUESTED, CLIENT_ACKED, and CLIENT_NACKED.

We can either add these as a new values in this existing enum, or we can add a whole separate enum field for use in clients.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a new proto enum struct named ClientSideStatusConfig. It's a bit verbose, but I think it needs to have a very clear naming.

Copy link
Contributor

@markdroth markdroth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this, Lidi!

ERROR = 4;

// The client has received the config and repied with ACK.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, seeing these comments, I guess I do think that having a separate set of enums would be clearer. We can call them something like CLIENT_REQUESTED, CLIENT_ACKED, and CLIENT_NACKED.

We can either add these as a new values in this existing enum, or we can add a whole separate enum field for use in clients.

@htuch
Copy link
Member

htuch commented Sep 16, 2020

Let's split this out as a separate enum. I feel it's going to be necessary for any CSDS client to have some logic to make sense of whether they are interpreting a management or client status here.

Signed-off-by: Lidi Zheng <lidiz@google.com>
Copy link
Contributor Author

@lidizheng lidizheng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@markdroth PTALA.

ERROR = 4;

// The client has received the config and repied with ACK.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding a new proto enum struct named ClientSideStatusConfig. It's a bit verbose, but I think it needs to have a very clear naming.

// Client received the config and replied with ACK.
CLIENT_ACKED = 2;

// Client received the config and replied with NACK.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's explicitly document that the resource content that is returned in this case will be the most recent version accepted, if any, not the version that was NACKed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documented the expected config dump behavior here and in the PerXdsConfig message.

@@ -73,6 +89,7 @@ message PerXdsConfig {
"envoy.service.status.v2.PerXdsConfig";

ConfigStatus status = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably worth adding a comment here that status will be populated by xDS servers but client_side_config_status will be populated by xDS clients.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments added.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(It's too bad that we can't use a oneof here, but that would be a breaking change for the go proto API.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest adding (udpa.annotations.field_migrate).oneof_promotion annotations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Annotation added, TIL.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@htuch Can you advice what should we use in v4alpha? The format suggests that the migration annotation is expected to be expanded into real oneof in v4alpha. But that will break backward compatibility.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fine -- v4alpha will be a new major version (if it ever actually happens), so breaking changes are okay. This is actually the whole point of the annotation, to make sure this change happens when the next major version bump occurs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, but it looks like this didn't actually work for some reason. The annotation was carried forward to v4alpha instead of being turned into a oneof. Maybe the annotation has the wrong syntax or something?

Signed-off-by: Lidi Zheng <lidiz@google.com>
Copy link
Contributor Author

@lidizheng lidizheng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@markdroth PTALAA.

// Client received the config and replied with ACK.
CLIENT_ACKED = 2;

// Client received the config and replied with NACK.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documented the expected config dump behavior here and in the PerXdsConfig message.

@@ -73,6 +89,7 @@ message PerXdsConfig {
"envoy.service.status.v2.PerXdsConfig";

ConfigStatus status = 1;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments added.

Copy link
Contributor

@markdroth markdroth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also change the comment on line 23-26 to indicate that we are now going to use this on the xDS client side.

Thanks!

@@ -53,6 +53,24 @@ enum ConfigStatus {
ERROR = 4;
}

// Config status from a client-side view.
enum ClientSideConfigStatus {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest calling this ClientConfigStatus.

Copy link
Contributor Author

@lidizheng lidizheng Sep 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to ClientConfigStatus.

In this proto definition, there is the ClientConfig message. It might be a bit confusing to readers that why ClientConfigStatus is not the status for the ClientConfig message, but the client-side view of config status. On the other hand, the ClientSideConfigStatus is not much better.

Better naming is still welcomed. Picking ClientConfigStatus since it is simpler.

@@ -73,6 +89,7 @@ message PerXdsConfig {
"envoy.service.status.v2.PerXdsConfig";

ConfigStatus status = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(It's too bad that we can't use a oneof here, but that would be a breaking change for the go proto API.)

Signed-off-by: Lidi Zheng <lidiz@google.com>
Copy link
Contributor Author

@lidizheng lidizheng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@markdroth Thanks for the quick review! PTALAAA.

@@ -53,6 +53,24 @@ enum ConfigStatus {
ERROR = 4;
}

// Config status from a client-side view.
enum ClientSideConfigStatus {
Copy link
Contributor Author

@lidizheng lidizheng Sep 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to ClientConfigStatus.

In this proto definition, there is the ClientConfig message. It might be a bit confusing to readers that why ClientConfigStatus is not the status for the ClientConfig message, but the client-side view of config status. On the other hand, the ClientSideConfigStatus is not much better.

Better naming is still welcomed. Picking ClientConfigStatus since it is simpler.

@markdroth
Copy link
Contributor

Please also change the comment on line 23-26 to indicate that we are now going to use this on the xDS client side. Thanks!

Signed-off-by: Lidi Zheng <lidiz@google.com>
markdroth
markdroth previously approved these changes Sep 16, 2020
Copy link
Contributor

@markdroth markdroth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great!

I'll recuse myself and let @htuch or one of the other API shepherds give API approval.

@markdroth
Copy link
Contributor

(Whoops -- I didn't realize that my approving the PR would satisfy the API approval requirement; I thought I had to say "/lgtm api". But since one of the maintainers has to review before merging anyway, that's probably okay.)

Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a fix_format.

@@ -73,6 +89,7 @@ message PerXdsConfig {
"envoy.service.status.v2.PerXdsConfig";

ConfigStatus status = 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest adding (udpa.annotations.field_migrate).oneof_promotion annotations.


// Client received the config and replied with NACK. Notably, the attached
// config dump is not the NACKed version, but the most recent accepted one. If
// no config is accepted yet, the attached config dump will be empty.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have a similar statement added about control plane behavior for configs with a negative status, i.e. "this is always the latest successful config?" Or maybe that's not even desired, we might want the latest successful and unsuccessful config? Would that also make sense for client? CC @fuqianggao

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This behavior wasn't clearly described in this proto definition before, or in other documentation. Could it be a control-plane specific detail?

Hi @fuqianggao, can you help to explain the designed behavior of config dump when client replied with NACK?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For control plane, it will be the latest config that control plane has. If status is NACK, it means the config is sent to client, but rejected.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. I think the actionable thing is for @lidizheng to update the management server wording to reflect this. I'm still wondering in this discussion thread, is there value in having both active and failed configs reported?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect that for both xDS servers and xDS clients, there would be additional overhead in storing old versions of configs. In general, both sides will store only the latest version that they care about, so they don't have anything else to report here.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And if all clients implement CSDS, the CLI can get the active config from the clients, and the most up-to-date config from the management server (if they are different).

Copy link
Contributor Author

@lidizheng lidizheng Sep 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated the comment for ConfigStatus.ERROR to note that the management server will dump the latest config in case of client replying NACK.


The behavior described by @menghanl is implemented by istioctl's proxy-status command.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of being able to query both client and server CSDS and effectively determine the failing and active config. @lidizheng can you please file an issue for Envoy to add support for CSDS? You don't have to do the implementation, but we should track this as an open feature request that we can share with the community.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed as an issue: #13181. For Envoy, it overlaps with the admin config dump feature. Today, tools like istioctl is fetching xDS config from server via CSDS, and from client via admin config dump (HTTP).

Signed-off-by: Lidi Zheng <lidiz@google.com>
Signed-off-by: Lidi Zheng <lidiz@google.com>

// Client config status is populated by xDS clients. Will not be present if
// the CSDS server is an xDS server. No matter what the client config status
// is, xDS clients should always dump the most recent accepted xDS config.
ClientConfigStatus client_status = 7;
ClientConfigStatus client_status = 7
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you might need to have a oneof existing. I suggest moving this field lexically before the existing status and wrapping it in a oneof status_config. That should allow the v4alpha generation to work.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lidi, I think you still need to address this comment.

Copy link
Contributor Author

@lidizheng lidizheng Sep 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the style in the code base, the oneof_promotion annotation seems appear in pairs (example 1, example 2). Splitting the oneof options into different layers might break the consistency, and slightly weaken the semantic intention of the upcoming migration.

So, in my mind, keeping oneof_promotion in v3 and real oneof in v4alpha is the better way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's what I meant.

The v4alpha file looks right now. The last time I looked, I thought it was still wrong, but maybe I was looking at an old snapshot.

Anyway, this looks great. Thanks!

Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM modulo the thread on clarifying management server behavior on stale/nacked config. I'd like there to be consistency, and ideally we should have a plan for how to support reporting both last known good and bad config (at least on management server side), since both are useful, CC @fuqianggao

Signed-off-by: Lidi Zheng <lidiz@google.com>
Signed-off-by: Lidi Zheng <lidiz@google.com>
@lidizheng
Copy link
Contributor Author

@htuch PTALAA. All tests are passing, and the mentioned behavior is documented.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants