Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A36: xDS-Enabled Servers #214

Merged
merged 25 commits into from
Feb 26, 2021
Merged

A36: xDS-Enabled Servers #214

merged 25 commits into from
Feb 26, 2021

Conversation

ejona86
Copy link
Member

@ejona86 ejona86 commented Nov 20, 2020

See "open issues" at the end of the doc. At the time of creation, Go and C are much more tentative than Java.

CC @sanjaypujare, @dfawley, @markdroth

@ejona86
Copy link
Member Author

ejona86 commented Nov 20, 2020

CC @easwars


The xDS-returned Listener must have an [`address`][Listener.address] that
matches the listening address provided. The Listener's `address` would be a TCP
`SocketAddress` with matching `address` and `port_value`. The XdsClient must NAK
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/NAK/NACK maybe?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Two spellings for the same thing.

`example/resource?udpa.resource.listening_address=[::]:80`. Note that the `[]`
is _not_ percent encoded, as this string is not a URI.

The xDS-returned Listener must have an [`address`][Listener.address] that
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Go, the piece of code which processes the response from the management server and decides whether to ack or nack, does not know about the set of resources being watched (that happens at a higher layer). This means that when we get a Listener resource, we do not have a way to make sure that the address and port_value in the SocketAddress match the one specified in the listening_address field. What we can do is to make sure that it matches with the name field (because eventually the name in the response has to match the one in the request).

I wanted to bring this us up to see if there will be any inconsistency in the handling of this across the languages.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no listening_address field. It is just encoded in the resource name in the request. So when you match the name in the response to the name in the request you are indirectly verifying the value of the listening_address. In that case it should not be difficult to match that part with the [address][Listener.address] value either.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Repeating my earlier comment from reviewable:

Hmm, good question. Maybe the XdsClient can verify that the Listener resource has its address field set to the same address as in the resource name. Although that might be difficult if we use the template approach I suggest above instead of hard-coding "xds.resource.listener_address=" in our implementation.

We have now switched to the template approach. Given that, I don't think we can make any assumptions about the name field.

I would like to do this validation, but I don't see a reasonable way to do it without some fairly ugly layering violations. Unless one of you can think of another way to do this, I think we may need to live without this check. :(

Copy link
Contributor

@sanjaypujare sanjaypujare Jan 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can think of the following layered approach for validating this:

  • the xDSClient validates the returned Listener by matching the resource name against the requested resource name (trivial common check for all resources?)
  • the actual watcher that requested this Listener resource then validates that the returned Listener has the address (Listener.address) matching the expected value. This might be too late to NACK to the control plane but should also be rare (a bug in the control plane).


### Go

Create an `xds.Server` struct that would internally contain a `grpc.Server`. It
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Create an `xds.Server` struct that would internally contain a `grpc.Server`. It
would inject its own `StreamServerInterceptor` and `UnaryServerInterceptor`s
into the `grpc.Server`. The interceptors would be controlled by an
`atomic.Value` that would update with the configuration. When
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we skip this line about using an atomic to update the interceptors? It seems very implementation specific and could change anytime. Currently, we do have an interceptor which does nothing, and is not changed based on any configuration received. When we start supporting changing of the interceptors based on received configuration, we could use an atomic or a mutex.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added in the option for using a mutex. Calling out how the interceptor will roughly work is important to show the design actually works.

TODO: It looks like there is a lifetime issue for the XdsClient, as RPCs can
continue after the listener is closed and Serve() returns.

Because service registration is done on a method in the generated code, and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is no longer true. We have added a ServiceRegistrar interface to the grpc package which contains a single method RegisterService, and this is implemented by both grpc.Server and xds.GRPCServer. The protoc-gen-go has also been updated to emit code where the registration function accepts this interface instead of a concrete type. So, users should be able to directly pass xds.GRPCServer to these registration functions, and the underlying grpc.Server is no longer exported.

https://github.com/grpc/grpc-go/blob/8736dcd05b735913226c0c8bf787db48a13a7129/server.go#L558

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How will the code snippet below change for xDS? Was a new method introduced to the generated code?

s := xds.NewGRPCServer()
pb.RegisterGreeterServer(s, &server{})

```


To allow passing configuration, the `xds.Server` will be responsible for
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section needs updating as well. The current plan can be summarized as follows:

  • An API to create a server-side XdsCredentials will be exposed. This will be similar to the NewClientCredentials, and would be called NewServerCredentials.
  • The xds.GRPCServer will look through the ServerOptions list passed to it and will use a type assertion to determine whether or not the user has specified XdsCredentials.
  • If the user has specified XdsCredentials, any information required by the credentials implementation will be made available by exposing a function on a wrapped net.Conn passed to the ServerHandshake method.
  • The xds.GRPCServer will be responsible for wrapping the raw net.Conn corresponding to the incoming connection before invoking the ServerHandshake method.

## Abstract

Bring xDS-based configuration to Servers. xDS has many features and this gRFC
will not provide support for any of them in particular. However, it will be
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/it will be provide/it will provide/

configure the API, gRPC should log the errors at a default-visible log level. If
the xds bootstrap is missing or invalid, implementations would ideally fail
server startup, but a permanent hang is also acceptable. "Initial configuration"
does not have to include server credentials.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not clear what this means. Does it mean the initial config received from the control plane may not include TLS configuration but a later one may? Or by "server credentials" you mean the credentials needed to talk to the control plane? In this context "server" is ambiguous because it could mean the xDS control plane or the gRPC xDS-server under question.

Copy link
Member

@markdroth markdroth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for writing this up, Eric!

Reviewed 1 of 1 files at r1.
Reviewable status: all files reviewed, 20 unresolved discussions (waiting on @easwars and @ejona86)


A36-xds-for-servers.md, line 57 at r1 (raw file):

of the existing API.

Credential configuration will be managed separately by providing a

This paragraph seems like it might belong in the security gRFC instead of this one. Or maybe leave it here but add something like "see gRFC A29 for details"?


A36-xds-for-servers.md, line 65 at r1 (raw file):

When starting the XdsServer, it will start communication with xDS to receive
initial configuration. `bind()` and `listen()` can happen immediately, but

I think it's worth pointing out that calling listen() before the xDS resource is obtained is actually worse for clients, because even without calling accept(), it will still cause timeout delays before pick_first can fail over to the next address in the list. So I think implementations should prefer to defer calling listen() until after the xDS resource is obtained. That's what we're planning to do in C-core.

Also, the language here seems to imply that bind() should be done immediately even if listen() is delayed. Is there actually any reason for that? Based on what @yashykt and I are looking at in C-core, it seems like the only case where it would matter that bind() happens before we get the xDS resource is a server using an ephemeral port. And that seems like such a niche case that I think I'd be okay saying that we just don't support it for the xDS-enabled server API. (To be clear, it does seem better to call bind() earlier if possible, so that we can support ephemeral ports, but I think it's a nice-to-have, not a hard requirement. In C-core, it looks like that may be hard to de-couple, so at least initially we will probably start by deferring both bind() and listen() until after we have the xDS resource. We can revisit that later if someone actually wants to use ephemeral ports with xDS.)


A36-xds-for-servers.md, line 80 at r1 (raw file):

```json
{
  "grpc_server_resource_name_id": "example/resource",

Given our recent discussion about what we're going to want to do for new-style resource names, how about doing something like this:

  // A template for the name of the Listener resource to subscribe to
  // for a gRPC server.
  // The token "%s", if present in this string, will be replaced with
  // the IP and port on which the server is listening.
  "server_listener_resource_name_template": "grpc/server?xds.resource.listening_address=%s",

This seems more flexible for deployments, and it has the nice property that we'd avoid hard-coding the "xds.resource.listening_address" part.


A36-xds-for-servers.md, line 90 at r1 (raw file):

`envoy.config.listener.v3.Listener` resource. XdsClient will concatenate the
value of `grpc_server_resource_name_id` from the bootstrap,
`?udpa.resource.listening_address=`, and the listening address to construct the

s/udpa/xds/


A36-xds-for-servers.md, line 93 at r1 (raw file):

resource name. For example, with an address of `[::]:80` and the example
bootstrap field above, the resource name would be
`example/resource?udpa.resource.listening_address=[::]:80`. Note that the `[]`

s/udpa/xds/


A36-xds-for-servers.md, line 96 at r1 (raw file):

Previously, sanjaypujare wrote…

There is no listening_address field. It is just encoded in the resource name in the request. So when you match the name in the response to the name in the request you are indirectly verifying the value of the listening_address. In that case it should not be difficult to match that part with the [address][Listener.address] value either.

Hmm, good question. Maybe the XdsClient can verify that the Listener resource has its address field set to the same address as in the resource name. Although that might be difficult if we use the template approach I suggest above instead of hard-coding "xds.resource.listener_address=" in our implementation.


A36-xds-for-servers.md, line 120 at r1 (raw file):

    EXTERNAL = 2;
  }
  google.protobuf.UInt32Value destination_port = 8; // Always fail match

Do we fail to match even if the destination port actually matches the port we're listening on?


A36-xds-for-servers.md, line 125 at r1 (raw file):

  repeated core.v3.CidrRange source_prefix_ranges = 6;
  repeated uint32 source_ports = 7;
  repeated string server_names = 11; // May always fail match if TLS unsupported

The comment here is a little confusing. What's the intended behavior here? (It's not actually entirely clear to me what this field actually does in Envoy.)


A36-xds-for-servers.md, line 126 at r1 (raw file):

  repeated uint32 source_ports = 7;
  repeated string server_names = 11; // May always fail match if TLS unsupported
  string transport_protocol = 9; // Always fail match

Why would this always fail? It looks like the supported values are "raw_buffer" and "tls", to select based on whether the connection is encrypted or not.


A36-xds-for-servers.md, line 127 at r1 (raw file):

  repeated string server_names = 11; // May always fail match if TLS unsupported
  string transport_protocol = 9; // Always fail match
  repeated string application_protocols = 10; // Always fail match

Always fail even if this says "h2"?


A36-xds-for-servers.md, line 131 at r1 (raw file):

All fields are "supported," however, we know that some features are unsupported.

I don't quite understand the meaning of this paragraph.

Should we reject a config that specifies filters, or do we just ignore the filters for now? If we just ignore them for now, then what happens when we later add support for filters? Will we be in a situation where a config that did previously work will then be NACKed?

In particular, I'm wondering about the HTTP connection manager (HCM) filter, which should always be the last filter in the chain. Should we verify that that's there, since we know that we're going to need some functionality from the HCM config later?


A36-xds-for-servers.md, line 141 at r1 (raw file):

no longer in use; they are a resource and must not be leaked.

This overall "wrapping" server API has many language-specific ramifications. We

I wonder if we should actually split the individual language sections out into their own gRFCs with L numbers? We're ultimately going to need a separate section for every wrapped language.


A36-xds-for-servers.md, line 284 at r1 (raw file):

impact is hoped to be low as servers will generally receive their initial
configuration quickly (on the order of a second) and most clients will likely be
using a variant of round-robin. We'd also encourge the xDS control plane to

Why do we encourage the xDS server to avoid handing out addresses of not-yet-started servers? I don't see the rationale for that.

Copy link
Member

@yashykt yashykt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 files reviewed, 21 unresolved discussions (waiting on @easwars, @ejona86, and @markdroth)


A36-xds-for-servers.md, line 101 at r1 (raw file):

the resource if the address does not match.

Although `Filter`s will not be observed initially, the `FilterChain` contains

Is this paragraph meant to be a placeholder for future changes? i.e. do we expect the types that we use to change in the future?


A36-xds-for-servers.md, line 103 at r1 (raw file):

Although `Filter`s will not be observed initially, the `FilterChain` contains
data that may be used. When looking for a FilterChain, the standard matching
logic must be used. Each `filter_chain_match` of the repeated

FYI currently, implementation-wise C Core and Go are accepting only a single filter_chain entry, so there is no matching logic in place at the moment. We aren't looking at default_filter_chain either. The only check that we are expecting to perform from the single entry's filter_chain_match is to check whether the application_protocol mentions "h2".

Copy link
Member

@markdroth markdroth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r2.
Reviewable status: all files reviewed, 21 unresolved discussions (waiting on @easwars and @ejona86)


A36-xds-for-servers.md, line 80 at r1 (raw file):

Previously, markdroth (Mark D. Roth) wrote…

Given our recent discussion about what we're going to want to do for new-style resource names, how about doing something like this:

  // A template for the name of the Listener resource to subscribe to
  // for a gRPC server.
  // The token "%s", if present in this string, will be replaced with
  // the IP and port on which the server is listening.
  "server_listener_resource_name_template": "grpc/server?xds.resource.listening_address=%s",

This seems more flexible for deployments, and it has the nice property that we'd avoid hard-coding the "xds.resource.listening_address" part.

Although actually, if we do use the template here, then the parameter name should probably not start with "xds.", since that namespace is reserved for parameters added by the client implementation, not those added by the user.

@ejona86
Copy link
Member Author

ejona86 commented Dec 22, 2020

Responses to @markdroth

So I think implementations should prefer to defer calling listen() until after the xDS resource is obtained.

The very next sentence stated "listen() should be delayed if possible". The rationale section describes the issues listening causes.

Given our other discission, I think this may change to "either don't call listen(), or call listen() and do accept()+close()".

Also, the language here seems to imply that bind() should be done immediately even if listen() is delayed.

It sayed "can". Changed to "may." Java does not bind before receiving the xDS Listener, so that definitely was not the intention. And yes, ephemeral ports won't have a port chosen at that point.

Given our recent discussion about what we're going to want to do for new-style resource names, how about doing something like this: (server_listener_resource_name_template snippet)

Yes, we should change it. Although I'll note that this will break experimental Java users today. But I think we can just add a new experimental flag to the bootstrap and leave the current flag's behavior unchanged. When we remove the grpc_server_resource_name_id flag that's when those clients will break, but they should have migrated to a new experimental flag or the default-on behavior once it is available.

Although actually, if we do use the template here, then the parameter name should probably not start with "xds.", since that namespace is reserved for parameters added by the client implementation, not those added by the user.

It isn't a new-style name so it was already wild-west. But, yeah, good point. I guess with this templating thing we don't need a well-defined name and can just let every server define their own, since now it isn't a protocol between the client and server as it was when this was implicit. CC @sanjaypujare

envoy.config.listener.v3.Listener resource. XdsClient will concatenate the
value of grpc_server_resource_name_id from the bootstrap,
?udpa.resource.listening_address=, and the listening address to construct the

s/udpa/xds/

Looks like Java, and thus probably TD, already used 'udpa'. That can be changed, but "we had all agreed" already to the old name. We can either leave it as-is for the old-style name and improve things for new-style names or change it now even though "it was already decided and agreed on and implemented." I understand how this happened though; this is due to a more clear future direction since we fleshed out a federation design. CC @sanjaypujare

google.protobuf.UInt32Value destination_port = 8; // Always fail match

Do we fail to match even if the destination port actually matches the port we're listening on?

Yes. The Listener's address is what is important for the local port. destination_port is only applicable for use_original_dst. It's documentation: "Optional destination port to consider when use_original_dst is set on the listener in determining a filter chain match."

repeated string server_names = 11; // May always fail match if TLS unsupported

The comment here is a little confusing. What's the intended behavior here? (It's not actually entirely clear to me what this field actually does in Envoy.)

Changed to "Always fail match for non-TLS connections". I think I had intended to let this to have "always fail" logic before XdsServerCredentias is implemented, but I think that actually doesn't help much. This uses SNI to determine what host the client thinks it is connecting to.

string transport_protocol = 9; // Always fail match

Why would this always fail? It looks like the supported values are "raw_buffer" and "tls", to select based on whether the connection is encrypted or not.

It changes based on envoy.filters.listener.tls_inspector; aka connection sniffing. I seriously considered only matching the value "raw_buffer", but the entire thing makes me uneasy as it is really easy to accidentally use wrong.

repeated string application_protocols = 10; // Always fail match

Always fail even if this says "h2"?

Yep. This is more tls_inspector goo; it is only set by that filter (at least the docs say so).

All fields are "supported," however, we know that some features are unsupported.

I don't quite understand the meaning of this paragraph.

Should we reject a config that specifies filters, or do we just ignore the filters for now?

Filters will be ignored initially, just like on client-side. That was at defined at "Although Filters will not be observed initially..."

But this discussion of "fields" only applies to FilterChainMatch.

You, Doug, and me had discussed checking filters and decided to just mirror client-side. Any issues that arise will be no worse than we will already suffer.

impact is hoped to be low as servers will generally receive their initial
configuration quickly (on the order of a second) and most clients will likely be
using a variant of round-robin. We'd also encourge the xDS control plane to

Why do we encourage the xDS server to avoid handing out addresses of not-yet-started servers? I don't see the rationale for that.

In the context of EDS, it is best not to send clients to unhealthy/not ready backends.

@ejona86
Copy link
Member Author

ejona86 commented Dec 22, 2020

Responses to @yashykt

Although Filters will not be observed initially, the FilterChain contains

Is this paragraph meant to be a placeholder for future changes? i.e. do we expect the types that we use to change in the future?

At some point in the future we expect to support filters. But for this gRFC it will not change; adding filter support will be some later effort.

FYI currently, implementation-wise C Core and Go are accepting only a single filter_chain entry, so there is no matching logic in place at the moment. We aren't looking at default_filter_chain either. The only check that we are expecting to perform from the single entry's filter_chain_match is to check whether the application_protocol mentions "h2".

That's fine since that code is not enabled by default. But every part of that behavior would need to be changed/enhanced before GA.


The xDS-returned Listener must have an [`address`][Listener.address] that
matches the listening address provided. The Listener's `address` would be a TCP
`SocketAddress` with matching `address` and `port_value`. The XdsClient must NAK
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Two spellings for the same thing.

Create an `xds.Server` struct that would internally contain a `grpc.Server`. It
would inject its own `StreamServerInterceptor` and `UnaryServerInterceptor`s
into the `grpc.Server`. The interceptors would be controlled by an
`atomic.Value` that would update with the configuration. When
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added in the option for using a mutex. Calling out how the interceptor will roughly work is important to show the design actually works.

TODO: It looks like there is a lifetime issue for the XdsClient, as RPCs can
continue after the listener is closed and Serve() returns.

Because service registration is done on a method in the generated code, and
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How will the code snippet below change for xDS? Was a new method introduced to the generated code?

s := xds.NewGRPCServer()
pb.RegisterGreeterServer(s, &server{})

A36-xds-for-servers.md Outdated Show resolved Hide resolved
Copy link
Member

@markdroth markdroth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really good!

Just a couple of comments remaining. One of them is mostly cosmetic (being a bit clearer in terms of how we describe the constraints around bind() and listen()). The other is actually a content change, unless we can think of a better way to do the address check.

Please let me know if you have any questions. Thanks!

A36-xds-for-servers.md Outdated Show resolved Hide resolved
`example/resource?udpa.resource.listening_address=[::]:80`. Note that the `[]`
is _not_ percent encoded, as this string is not a URI.

The xDS-returned Listener must have an [`address`][Listener.address] that
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Repeating my earlier comment from reviewable:

Hmm, good question. Maybe the XdsClient can verify that the Listener resource has its address field set to the same address as in the resource name. Although that might be difficult if we use the template approach I suggest above instead of hard-coding "xds.resource.listener_address=" in our implementation.

We have now switched to the template approach. Given that, I don't think we can make any assumptions about the name field.

I would like to do this validation, but I don't see a reasonable way to do it without some fairly ugly layering violations. Unless one of you can think of another way to do this, I think we may need to live without this check. :(

A36-xds-for-servers.md Outdated Show resolved Hide resolved
A36-xds-for-servers.md Outdated Show resolved Hide resolved
A36-xds-for-servers.md Outdated Show resolved Hide resolved
A36-xds-for-servers.md Outdated Show resolved Hide resolved
A36-xds-for-servers.md Outdated Show resolved Hide resolved
Comment on lines 72 to 73
stall until it succeeds. Each language will provide an XdsServer-specific API to
inform the application of initial communication failures. If the user does not
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does Java/C have (or plan to have) an API to expose this? I don't think we designed anything like this for Go (yet?). @easwars @yashykt

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See XdsIniitListener for Java.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

C does not have one yet, but it is in the works.

A36-xds-for-servers.md Outdated Show resolved Hide resolved
Waiting for xds config is not just a one-shot affair at server startup,
since xds can delete resources after started. That caused a pretty
strong rework to the behavior, including ditching
listen()-but-don't-accept(). We still need to validate that
accept()+close() won't cause poor behavior for clients such as Envoy,
but the options available are limited.
is restored. However, XdsServer must accept configuration changes provided by
the xDS server, including resource deletions. If that causes the XdsServer to
lack essential configuration (e.g., the Listener was deleted) the server would
need to enter "not serving" mode. "Not serving" requires existing connections to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about the "Not serving" to "Serving" transition? Specifically:

  • this transition will automatically start processing RPCs without any intervention by the application, correct? i.e. it is as though start() was called on the XdsServer object (but without the application having to call it)
  • this will require recreating the internal/delegate Server object since the delegate does not support restarting after shutting down. Is that correct?

It will be good to add some clarification for this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this transition will automatically start processing RPCs without any intervention by the application, correct? i.e. it is as though start() was called on the XdsServer object (but without the application having to call it)

Yes. The application had already called start() as is expecting the server to be running. It was the library that "called stop()" so it is its responsibility to resume and restore the expected application state when able.

this will require recreating the internal/delegate Server object since the delegate does not support restarting after shutting down. Is that correct?

Correct. This will vary with the language-specific API and the approach taken to become "not serving." But assuming XdsServer calls server.stop() on the wrapped server (the internal one), then it'd need to create a new one to replace it when calling start().

I didn't go into more detail here because that is language and implementation-specific. I did mention in the Java section that we may need to rebuild the server, and that was to allow for things here. But it isn't required to re-create the server; I left open the option for you to use a ProtocolNegotiator to immediately close() connections. Such an approach wouldn't have needed to re-create the server.

Copy link
Member

@markdroth markdroth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really good! I think the only things we need are the changes we've discussed about L4 filters, HTTP filters, and the RouteConfiguration.

Thanks again for driving this!

A36-xds-for-servers.md Outdated Show resolved Hide resolved
A36-xds-for-servers.md Show resolved Hide resolved
A36-xds-for-servers.md Show resolved Hide resolved
A36-xds-for-servers.md Show resolved Hide resolved
message FilterChainMatch {
enum ConnectionSourceType {
ANY = 0;
SAME_IP_OR_LOOPBACK = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the SAME_IP_OR_LOOPBACK, one can easily verify if the source IP of the incoming connection is from a loopback address. But do we also have to support the case where the host supports multiple addresses and we need to make sure that the source IP of the incoming connection is one of them? Also, does anything change based on whether the server is bound to an actual local address or any address?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, SAME_IP_OR_LOOPBACK means the source IP and dest IP are the same (or localhost); it doesn't support cases where a machine has multiple IPs and the IP selection is different between source and dest.

Nothing changes by the wildcard. It should be straight-forward to implement.

A36-xds-for-servers.md Outdated Show resolved Hide resolved
automatically (e.g., retry every minute).

Communication failures do not impact the XdsServer's xDS configuration; the
XdsServer should continue using the most recent configuration until connectivity
Copy link
Contributor

@sanjaypujare sanjaypujare Feb 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that true also for the very first time config is needed during start() (in Java) ? What's the expected behavior when the listener resource is missing at startup time? Should the server's start() fail with the server going into permanent failure state? Or should it go into "Not serving" state and loop until the listener resource is available?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is basically saying "if an I/O error occurs, continue doing what you were doing before." And that applies to startup, because at startup we are "not serving." No, the server's start() should not fail. That is covered by the "XdsServer's start must not fail due to transient xDS issues, like missing xDS configuration from the xDS server" language below. We absolutely must avoid a permanent failure state triggered by transient I/O failures. We'd stay "not serving" until something improves.

@ejona86
Copy link
Member Author

ejona86 commented Feb 25, 2021

I pushed an important filter handling addition; I've mentioned to some of you that was coming. It is yet something else to take care of, but I hope isn't that bad because some client-side work that's already been happening should take care of the harder part.

I made a smattering of fixes for nits. I'm aware some comments still need to be addressed/replied.

@ejona86
Copy link
Member Author

ejona86 commented Feb 25, 2021

If you see my "Make explicit that xds v3 is optional" commit message. Rest assured, it should have said v2. It happens that v3 isn't optional yet :-)

Copy link
Member Author

@ejona86 ejona86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think everything except the question of validating use_original_dst is resolved. use_original_dst will be a trivial change if we choose to make it, so I think this is very close and we should be looking at the final-ish round of reviews.

automatically (e.g., retry every minute).

Communication failures do not impact the XdsServer's xDS configuration; the
XdsServer should continue using the most recent configuration until connectivity
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is basically saying "if an I/O error occurs, continue doing what you were doing before." And that applies to startup, because at startup we are "not serving." No, the server's start() should not fail. That is covered by the "XdsServer's start must not fail due to transient xDS issues, like missing xDS configuration from the xDS server" language below. We absolutely must avoid a permanent failure state triggered by transient I/O failures. We'd stay "not serving" until something improves.

is restored. However, XdsServer must accept configuration changes provided by
the xDS server, including resource deletions. If that causes the XdsServer to
lack essential configuration (e.g., the Listener was deleted) the server would
need to enter "not serving" mode. "Not serving" requires existing connections to
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this transition will automatically start processing RPCs without any intervention by the application, correct? i.e. it is as though start() was called on the XdsServer object (but without the application having to call it)

Yes. The application had already called start() as is expecting the server to be running. It was the library that "called stop()" so it is its responsibility to resume and restore the expected application state when able.

this will require recreating the internal/delegate Server object since the delegate does not support restarting after shutting down. Is that correct?

Correct. This will vary with the language-specific API and the approach taken to become "not serving." But assuming XdsServer calls server.stop() on the wrapped server (the internal one), then it'd need to create a new one to replace it when calling start().

I didn't go into more detail here because that is language and implementation-specific. I did mention in the Java section that we may need to rebuild the server, and that was to allow for things here. But it isn't required to re-create the server; I left open the option for you to use a ProtocolNegotiator to immediately close() connections. Such an approach wouldn't have needed to re-create the server.

A36-xds-for-servers.md Show resolved Hide resolved
A36-xds-for-servers.md Show resolved Hide resolved
@ejona86 ejona86 requested a review from markdroth February 25, 2021 22:37
Copy link
Member

@markdroth markdroth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fantastic! Thanks for nailing down all of these details!

A36-xds-for-servers.md Outdated Show resolved Hide resolved
@ejona86 ejona86 merged commit 71f4dea into grpc:master Feb 26, 2021
@ejona86 ejona86 deleted the xds-for-servers branch February 26, 2021 21:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants