xds: implement extension config discovery for HCM #11826
Conversation
Signed-off-by: Kuat Yessenov <kuat@google.com>
I won't be able to review this before vacation. @htuch can you maybe take a quick pass next week and then I can look the week after?
Yeah, sure, will look at this early next week.
@kyessenov before diving into the details, I guess it might be worth discussing whether it's worth having another config-specific management system for FilterConfigDS. I.e., there is a ton of conceptual and implementation complexity in both RDS and FilterConfigDS in terms of how warming, shared configuration, thread-local state, and provider/subscriber notions are handled. Can we unify them? I think the one thing that might make some abstraction and reduction of complexity hard is that RDS needs to deal with VHDS and SRDS as well. Nonetheless, it's worrying that we are continuing to expand the amount of config state management complexity when we have these common patterns in RDS and FilterConfigDS. Any thoughts on how to make this better?
@htuch RDS is mired in the complexity of SRDS and VHDS, so it's really hard to reuse its pieces. The sharing model is also rather strange (a shared provider with shared subscriptions), so it would need some refactoring to become reusable. I thought about that, but it's so intertwined that it's hard not to break. The warming bit is interesting since it's not present in any other xDSes, so that's novel. I agree that the majority of the code is just glue, but I'm not sure abstracting all the glue into some framework would make it simpler to understand.
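The "shared subscription, many providers" pattern under discussion can be sketched as follows. This is a deliberately simplified illustration, not Envoy's actual classes: one subscription receives a single xDS update and fans it out to every registered provider, which is where warming and thread-local propagation complexity tends to accumulate.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical sketch of a shared subscription with multiple providers.
// Names and types are illustrative, not Envoy's real interfaces.
class SharedSubscription {
public:
  using Callback = std::function<void(const std::string&)>;

  // Each provider registers a callback; the subscription fans updates out.
  void addProvider(Callback cb) { providers_.push_back(std::move(cb)); }

  // A single xDS update is delivered once and applied to every provider.
  void onConfigUpdate(const std::string& config) {
    last_config_ = config;
    for (auto& cb : providers_) {
      cb(config);
    }
  }

  const std::string& lastConfig() const { return last_config_; }

private:
  std::vector<Callback> providers_;
  std::string last_config_;
};
```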
Signed-off-by: Kuat Yessenov <kuat@google.com>
Thanks. Overall, structurally I think this makes sense, but I'd like to see what can be done to make it easier to grok and follow the warming/initialization flow. Flushing some comments.
/wait
* Validate if the route configuration can be applied in the context of the filter manager.
* @param Server::Configuration::NamedHttpFilterConfigFactory a filter factory to validate in the
*   context of the filter manager filter chains.
*/
It's a bit unclear to me what this does. Are you talking about the per-route filter config?
Sorry, this is about checking whether the filter configuration is terminal.
void DynamicFilterConfigProviderImpl::validateConfig(
    Server::Configuration::NamedHttpFilterConfigFactory& factory) {
  if (factory.isTerminalFilter() && !require_terminal_) {
Why are we validating this down here? I'd have thought it would be sufficient to validate the terminal filter condition in the HCM.
Config is rejected if it cannot be applied. Right now it's just terminal checks, but I imagine there'll be more. We can restructure the code to use callbacks if that's clearer.
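The callback-based restructuring floated here could look something like the sketch below. The types and the validator-list parameter are hypothetical, not Envoy's actual API; this version checks the terminal condition in both directions and leaves room for additional validators.

```cpp
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Illustrative stand-ins for the real factory types; not Envoy's API.
struct FilterFactoryInfo {
  std::string name;
  bool is_terminal;
};

using ConfigValidator = std::function<void(const FilterFactoryInfo&)>;

// Rejects a discovered filter config that cannot be applied in its position.
// The validator list is a hypothetical hook for checks beyond the
// terminal-filter condition.
void validateConfig(const FilterFactoryInfo& factory, bool require_terminal,
                    const std::vector<ConfigValidator>& extra_validators) {
  if (factory.is_terminal != require_terminal) {
    throw std::runtime_error("filter '" + factory.name +
                             "' is in the wrong terminal-filter position");
  }
  for (const auto& validate : extra_validators) {
    validate(factory);
  }
}
```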
I'm still not sure where this additional config validation requirement comes from, in particular as a first-class API feature?
Looks good! Please go ahead with the current design. There are a few threads to discuss further while doing so.
/wait
"Server sent a delta FilterConfigDS update attempting to remove a resource (name: "
"{}). Ignoring.",
removed_resources[0]);
}
Did we discuss previously how delta xDS should look? I think what you have is consistent with RDS/EDS, but I'm also tempted to prefer to make delta xDS behave more consistently with new APIs. WDYT? CC @markdroth
We haven't discussed. I think for now it's simple since it's just a single resource per subscription. Eventually we might want to combine many resources per xDS request.
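The single-resource delta handling described here can be sketched as below. The handler type is a simplified illustrative stand-in, not Envoy's subscription interface: additions replace the current config, and removals are ignored, matching the log message in the snippet above.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical sketch of delta handling for a single-resource subscription;
// not Envoy's actual DeltaSubscription types.
class DeltaFilterConfigHandler {
public:
  // Returns true if the update changed the stored config.
  bool onDeltaUpdate(const std::vector<std::string>& added,
                     const std::vector<std::string>& removed) {
    if (!removed.empty()) {
      // A server-initiated removal has no meaning for a single mandatory
      // resource, so it is dropped (and would be logged) rather than applied.
      return false;
    }
    if (added.empty()) {
      return false;
    }
    // With one resource per subscription, only the first entry matters.
    current_ = added.front();
    return true;
  }

  const std::optional<std::string>& current() const { return current_; }

private:
  std::optional<std::string> current_;
};
```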
Signed-off-by: Kuat Yessenov <kuat@google.com>
Looks good, just need to figure out the validation story.
// Otherwise, mark ready immediately and start the subscription on initialization. A default
// config is expected in the latter case.
if (!apply_without_warming) {
  factory_context.initManager().add(subscription->initTarget());
Is this server config factory context accurately modeling the listener warming chain? I.e., is this going to ensure listener warming?
This is the context from HCM which I assume is listener context. I think RDS uses the init manager from this context. Let me know if I misunderstand something.
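The warming gate in the snippet above boils down to conditionally registering an init target. The following sketch uses simplified stand-in types (not Envoy's `Init::Manager`) to illustrate the two paths: gate listener readiness on the subscription when warming is required, or mark ready immediately when a default config is present.

```cpp
#include <functional>
#include <vector>

// Illustrative stand-in for an init manager; not Envoy's Init::Manager.
class InitManager {
public:
  void add(std::function<void()> target) {
    targets_.push_back(std::move(target));
  }
  size_t pendingTargets() const { return targets_.size(); }

private:
  std::vector<std::function<void()>> targets_;
};

// Returns whether the listener must wait for this subscription before
// becoming ready. Hypothetical helper mirroring the conditional above.
bool maybeGateOnWarming(InitManager& init_manager, bool apply_without_warming,
                        std::function<void()> init_target) {
  if (!apply_without_warming) {
    init_manager.add(std::move(init_target));
    return true;  // listener warming now depends on this subscription
  }
  return false;  // default config present: ready immediately
}
```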
OK, I'm stuck on figuring out why the integration test framework flakes for me. I'm seeing a delayed connection error 111 on a test request sometimes, and I have no idea why the connection is rejected.
The integration fiasco might be linked to #11871. The listener is marked as ready (in both waitUntilListenersReady and the stats), yet the connection is refused.
Signed-off-by: Kuat Yessenov <kuat@google.com>
Tested a few times with 100 iterations. --runs_per_test=1000 causes timeout issues (I guess it's too much load for my 32 cores).
Does it time out in your test or elsewhere?
I haven't tried other tests. I just ran it again, and got 2/1000 failures:
It looks like it times out at the default timeout, which is 10s. Maybe I need to bump it?
2/1000 is probably OK to merge, but unfortunately we have a real issue right now with test flakes, and this is bound to fail in super-slow CI. cc @alyssawilk for thoughts on ^.
Thanks. I agree that the flakes should be burnt with fire. I find it very hard to write a non-flaky test in the framework, though. I found a way around it: I made HCM do a direct response, and that fixed the flake. There must be another race between the upstream being ready and the readiness checks I have in my test.
Somewhat relevant question: is there a way to simulate the integration tests with an out-of-process test driver? I like that the test is very sensitive to timing, but would prefer to use the common agents (which are usually in golang). I think I'd need some way to receive a fast update about internal events (server initialized, metric counter incremented). Currently we depend on polling, which does not allow detecting O(ms) races.
Thanks for working through the flakes and on this awesome feature!
@kyessenov unfortunately the integration tests are flaking regularly on master. PTAL tomorrow to see if you can fix. Otherwise I will need to revert, and I can help you fix it if you can't figure it out. Thank you!
I noticed it fail once locally. Could it be that there is a delay between when a listener is accepted and applied? I suspect there's some latency between when a filter config provider posts to TLS and when a worker picks it up.
(By listener applied I mean HCM receives the accepted filter config.)
OK, seems like some destruction sequence:
I'll take a look.
Merged master and it happens a lot more frequently. Must have been some recent changes.
Let's see if #12268 fixes it. It passed with 1000 iterations locally.
tags = ["fails_on_windows"],
deps = [
    ":http_integration_lib",
    "//source/extensions/filters/http/rbac:config",
cc @yanavlasov
I noticed a breakage during our import from OSS because we disable the rbac extension. The strong dependency on that extension that this change adds breaks compilation in our setup.
I think there's a problem with this test: it depends on an extension but is not using the extension build system macros that remove the test from execution when the relevant extension is removed from the build. But this test is using the rbac extension as an example in order to test a config loading mechanism, not testing the extension itself, so excluding the test would mean that we lose test coverage for the mechanism you introduced.
Should this test hand-craft its own test-only extension and use that to verify that the extension mechanism works as expected?
I could do that, although I'm not sure how valuable that is. I was operating under the assumption that RBAC is a core extension. I do need variable behavior for this test to make sense, so not many existing test extensions fit. Feel free to file an issue against me to clean this up.
This is technically the list of core extensions.
_core_extensions = [
If you do need to test specific behavior, please create a test filter with the behavior you need. Otherwise, these tests will impact others who exclude the RBAC filter from their build.
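The suggested hand-crafted test-only extension could be approximated as below. The registry, the filter struct, and the filter name are all hypothetical stand-ins for Envoy's real Registry machinery; the point is that a test can register its own factory with the variable behavior it needs instead of linking a real extension like RBAC.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>

// Illustrative test-only filter with one toggle of variable behavior.
struct TestFilter {
  bool allow;
};

// Hypothetical simplified registry; not Envoy's Registry::FactoryRegistry.
class TestFilterRegistry {
public:
  using Factory = std::function<std::unique_ptr<TestFilter>(bool)>;

  void registerFactory(const std::string& name, Factory f) {
    factories_[name] = std::move(f);
  }

  // Returns nullptr when no factory is registered under `name`.
  std::unique_ptr<TestFilter> create(const std::string& name, bool allow) {
    auto it = factories_.find(name);
    return it == factories_.end() ? nullptr : it->second(allow);
  }

private:
  std::map<std::string, Factory> factories_;
};
```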
Signed-off-by: Kuat Yessenov <kuat@google.com> Signed-off-by: Kevin Baichoo <kbaichoo@google.com>
Signed-off-by: Kuat Yessenov <kuat@google.com> Signed-off-by: chaoqinli <chaoqinli@google.com>
Commit Message:
Implements a filter config discovery client. Modeled after RDS with a few differences:
Remaining items to finish:
One interesting issue with contextual validation is that adding a context may invalidate a resource in xDS (e.g. adding a reference to a filter config as a terminal filter in the filter chain after the non-terminal filter config is received).
Risk Level: low (the code is not activated unless the config is used)
Testing: manual for now
Docs Changes: needed
Release Notes: add a filter config discovery client
Fixes: #7867