xds: implement extension config discovery for HCM #11826
Conversation
Signed-off-by: Kuat Yessenov <kuat@google.com>
I won't be able to review this before vacation. @htuch can you maybe take a quick pass next week and then I can look the week after?
Yeah, sure, will look at this early next week.
@kyessenov before diving into the details, I guess it might be worth discussing whether it's worth having another config-specific management system for FilterConfigDS. I.e., there is a ton of conceptual and implementation complexity in both RDS and FilterConfigDS in terms of how warming, shared configuration, thread-local state, and provider/subscriber notions are handled. Can we unify them? I think the one thing that might make some abstraction and reduction of complexity hard is that RDS needs to deal with VHDS and SRDS as well. Nonetheless, it's worrying that we are continuing to expand the amount of config state management complexity when we have these common patterns in RDS and FilterConfigDS. Any thoughts on how to make this better?
@htuch RDS is mired in the complexity of SRDS and VHDS, so it's really hard to reuse its pieces. The sharing model is also rather strange (a shared provider with shared subscriptions), so it would need some refactoring to become reusable. I thought about that, but it's so intertwined that it's hard not to break. The warming bit is interesting since it's not present in any other xDSes, so that's novel. I agree that the majority of the code is just glue, but I'm not sure abstracting all the glue into some framework would make it simpler to understand.
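The "shared subscription, many providers" pattern under discussion can be sketched as follows. This is a deliberately simplified illustration, not Envoy's actual classes: one subscription receives a single xDS update and fans it out to every registered provider, which is where warming and thread-local propagation complexity tends to accumulate.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical sketch of a shared subscription with multiple providers.
// Names and types are illustrative, not Envoy's real interfaces.
class SharedSubscription {
public:
  using Callback = std::function<void(const std::string&)>;

  // Each provider registers a callback; the subscription fans updates out.
  void addProvider(Callback cb) { providers_.push_back(std::move(cb)); }

  // A single xDS update is delivered once and applied to every provider.
  void onConfigUpdate(const std::string& config) {
    last_config_ = config;
    for (auto& cb : providers_) {
      cb(config);
    }
  }

  const std::string& lastConfig() const { return last_config_; }

private:
  std::vector<Callback> providers_;
  std::string last_config_;
};
```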
Signed-off-by: Kuat Yessenov <kuat@google.com>
Thanks. Overall, structurally I think this makes sense, but I'd like to see what can be done to make it easier to grok and follow the warming/initialization flow. Flushing some comments.
/wait
* Validate if the route configuration can be applied in the context of the filter manager.
* @param Server::Configuration::NamedHttpFilterConfigFactory a filter factory to validate in the
*   context of the filter manager filter chains.
*/
It's a bit unclear to me what this does. Are you talking about the per-route filter config?
Sorry, this is about checking whether the filter configuration is terminal.
void DynamicFilterConfigProviderImpl::validateConfig(
    Server::Configuration::NamedHttpFilterConfigFactory& factory) {
  if (factory.isTerminalFilter() && !require_terminal_) {
Why are we validating this down here? I'd have thought it would be sufficient to validate the terminal filter condition in the HCM.
Config is rejected if it cannot be applied. Right now it's just terminal checks, but I imagine there'll be more. We can restructure the code to use callbacks if that's clearer.
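The callback-based restructuring floated here could look something like the sketch below. The types and the validator-list parameter are hypothetical, not Envoy's actual API; this version checks the terminal condition in both directions and leaves room for additional validators.

```cpp
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Illustrative stand-ins for the real factory types; not Envoy's API.
struct FilterFactoryInfo {
  std::string name;
  bool is_terminal;
};

using ConfigValidator = std::function<void(const FilterFactoryInfo&)>;

// Rejects a discovered filter config that cannot be applied in its position.
// The validator list is a hypothetical hook for checks beyond the
// terminal-filter condition.
void validateConfig(const FilterFactoryInfo& factory, bool require_terminal,
                    const std::vector<ConfigValidator>& extra_validators) {
  if (factory.is_terminal != require_terminal) {
    throw std::runtime_error("filter '" + factory.name +
                             "' is in the wrong terminal-filter position");
  }
  for (const auto& validate : extra_validators) {
    validate(factory);
  }
}
```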
I'm still not sure where this additional config validation requirement comes from, in particular as a first-class API feature?
Looks good! Please go ahead with the current design. There are a few threads to discuss further while doing so.
/wait
"Server sent a delta FilterConfigDS update attempting to remove a resource (name: "
"{}). Ignoring.",
removed_resources[0]);
}
Did we discuss previously how delta xDS should look? I think what you have is consistent with RDS/EDS, but I'm also tempted to prefer to make delta xDS behave more consistently with new APIs. WDYT? CC @markdroth
We haven't discussed. I think for now it's simple since it's just a single resource per subscription. Eventually we might want to combine many resources per xDS request.
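The single-resource delta handling described here can be sketched as below. The handler type is a simplified illustrative stand-in, not Envoy's subscription interface: additions replace the current config, and removals are ignored, matching the log message in the snippet above.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical sketch of delta handling for a single-resource subscription;
// not Envoy's actual DeltaSubscription types.
class DeltaFilterConfigHandler {
public:
  // Returns true if the update changed the stored config.
  bool onDeltaUpdate(const std::vector<std::string>& added,
                     const std::vector<std::string>& removed) {
    if (!removed.empty()) {
      // A server-initiated removal has no meaning for a single mandatory
      // resource, so it is dropped (and would be logged) rather than applied.
      return false;
    }
    if (added.empty()) {
      return false;
    }
    // With one resource per subscription, only the first entry matters.
    current_ = added.front();
    return true;
  }

  const std::optional<std::string>& current() const { return current_; }

private:
  std::optional<std::string> current_;
};
```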
Signed-off-by: Kuat Yessenov <kuat@google.com>
Looks good, just need to figure out the validation story.
// Otherwise, mark ready immediately and start the subscription on initialization. A default
// config is expected in the latter case.
if (!apply_without_warming) {
  factory_context.initManager().add(subscription->initTarget());
Is this server config factory context accurately modeling the listener warming chain? I.e., is this going to ensure listener warming?
This is the context from HCM which I assume is listener context. I think RDS uses the init manager from this context. Let me know if I misunderstand something.
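The warming gate in the snippet above boils down to conditionally registering an init target. The following sketch uses simplified stand-in types (not Envoy's `Init::Manager`) to illustrate the two paths: gate listener readiness on the subscription when warming is required, or mark ready immediately when a default config is present.

```cpp
#include <functional>
#include <vector>

// Illustrative stand-in for an init manager; not Envoy's Init::Manager.
class InitManager {
public:
  void add(std::function<void()> target) {
    targets_.push_back(std::move(target));
  }
  size_t pendingTargets() const { return targets_.size(); }

private:
  std::vector<std::function<void()>> targets_;
};

// Returns whether the listener must wait for this subscription before
// becoming ready. Hypothetical helper mirroring the conditional above.
bool maybeGateOnWarming(InitManager& init_manager, bool apply_without_warming,
                        std::function<void()> init_target) {
  if (!apply_without_warming) {
    init_manager.add(std::move(init_target));
    return true;  // listener warming now depends on this subscription
  }
  return false;  // default config present: ready immediately
}
```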
OK, I'm stuck on figuring out why the integration test framework flakes for me. I'm seeing a delayed connection error 111 on a test request sometimes, and I have no idea why the connection is rejected.
The integration fiasco might be linked to #11871. The listener is marked as ready (in both waitUntilListenersReady and the stats), yet the connection is refused.
Signed-off-by: Kuat Yessenov <kuat@google.com>
Tested a few times with 100 iterations. --runs_per_test=1000 causes timeout issues (I guess it's too much load for my 32 cores).
Does it time out in your test or elsewhere?
I haven't tried other tests. I just ran it again, and got 2/1000 failures:
It looks like it times out at the default timeout, which is 10s. Maybe I need to bump it?
2/1000 is probably OK to merge, but unfortunately we have a real issue right now with test flakes, and this is bound to fail in super-slow CI. cc @alyssawilk for thoughts on ^.
Thanks. I agree that the flakes should be burnt with fire. I find it very hard to write a non-flaky test in the framework, though. I found a way around it: I made HCM do a direct response, and that fixed the flake. There must be another race between the upstream being ready and the readiness checks I have in my test.
Somewhat relevant question: is there a way to simulate the integration tests with an out-of-process test driver? I like that the test is very sensitive to timing, but would prefer to use the common agents (which are usually in golang). I think I'd need some way to receive a fast update about internal events (server initialized, metric counter incremented). Currently we depend on polling, which does not allow detecting O(ms) races.
Thanks for working through the flakes and on this awesome feature!
@kyessenov unfortunately the integration tests are flaking regularly on master. PTAL tomorrow to see if you can fix. Otherwise I will need to revert, and I can help you fix it if you can't figure it out. Thank you!
I noticed it fail once locally. Could it be that there is a delay between when a listener is accepted and applied? I suspect there's some latency between when a filter config provider posts to TLS and when a worker picks it up.
(By listener applied I mean HCM receives the accepted filter config.)
OK, seems like some destruction sequence:
I'll take a look.
Merged master and it happens a lot more frequently. Must have been some recent changes.
Let's see if #12268 fixes it. It passed with 1000 iterations locally.
tags = ["fails_on_windows"],
deps = [
    ":http_integration_lib",
    "//source/extensions/filters/http/rbac:config",
cc @yanavlasov
I noticed a breakage during our import from OSS because we disable the rbac extension. The strong dependency on that extension that this change adds breaks compilation in our setup.
I think there's a problem with this test: it depends on an extension but is not using the extension build system macros that remove the test from execution when the relevant extension is removed from the build. But this test is using the rbac extension as an example in order to test a config loading mechanism, not testing the extension itself, so excluding the test would mean that we lose test coverage for the mechanism you introduced.
Should this test hand-craft its own test-only extension and use that to verify that the extension mechanism works as expected?
I could do that, although I'm not sure how valuable that is. I was operating under the assumption that RBAC is a core extension. I do need variable behavior for this test to make sense, so not many existing test extensions fit. Feel free to file an issue against me to clean this up.
This is technically the list of core extensions.
_core_extensions = [
If you do need to test specific behavior, please create a test filter with the behavior you need. Otherwise, these tests will impact others who exclude the RBAC filter from their build.
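The suggested hand-crafted test-only extension could be approximated as below. The registry, the filter struct, and the filter name are all hypothetical stand-ins for Envoy's real Registry machinery; the point is that a test can register its own factory with the variable behavior it needs instead of linking a real extension like RBAC.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>

// Illustrative test-only filter with one toggle of variable behavior.
struct TestFilter {
  bool allow;
};

// Hypothetical simplified registry; not Envoy's Registry::FactoryRegistry.
class TestFilterRegistry {
public:
  using Factory = std::function<std::unique_ptr<TestFilter>(bool)>;

  void registerFactory(const std::string& name, Factory f) {
    factories_[name] = std::move(f);
  }

  // Returns nullptr when no factory is registered under `name`.
  std::unique_ptr<TestFilter> create(const std::string& name, bool allow) {
    auto it = factories_.find(name);
    return it == factories_.end() ? nullptr : it->second(allow);
  }

private:
  std::map<std::string, Factory> factories_;
};
```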
Signed-off-by: Kuat Yessenov <kuat@google.com> Signed-off-by: Kevin Baichoo <kbaichoo@google.com>
Signed-off-by: Kuat Yessenov <kuat@google.com> Signed-off-by: chaoqinli <chaoqinli@google.com>
Commit Message:
Implements a filter config discovery client. Modeled after RDS with a few differences:
Remaining items to finish:
One interesting issue with contextual validation is that adding a context may invalidate a resource in xDS (e.g. adding a reference to a filter config as a terminal filter in the filter chain after the non-terminal filter config is received).
Risk Level: low (the code is not activated unless the config is used)
Testing: manual for now
Docs Changes: needed
Release Notes: add a filter config discovery client
Fixes: #7867