
caching: Support out-of-thread (async) HttpCache implementations #12622

Merged
merged 15 commits into envoyproxy:master from the async branch
Aug 24, 2020

Conversation

yosrym93
Contributor

Commit Message:
Added support for out-of-thread (async) implementations of HttpCache.
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>

Additional Description:

  • Cache callbacks are now posted to the dispatcher to make sure they are run on the worker thread.
  • Simplified the CacheFilter state management and tests as all caches are now treated in the same way.

Risk Level: Low
Testing: Updated existing tests
Docs Changes: N/A
Release Notes: N/A

…thread HttpCache implementations

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
…ntext

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
@@ -32,8 +33,12 @@ CacheFilter::CacheFilter(const envoy::extensions::filters::http::cache::v3alpha:
     : time_source_(time_source), cache_(http_cache) {}

 void CacheFilter::onDestroy() {
-  lookup_ = nullptr;
-  insert_ = nullptr;
+  if (lookup_) {
Contributor

In addition to calling onDestroy on lookup_ and insert_, we also need to remember that our onDestroy has been called. They might have posted callbacks that haven't happened yet; when we get them, we need to ignore them. I suggest adding a new FilterState for this. (Another approach would be to hand these pointers to dispatcher().deferredDelete, but making it a state seems cleaner.)

Contributor

@mattklein123 This PR relies on details of event handling that aren't documented, but that I think are intended. Specifically, it assumes that all callbacks posted before the return of CacheFilter::onDestroy will be executed before this CacheFilter is deleted. (If this isn't intended, then I don't understand the purpose of deferred deletion.) Can you confirm? (Either way, we should update the docs.)

Member

I haven't tracked this PR, but no, this is not guaranteed. Can you describe a bit more what is going on or do you want me to review the entire PR?

Contributor

Most of CacheFilter's plugin cache implementations will complete requests asynchronously, and there's no guarantee of what thread those completions will run on, so their callbacks get posted to our dispatcher. Suppose a plugin posts a callback, then (perhaps due to a client disconnect) CacheFilter::onDestroy is called. That callback must happen before the CacheFilter gets deleted (or else it will access a deleted CacheFilter). AFAICT, the current code does give us the correct order (because it runs pending callbacks before it runs deferred deletion), and I'll be surprised if it ever needs to change, but we need to make sure.

I think that all we need to do is add documentation and tests to lock in the current behavior of running pending callbacks before deferred deletion.

If we can't rely on this, we'll have to go back to a prior iteration of the original CacheFilter PR, and use either weak_ptrs or shared_ptrs to the CacheFilter, but that complicates the code and adds bus-locked operations.

Member

The solution here is to not use post, but use a "timer" which can be cancelled. Can we switch to that?

Contributor Author

@jmarantz the problem arises if the cache lookup has already finished and posted the callback to the dispatcher, but then the client closes the connection and the filter chain is destroyed. Currently, there's no guarantee that the posted callback will run before the filter is deleted. If the filter is deleted first, the posted callback will run on a destroyed filter.
One solution here is to capture a weak_ptr to the CacheFilter in the posted callback, and check that the CacheFilter is still alive before accessing any of its members/methods.

Contributor

Can we arrange a destroy-notify callback from the filter chain? The cache system could then mark the filter structure as stale and avoid referencing it further.

Contributor

of course weak_ptr does the job too, if the thing being pointed to is already a shared_ptr.

Contributor

one more thing: the advantage of a callback over a weak_ptr is that you might be able to actively cancel an outstanding lookup request (if that is possible).

Member

> @mattklein123 I am following the discussion at #12364. Can you elaborate on how it is related to our problem here?

It's loosely related in the sense that we have TLS posts/callbacks that are running beyond the acceptable lifetime.

> one more thing: the advantage of a callback over a weak_ptr is that you might be able to actively cancel an outstanding lookup request (if that is possible).

This is how a cancellable post would be accomplished within the dispatcher. This is what I recommend if you want a core solution; otherwise you can implement this yourself.

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
…ered while the filter is being destroyed.

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
@mattklein123 mattklein123 self-assigned this Aug 17, 2020
@mattklein123
Member

I assigned myself so I can help review the entire PR once we sort out the implementation plan.

/wait

@toddmgreer
Contributor

toddmgreer commented Aug 18, 2020 via email

…r being deleted, in the case where the cache callback is posted to the dispatcher then the filter is deleted before the callback is executed

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
@yosrym93
Contributor Author

I added a safeguard against the "callback posted, then filter deleted" case we have been discussing by using a weak_ptr. I tried to make it as simple as possible. I added a TODO to look into other solutions as they arise (guaranteed dispatcher ordering of posts and deletions, cancellable posts, etc.).
I also added a test for this that fails if the filter is accessed after being deleted when run with the ASAN sanitizer (otherwise the behavior is naturally undefined - it may give false positives).
Looking forward to everyone taking a look!

Member

@mattklein123 mattklein123 left a comment

Thanks, generally LGTM, but a few small bugs/questions. Thank you!

/wait

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
toddmgreer
toddmgreer previously approved these changes Aug 21, 2020
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
toddmgreer
toddmgreer previously approved these changes Aug 21, 2020
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
@yosrym93
Contributor Author

@mattklein123 I fixed the test. It turns out the lookup in the test did not find any cache entries, so headers was an empty unique_ptr; that's why there were no leaks. I fixed the test, and now it fails when the pointer is wrapped inside the if condition.

@mattklein123
Member

> @mattklein123 I fixed the test. It turns out the lookup in the test did not find any cache entries, so headers was an empty unique_ptr; that's why there were no leaks. I fixed the test, and now it fails when the pointer is wrapped inside the if condition.

Great!

Member

@mattklein123 mattklein123 left a comment

Thanks!

@mattklein123 mattklein123 merged commit 0356108 into envoyproxy:master Aug 24, 2020
@yosrym93 yosrym93 deleted the async branch August 24, 2020 21:31