
caching: Support out-of-thread (async) HttpCache implementations #12622

Merged
merged 15 commits into envoyproxy:master from the async branch
Aug 24, 2020

Conversation

yosrym93
Contributor

Commit Message:
Added support for out-of-thread (async) implementations of HttpCache.
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>

Additional Description:

  • Cache callbacks are now posted to the dispatcher to make sure they are run on the worker thread.
  • Simplified the CacheFilter state management and tests as all caches are now treated in the same way.

Risk Level: Low
Testing: Updated existing tests
Docs Changes: N/A
Release Notes: N/A

…thread HttpCache implementations

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
…ntext

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
@@ -32,8 +33,12 @@ CacheFilter::CacheFilter(const envoy::extensions::filters::http::cache::v3alpha:
     : time_source_(time_source), cache_(http_cache) {}

 void CacheFilter::onDestroy() {
-  lookup_ = nullptr;
-  insert_ = nullptr;
+  if (lookup_) {
Contributor

In addition to calling onDestroy on lookup_ and insert_, we also need to remember that our onDestroy has been called. They might have posted callbacks that haven't happened yet; when we get them, we need to ignore them. I suggest adding a new FilterState for this. (Another approach would be to hand these pointers to dispatcher().deferredDelete, but making it a state seems cleaner.)

Contributor

@mattklein123 This PR relies on details of event handling that aren't documented, but that I think are intended. Specifically, it assumes that all callbacks posted before the return of CacheFilter::onDestroy will be executed before this CacheFilter is deleted. (If this isn't intended, then I don't understand the purpose of deferred deletion.) Can you confirm? (Either way, we should update the docs.)

Member

I haven't tracked this PR, but no, this is not guaranteed. Can you describe a bit more what is going on or do you want me to review the entire PR?

Contributor

Most of CacheFilter's plugin cache implementations will complete requests asynchronously, and there's no guarantee of what thread those completions will run on, so their callbacks get posted to our dispatcher. Suppose a plugin posts a callback, then (perhaps due to a client disconnect) CacheFilter::onDestroy is called. That callback must happen before the CacheFilter gets deleted (or else it will access a deleted CacheFilter). AFAICT, the current code does give us the correct order (because it runs pending callbacks before it runs deferred deletion), and I'll be surprised if it ever needs to change, but we need to make sure.

I think that all we need to do is add documentation and tests to lock in the current behavior of running pending callbacks before deferred deletion.

If we can't rely on this, we'll have to go back to a prior iteration of the original CacheFilter PR, and use either weak_ptrs or shared_ptrs to the CacheFilter, but that complicates the code and adds bus-locked operations.

Member

The solution here is to not use post, but use a "timer" which can be cancelled. Can we switch to that?

Contributor Author

@jmarantz the problem arises if the cache lookup has already finished and posted the callback to the dispatcher, but then the client closes the connection and the filter chain is destroyed. Currently, there's no guarantee that the posted callback will run before the filter is deleted. If the filter is deleted first, the posted callback will run on a destroyed filter.
One solution here is to capture a weak_ptr to the CacheFilter in the posted callback, and check that the CacheFilter is still alive before accessing any of its members/methods.

Contributor

Can we arrange a destroy-notify callback from the filter chain? The cache system could then mark the filter structure as stale and avoid referencing it further.

Contributor

of course weak_ptr does the job too, if the thing being pointed to is already a shared_ptr.

Contributor

one more thing: the advantage of a callback over a weak_ptr is that you might be able to actively cancel an outstanding lookup request (if that is possible).

Member

> @mattklein123 I am following the discussion at #12364. Can you elaborate on how it is related to our problem here?

It's loosely related in the sense that we have TLS posts/callbacks that are running beyond the acceptable lifetime.

> one more thing: the advantage of a callback over a weak_ptr is that you might be able to actively cancel an outstanding lookup request (if that is possible).

This is how a cancellable post would be accomplished within the dispatcher. This is what I recommend if you want a core solution; otherwise you can implement this yourself.

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
…ered while the filter is being destroyed.

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
@mattklein123 mattklein123 self-assigned this Aug 17, 2020
@mattklein123
Member

I assigned myself so I can help review the entire PR once we sort out the implementation plan.

/wait

@toddmgreer
Contributor

toddmgreer commented Aug 18, 2020 via email

…r being deleted, in the case where the cache callback is posted to the dispatcher then the filter is deleted before the callback is executed

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
@yosrym93
Contributor Author

I added a safeguard against the "callback posted, then filter deleted" case we have been discussing by using a weak_ptr. I tried to make it as simple as possible. I added a TODO to look into other solutions as they arise (guaranteed dispatcher ordering of posts and deletions, cancellable posts, etc.).
I also added a test for this that fails if the filter is accessed after being deleted when run with the ASAN sanitizer (otherwise the behavior is naturally undefined - it may give false positives).
Looking forward to everyone taking a look!

Member

@mattklein123 mattklein123 left a comment

Thanks, generally LGTM, but a few small bugs/questions. Thank you!

/wait

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
toddmgreer
toddmgreer previously approved these changes Aug 21, 2020
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
toddmgreer
toddmgreer previously approved these changes Aug 21, 2020
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
@yosrym93
Contributor Author

@mattklein123 I fixed the test. It turns out the lookup in the test did not find any cache entries, so headers was an empty unique_ptr; that's why there were no leaks. I fixed the test, and now it fails when the pointer is wrapped inside the if condition.

@mattklein123
Member

> @mattklein123 I fixed the test. It turns out the lookup in the test did not find any cache entries, so headers was an empty unique_ptr; that's why there were no leaks. I fixed the test, and now it fails when the pointer is wrapped inside the if condition.

Great!

Member

@mattklein123 mattklein123 left a comment

Thanks!

@mattklein123 mattklein123 merged commit 0356108 into envoyproxy:master Aug 24, 2020
@yosrym93 yosrym93 deleted the async branch August 24, 2020 21:31