[cache_filter] Separate upstream from downstream #36309

Open · ravenblackx wants to merge 18 commits into main

Conversation

ravenblackx (Contributor)

Commit Message: [cache_filter] Separate upstream from downstream
Additional Description: A recurring issue with cache_filter is that, because it populates the cache from the regular upstream of a client's connection, a client that is slow to download, or that disconnects mid-stream, can cause the cache-populating upstream to be delayed by watermarks or aborted entirely. If the cache then attempts to stream the entry to other clients while it is still being populated, many clients can all receive an unexpected disconnect or an unnecessarily slow response.
This PR makes the cache filter use a separate upstream created with AsyncHttpClient, so that even if the client disconnects, the upstream can persist and continue to populate the cache entry.
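A minimal sketch of the decoupled-upstream idea, assuming Envoy's Http::AsyncClient streaming API; the member names here are illustrative, not necessarily the PR's exact code:

// Sketch: an upstream whose lifetime is independent of the filter. The
// AsyncClient stream keeps delivering response callbacks (and feeding the
// cache insert) even after disconnectFilter() severs the filter link.
class UpstreamRequest : public Http::AsyncClient::StreamCallbacks {
public:
  UpstreamRequest(CacheFilter* filter, Http::AsyncClient& async_client,
                  const Http::AsyncClient::StreamOptions& options)
      : filter_(filter), stream_(async_client.start(*this, options)) {}

  // Called when the downstream goes away; cache population continues.
  void disconnectFilter() { filter_ = nullptr; }

  // Http::AsyncClient::StreamCallbacks
  void onHeaders(Http::ResponseHeaderMapPtr&& headers, bool end_stream) override;
  void onData(Buffer::Instance& data, bool end_stream) override;
  void onTrailers(Http::ResponseTrailerMapPtr&& trailers) override;
  void onComplete() override;
  void onReset() override;

private:
  CacheFilter* filter_; // may become nullptr before the stream completes
  Http::AsyncClient::Stream* stream_;
};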

There are two known flaws in this version:

  1. The downstream write and the cache write are still performed in parallel, so if one is slower than the other, the slower one's buffer can become oversized in memory.
  2. The upstream read has no speed limit, so if the cache write is slower than the network read, the cache-write buffer can become oversized in memory.

Subsequent PRs are intended to address these:

  1. Once the cache provides a "streamed" interface for an entry that is still being written, we can use it for the "primary" downstream the same as for any other downstream requesting the same resource, rather than streaming from the "shared read" as the filter does now.
  2. Add an API for pausing and resuming an AsyncHttpClient stream's "read" side, so that if the cache write is a bottleneck, the amount of data buffered in memory is capped.
  3. The cache filter also needs to monitor downstream watermark callbacks to avoid pushing data faster than the downstream consumes it (see the sketch after this list).
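
For follow-up (3), a sketch of downstream watermark monitoring, assuming the standard filter watermark hooks; the downstream_paused_ flag and its use are assumptions for illustration:

// Sketch: register for downstream watermark events so the filter can stop
// pushing data from the shared read while the downstream buffer is full.
class CacheFilter : /* existing bases, */ public Http::DownstreamWatermarkCallbacks {
public:
  void onAboveWriteBufferHighWatermark() override { downstream_paused_ = true; }
  void onBelowWriteBufferHighWatermark() override {
    downstream_paused_ = false;
    // Resume streaming body chunks to the downstream here.
  }

private:
  bool downstream_paused_ = false;
};

// Registration would happen via the decoder callbacks, e.g.:
//   decoder_callbacks_->addDownstreamWatermarkCallbacks(*this);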

Risk Level: Low; the filter is WIP and was previously bug-ridden and crash-prone. The integration test passes with no changes after this change.
Testing: Mostly covered by existing tests; one test changed from "expect abort" to "expect cache write completion", which is the intent of this PR.
Docs Changes: n/a
Release Notes: n/a
Platform Specific Features: n/a

Signed-off-by: Raven Black <ravenblack@dropbox.com>

As a reminder, PRs marked as draft will not be automatically assigned reviewers,
or be handled by maintainer-oncall triage.

Please mark your PR as ready when you want it to be reviewed!



CC @envoyproxy/api-shepherds: Your approval is needed for changes made to (api/envoy/|docs/root/api-docs/).
envoyproxy/api-shepherds assignee is @adisuissa
CC @envoyproxy/api-watchers: FYI only for changes made to (api/envoy/|docs/root/api-docs/).


ravenblackx (Contributor, Author):

/coverage


Coverage for this Pull Request will be rendered here:

https://storage.googleapis.com/envoy-pr/36309/coverage/index.html

The coverage results are (re-)rendered each time the CI envoy-presubmit (check linux_x64 coverage) job completes.


ravenblackx marked this pull request as ready for review on September 24, 2024 at 22:46.
adisuissa (Contributor) left a comment:


Thanks! Left a minor API comment.

api/envoy/extensions/filters/http/cache/v3/cache.proto (comment outdated; resolved)

static LookupStatus resolveLookupStatus(absl::optional<CacheEntryStatus> cache_entry_status,
FilterState filter_state);

private:
class UpstreamRequest : public Http::AsyncClient::StreamCallbacks, public InsertQueueCallbacks {
Contributor:

IMO, it would improve readability and reviewability if UpstreamRequest had its own .h/.cc files (possibly in an "internal" namespace).

ravenblackx (Contributor, Author):

Done.

std::unique_ptr<CacheInsertQueue> insert_queue_;
// upstream_request_ belongs to the object itself, so that it can be disconnected
// from the filter and still complete the cache-write in the event that the
// downstream disconnects. The filter and the UpstreamRequest must communicate to
Contributor:

This seems like a good place to mention what CacheFilter functions UpstreamRequest calls when it's about to be destroyed

ravenblackx (Contributor, Author):

Done.

class UpstreamRequest : public Http::AsyncClient::StreamCallbacks, public InsertQueueCallbacks {
public:
void sendHeaders(Http::RequestHeaderMap& request_headers);
void disconnectFilter();
Contributor:

Can disconnectFilter document that once it returns, UpstreamRequest will issue no more calls to CacheFilter?

ravenblackx (Contributor, Author):

Done.

FilterState filter_state_;
std::shared_ptr<HttpCache> cache_;
Http::AsyncClient::Stream* stream_ = nullptr;
std::unique_ptr<UpstreamRequest> self_ownership_;
Contributor:

What is the benefit of self_ownership_? AFAICT, every self_ownership_ = nullptr could be replaced with delete this, and the code would be clearer.

ravenblackx (Contributor, Author):

I think we have linters that aren't big fans of bare pointers, and I'm not a fan of them either (I think the ownership semantics are much clearer with an explicit self_ownership_). Also, in one of the later iterations it's going to become a smart pointer that isn't owned by itself, so it's more convenient not to make it a bare pointer temporarily and then have to rejigger it. :)
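
For illustration, the self-ownership pattern under discussion might look roughly like this; the flow is an assumption, not the PR verbatim:

// Sketch: the object owns itself via a unique_ptr member; releasing that
// member destroys the object without a bare `delete this`.
UpstreamRequest* UpstreamRequest::create(CacheFilter* filter,
                                         std::shared_ptr<HttpCache> cache,
                                         Http::AsyncClient& async_client,
                                         const Http::AsyncClient::StreamOptions& options) {
  auto request = std::make_unique<UpstreamRequest>(filter, std::move(cache),
                                                   async_client, options);
  UpstreamRequest* raw = request.get();
  raw->self_ownership_ = std::move(request); // the object now owns itself
  return raw;
}

void UpstreamRequest::onReset() {
  // Moving ownership into a local destroys *this at end of scope; no member
  // may be touched after this line without first copying it out.
  std::unique_ptr<UpstreamRequest> self = std::move(self_ownership_);
}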

static UpstreamRequest* create(CacheFilter* filter, std::shared_ptr<HttpCache> cache,
Http::AsyncClient& async_client,
const Http::AsyncClient::StreamOptions& options);
UpstreamRequest(CacheFilter* filter, std::shared_ptr<HttpCache> cache,
Contributor:

If we're going to have a create method, the constructor should probably be private

ravenblackx (Contributor, Author):

That's annoying with smart pointers, unfortunately. (You can wrap a unique_ptr around a new instead of using make_unique, but I think that's more problematic than a public constructor in what is essentially a private context anyway.)
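
The trade-off being described: with a private constructor, create would need the unique_ptr-around-new spelling, roughly (sketch only, names assumed):

class UpstreamRequest {
public:
  static UpstreamRequest* create(CacheFilter* filter) {
    // make_unique cannot reach a private constructor, so this spelling
    // would be required instead:
    auto request = std::unique_ptr<UpstreamRequest>(new UpstreamRequest(filter));
    UpstreamRequest* raw = request.get();
    raw->self_ownership_ = std::move(request);
    return raw;
  }

private:
  explicit UpstreamRequest(CacheFilter* filter);
  std::unique_ptr<UpstreamRequest> self_ownership_;
};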

}
watermarked_ = false;
}
fragments_.clear();
// Clearing self-ownership might provoke the destructor, so take a copy of the
// abort callback to avoid reading from 'this' after it may be deleted.
auto abort_callback = std::move(abort_callback_);
auto callbacks = std::move(callbacks_);
Contributor:

Seems simpler just to call insertQueueAborted before resetting self_ownership_

ravenblackx (Contributor, Author):

Added more comment explaining why that would be a mistake. (It's likely this is one of the things that will become tidier once shared-cache-streaming responses make the ownership semantics less tangled.)
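
In sketch form, the general hazard the quoted comment guards against (illustrative ordering, not the PR's exact code):

// Unsafe: if clearing self-ownership runs the destructor, the member
// abort_callback_ is gone before it can be invoked.
//   self_ownership_.reset();
//   abort_callback_();        // use-after-free
//
// Safe: move the callback out first, then let `this` be destroyed.
auto abort_callback = std::move(abort_callback_);
self_ownership_.reset(); // may destroy *this
if (abort_callback) {
  abort_callback();
}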

}
}

void CacheFilter::UpstreamRequest::sendHeaders(Http::RequestHeaderMap& request_headers) {
stream_->sendHeaders(request_headers, true);
Contributor:

suggested comment:
// If this request had a body or trailers, CacheFilter::decodeHeaders would have bypassed cache lookup and insertion, so this class wouldn't be instantiated.

ravenblackx (Contributor, Author):

Done.

void CacheFilter::UpstreamRequest::onHeaders(Http::ResponseHeaderMapPtr&& headers,
bool end_stream) {
if (filter_state_ == FilterState::ValidatingCachedResponse && isResponseNotModified(*headers)) {
return processSuccessfulValidation(std::move(headers));
Contributor:

When we get a 304 response while validating an entry, I'm unclear on what the sequence of callbacks will be. We seem to be expecting that we'll get onHeaders then (because we abort()) onReset, and that we won't get onComplete. Is that right? What guarantees do we have?

ravenblackx (Contributor, Author) on Oct 3, 2024:

I asked about this too, and was told the AsyncHttpClient guarantees that no other callbacks are called after reset(). Though hm, I'm not certain whether that means onReset doesn't get called either; we may need to call it ourselves. Maybe I should validate this behavior in an integration test.

ravenblackx (Contributor, Author):

Turns out there is already an integration test that covers this path, and I think one of the *san builds would complain if a memory leak was provoked, which would be the effect if onReset didn't get called, so we have a guarantee there. (To be on the safe side, I also tested in the integration test with some logging to make sure it was behaving as expected.)
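
For reference, the 304 path under discussion as a sketch; the exact flow is an assumption, and the reset/onReset interaction is as concluded above:

void CacheFilter::UpstreamRequest::onHeaders(Http::ResponseHeaderMapPtr&& headers,
                                             bool end_stream) {
  if (filter_state_ == FilterState::ValidatingCachedResponse &&
      isResponseNotModified(*headers)) {
    processSuccessfulValidation(std::move(headers));
    // Abort the remainder of the stream; per the discussion above, the
    // AsyncClient still delivers onReset(), which releases self-ownership
    // (otherwise the *san builds would flag the leak).
    stream_->reset();
    return;
  }
  // ... normal (non-304) header handling ...
}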

lookup_(std::move(filter->lookup_)), is_head_request_(filter->is_head_request_),
request_allows_inserts_(filter->request_allows_inserts_), config_(filter->config_),
filter_state_(filter->filter_state_), cache_(std::move(cache)),
stream_(async_client.start(*this, options)) {
Contributor:

Does async_client.start guarantee that all callbacks will happen on our thread?

ravenblackx (Contributor, Author):

Yes, the async clients are inherently thread-local.

}
if (insert_queue_) {
Contributor:

Does insert_queue_ guarantee that all callbacks happen on this thread?

ravenblackx (Contributor, Author):

Kind of, that's why everything has to be dispatcher-posted. :)
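
A sketch of the dispatcher-posting pattern referred to here, assuming the cache backend may complete work off-thread; the member names and onFragmentDone are assumptions for illustration:

// Sketch: marshal a completion notification from wherever the cache backend
// runs back onto the worker thread that owns this queue before touching
// queue or filter state.
void CacheInsertQueue::onFragmentCommitted(bool success) {
  dispatcher_.post([this, success]() {
    // Now on the owning worker thread; safe to invoke queue callbacks
    // and mutate state.
    callbacks_.onFragmentDone(success);
  });
}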

jmarantz (Contributor) commented on Oct 2, 2024:

/wait
