
http2: delay H2 frame serialization if the network connection's output buffer high-watermark is triggered. #14714

Closed
wants to merge 7 commits

Conversation

antoniovicente (Contributor)

Commit Message:
http2: delay H2 frame serialization if the network connection's output buffer high-watermark is
triggered. By avoiding eager serialization the H2 codec implementation is able to prioritize more
important frames so they are sent sooner and provide stricter bounds on connection and stream
buffers.

This change is protected by the envoy.reloadable_features.enable_h2_watermark_improvements runtime
feature which is enabled by default.

Additional Description:
Risk Level: High. Significant change to H2 buffering and a new dependency on frame queueing functionality in nghttp2
Testing: new integration tests added
Docs Changes: n/a
Release Notes: added
Platform Specific Features: n/a
Issue #11370
Runtime guard: envoy.reloadable_features.enable_h2_watermark_improvements

Signed-off-by: Antonio Vicente <avd@google.com>
Base automatically changed from master to main January 15, 2021 23:02
@yanavlasov (Contributor) left a comment:

Some minor nits. Will continue reviewing later today.

// propagated from the network connection if the
// envoy.reloadable_features.enable_h2_watermark_improvements runtime feature is enabled. Both
// HTTP/1.x and HTTP/2 codecs handle this in StreamCallbackHelper::addCallbacksHelper.
// TODO(antoniovicente) For full consistency we need to use the enable_h2_watermark_improvements
Contributor:

Don't you already do this with ConnectionImpl::h2_watermark_improvements_?

@antoniovicente (Contributor, author) commented Jan 19, 2021:

h2_watermark_improvements_ is not used here. Accessing it here seems challenging.

Contributor:

Oh yes, makes sense

@@ -27,12 +27,30 @@ class TestIoSocketHandle : public IoSocketHandleImpl {
bool socket_v6only = false, absl::optional<int> domain = absl::nullopt)
: IoSocketHandleImpl(fd, socket_v6only, domain), writev_override_(writev_override_proc) {}

void initializeFileEvent(Event::Dispatcher& dispatcher, Event::FileReadyCb cb,
Contributor:

nit: I think this can be private.

Contributor (author):

This is an implementation of a public interface method.

// Schedule resumption on the IoHandle by posting a callback to the IoHandle's dispatcher. Note
// that this operation is inherently racy, nothing guarantees that the TestIoSocketHandle is not
// deleted before the posted callback executes.
void activateInDispatcherThreadForTest(uint32_t events) {
Contributor:

nit: ForTest is redundant since this is test only code.

Signed-off-by: Antonio Vicente <avd@google.com>
@yanavlasov (Contributor) left a comment:

Looks awesome, thanks. A few more comments and a request for some edge-case tests.

parent_.connection_.aboveHighWatermark()) {
// The network connection's output buffer is full. Delay generation of the data frame until the
// buffer's size drops to its low-watermark.
return NGHTTP2_ERR_WOULDBLOCK;
Contributor:

This will cause control frames to be accumulated inside nghttp2. I could not see a stat counter that tracks how many bytes or frames are pending in nghttp2.

Contributor (author):

Do we have stats counters that track the number of frames pending in the current codec implementation?

Contributor:

Were we looking for stats, or just to make sure our current frame flood logic was unified so network connection frames and nghttp2 frames both count towards the flood limits?

Contributor (author):

I was agreeing with Yan that no such stats counters exist.

Contributor:

I guess my question is if we're losing some of our flooding protection then?
We were using the insta-serialization into data frames to avoid a reflection time OOM attack where someone sent a bunch of requests which would 400 faster than the network drained, queuing outbound header frames. If I understand this PR correctly those header frames will be buffered by nghttp2 and we won't account for them. Doesn't that reopen that attack vector?

Contributor (author):

We don't enforce a byte limit on response headers. What we enforce is a limit on the number of outstanding control frames. That limit is still enforced correctly; we sum the number of frames that have been serialized and the number of frames internally queued by nghttp2.

So I think we're not losing any flood protections. Http2BufferWatermarksTest.PingFloodAfterHighWatermark provides test coverage. We could add a similar test with header frames.

Contributor:

I do not think we lose any flood protection. The frames that are accumulated by nghttp2 are accounted for using the Nghttp2Session::getOutboundControlFrameQueueSize() method when making flood checks.
I was concerned about the visibility into the size of the queue maintained by nghttp2. We do have counters for buffer sizes (not numbers of frames), and now we will have some number of bytes sitting in the nghttp2 queue without them being reflected in any stats.

explicit Nghttp2Session(nghttp2_session& session) : session_(session) {}

size_t getOutboundControlFrameQueueSize() const override {
return nghttp2_session_get_outbound_queue_size(&session_);
Contributor:

I think the DATA frame that caused NGHTTP2_ERR_WOULDBLOCK would be added to this queue as well. Probably not a big deal from the frame-counting perspective.

Contributor (author):

I think that the data frame that would have blocked is not queued internally. I'm fairly sure the data portion of the frame is not added to the queue, given the way data frame serialization works in ConnectionImpl::StreamImpl::onDataSourceSend: the contents of the data frame are moved directly from the stream buffer to the connection's output buffer. nghttp2 is not involved in that part of the serialization.

Contributor:

Ok, either way it is not a big deal.

ControlFrameFloodLimit + 1);
}

TEST_P(Http2BufferWatermarksTest, RstStreamWhileBlockedProxyingDataFrame) {
Contributor:

I think the same tests for the case where the upstream resets the stream would be interesting, and also a stream timeout with pending DATA.

Contributor (author):

Low-level control of the upstream, including manual parsing and serialization of H2 frames, seems like a bit of a challenge. We could consider enhancements to tests in a followup.

Contributor:

We can probably make the autonomous upstream reset streams on a specific DATA frame. Timeouts could be harder to test; maybe some special filter can trigger a timeout. Yes, we can do it in a follow-up.

@yanavlasov (Contributor):

One additional gotcha here is that the nghttp2 queue size is hardcoded to 10K in ConnectionImpl::Http2Options::Http2Options, which was OK because it was not likely to be triggered. It now effectively caps the limit on control frames (1K by default). So if someone for some reason wanted this limit to be larger than 10K, it would not work. We could either document it or change the nghttp2 limit to max_outbound_control_frames + 10K.

Signed-off-by: Antonio Vicente <avd@google.com>

Signed-off-by: Antonio Vicente <avd@google.com>
Signed-off-by: Antonio Vicente <avd@google.com>
@antoniovicente (Contributor, author):

> One additional gotcha here is that the nghttp2 queue size is hardcoded to 10K in ConnectionImpl::Http2Options::Http2Options, which was OK because it was not likely to be triggered. It now effectively caps the limit on control frames (1K by default). So if someone for some reason wanted this limit to be larger than 10K, it would not work. We could either document it or change the nghttp2 limit to max_outbound_control_frames + 10K.

I think the nghttp2 limit could be hit accidentally even without the changes in this PR. I disabled the check by setting the limit to std::numeric_limits<size_t>::max().

@yanavlasov Any other comments?

Signed-off-by: Antonio Vicente <avd@google.com>
@antoniovicente (Contributor, author):

cc @alyssawilk since it relates to high-watermark behavior

@alyssawilk (Contributor) left a comment:

Very cool, though a fairly terrifying change.
I did a high-level pass today, and I'll do another round next week looking over tests etc.

new_stream->filter_manager_.aboveHighWatermark());

// If the network connection is backed up, the HTTP/1.x stream should be made aware of it on
// creation. In the case of HTTP/2 the stream should be allowed to read up to the configured
Contributor:

"HTTP/2 and above"

Contributor (author):

I don't know how this code would work for HTTP/3.

I'm not changing that implementation; it's possible that the ASSERT below will trigger for HTTP/3. We don't seem to have tests that create streams while watermarks are triggered.

Contributor:

Your code is
"use new watermark logic if codec_->protocol() >= Protocol::Http2".
If you don't know the impact on HTTP/3, maybe you should have
codec_->protocol() == Protocol::Http2?

// If the network connection is backed up, the HTTP/1.x stream should be made aware of it on
// creation. In the case of HTTP/2 the stream should be allowed to read up to the configured
// stream limit even when the network connection is backed up, so the readDisable status is not
// propagated from the network connection if the
Contributor:

I may get unconfused by the time I get to the bottom of this PR, but:

What this used to do was inform new streams to bound their reads if the upstream connection was backed up.
Read disabling for HTTP/2 didn't stop it from reading a window's worth of data; it immediately stopped it from sending window updates to ensure that only one window's worth of data would be read.

I'm not sure how delaying serialization should affect this - I'd think we'd still want to inform the downstream codec to stop sending window updates if the upstream is at capacity.

Contributor (author):

Yes, there's a difference in buffering behavior. I think flow control ends up operating based on the contents of the stream buffer and depends on readDisable, when that buffer is above its watermark, to suppress window updates. I think that we could end up buffering more bytes total per connection, but we'd end up with stream buffers that are 2x the configured high watermark instead of connection buffers with >100x high-watermark bytes in them.

There are some issues that I haven't managed to work out yet, like the configured buffer sizes for the downstream stream buffer affecting how much we download from the upstream over H1 or H2. The fact that H2 stream buffers now play a role in cases where the other end of the HTTP handler is using H1 could cause some problems related to default values in the config: a downstream H2 connection without explicitly configured buffer limits would buffer 1MB in the connection's output buffer when proxying data from an H1 upstream. After this change, the buffering would be done based on the H2 stream window, so we'd buffer 1MB in the connection's output buffer and another 256MB in the stream's buffer, since that's the default. This behavior is illustrated by Http2BufferWatermarksTest.DataFlowControlHttp1Upstream.

We may want to consider keeping the runtime features associated with this change disabled by default and only consider enabling them after doing several followup fixes.

Contributor:

One thing that might help this PR is updating flow_control.md to explain what's going on. Maybe that'd help make sure I grokked what you were aiming for and can match it up with what the PR does?

Contributor (author):

Sorry for not replying until now. It's been hard for me to find time to continue doing the doc changes required for this.

One thing that we may want to answer is whether or not the runtime feature knob added in this PR should default to true or false. There are some undesirable behaviors related to buffering when one side of the pipeline is H1 and the other is H2, but the extra buffering does reduce the chances of starvation. I need to write a proposal for changes to how H2 pushes data through the pipeline, which would reduce buffering in several interesting cases and address several other issues. If we decide to hold back the runtime feature, should we also hold back documentation? I think we do want the new documentation to be ready when we merge the functionality with the flag off, commit to getting the remaining issues sorted out in the short term, and switch over to delayed serialization soon.


for (auto& stream : active_streams_) {
stream->runLowWatermarkCallbacks();
if (h2_watermark_improvements_) {
send_pending_frames_cb_->scheduleCallbackNextIteration();
Contributor:

Lazy ask: do we have unit tests for two backed-up sessions both getting "time to write" callbacks? I'd hope that if the first put the buffer over the limit the second wouldn't serialize, and it'd be good to have coverage.

Contributor (author):

What do you mean by session? The send_pending_frames_cb_ is at the connection level.

The mocking of buffers and connection objects in the H2 codec impl test makes it impossible to write small tests that exercise any part of the watermark functionality. In order to write small tests for any of this functionality we would need a significant revamp of that test, or a rewrite from first principles.

Contributor:

Sorry, two backed-up streams getting "time to write" callbacks from the network. I think that's doable.

Contributor:

I think this is already tested in the integration tests like DataFlowControlSymmetricStreamConfig, unless I misunderstood your request.

@yanavlasov (Contributor):

Overall LGTM. I agree that updating the flow control docs would be good. I'm still confused about why the behavior of the H/2 upstream has changed; I will be pondering this tomorrow.

@@ -109,13 +109,21 @@ class ConnectionImpl : public virtual Connection, protected Logger::Loggable<Log
bool wantsToWrite() override { return nghttp2_session_want_write(session_); }
// Propagate network connection watermark events to each stream on the connection.
void onUnderlyingConnectionAboveWriteBufferHighWatermark() override {
for (auto& stream : active_streams_) {
stream->runHighWatermarkCallbacks();
if (h2_watermark_improvements_) {
Contributor:

I guess this is why an H/1 upstream will still continue reading after the downstream connection buffer is above the watermark?

Contributor (author):

yes.

github-actions bot commented Apr 1, 2021

This pull request has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

github-actions bot added the stale label Apr 1, 2021
github-actions bot commented Apr 9, 2021

This pull request has been automatically closed because it has not had activity in the last 37 days. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

github-actions bot closed this Apr 9, 2021