Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement dynamic HTTP/2 window scaling #54755

Merged
merged 40 commits into from
Jul 8, 2021

Conversation

antonfirsov
Copy link
Member

@antonfirsov antonfirsov commented Jun 25, 2021

Fixes #43086 by introducing automatic scaling of the HTTP/2 stream receive window based on measuring RTT of PING frames.

The algorithm has been tested and benchmarked against various bandwidth-RTT combos. The HTTP/2 download speed still does not match HTTP/1, but this is also true for a large static window, we need to do a separate investigation on this.

The PR introduces 3 tuning options as environment variables/AppContext switches:

  • DOTNET_SYSTEM_NET_HTTP_SOCKETSHTTPHANDLER_HTTP2FLOWCONTROL_DISABLEDYNAMICWINDOWSIZING to disable the dynamic window alogithm and the RTT PING
  • DOTNET_SYSTEM_NET_HTTP_SOCKETSHTTPHANDLER_FLOWCONTROL_MAXSTREAMWINDOWSIZE to control the maximum stream window size
  • DOTNET_SYSTEM_NET_HTTP_SOCKETSHTTPHANDLER_FLOWCONTROL_STREAMWINDOWSCALETHRESHOLDMULTIPLIER to enable fine-tuning the balance between optimal receive window VS download speed

Resolves #53372.

I'm still looking at a failing test (ResponseStreamFrames_DataAfterHeadersAndContinuationWithoutEndHeaders_ConnectionError) and I want to re-validate in an environment with actual physical (non-simulated) delay, but generally the PR is ready for review.

@geoffkizer I addressed the naming concerns you raised in #52862, hope it looks better now.

/cc @ManickaP @halter73 @JamesNK @brporter

@dotnet-issue-labeler
Copy link

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, to please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

@ghost
Copy link

ghost commented Jun 25, 2021

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

Issue Details

Fixes #43086 by introducing automatic scaling of the HTTP/2 stream receive window based on measuring RTT of PING frames.

The algorithm has been tested and benchmarked against various bandwidth-RTT combos. The HTTP/2 download speed still does not match HTTP/1, but this is also true for a large static window, we need to do a separate investigation on this.

The PR introduces 3 tuning options as environment variables/AppContext switches:

  • DOTNET_SYSTEM_NET_HTTP_SOCKETSHTTPHANDLER_HTTP2FLOWCONTROL_DISABLEDYNAMICWINDOWSIZING to disable the dynamic window alogithm and the RTT PING
  • DOTNET_SYSTEM_NET_HTTP_SOCKETSHTTPHANDLER_FLOWCONTROL_MAXSTREAMWINDOWSIZE to control the maximum stream window size
  • DOTNET_SYSTEM_NET_HTTP_SOCKETSHTTPHANDLER_FLOWCONTROL_STREAMWINDOWSCALETHRESHOLDMULTIPLIER to enable fine-tuning the balance between optimal receive window VS download speed

Resolves #53372.

I'm still looking at a failing test (ResponseStreamFrames_DataAfterHeadersAndContinuationWithoutEndHeaders_ConnectionError) and I want to re-validate in an environment with actual physical (non-simulated) delay, but generally the PR is ready for review.

/cc @geoffkizer @ManickaP @halter73 @JamesNK @brporter

Author: antonfirsov
Assignees: -
Labels:

area-System.Net.Http, new-api-needs-documentation

Milestone: -

@antonfirsov

This comment has been minimized.

@azure-pipelines

This comment has been minimized.

@antonfirsov

This comment has been minimized.

@azure-pipelines

This comment has been minimized.

@geoffkizer
Copy link
Contributor

I hesitate a little bit to expose DOTNET_SYSTEM_NET_HTTP_SOCKETSHTTPHANDLER_FLOWCONTROL_MAXSTREAMWINDOWSIZE and DOTNET_SYSTEM_NET_HTTP_SOCKETSHTTPHANDLER_FLOWCONTROL_STREAMWINDOWSCALETHRESHOLDMULTIPLIER because once we expose them, someone will depend on them and it will be nearly impossible to remove these.

How important do we think these really are?

@antonfirsov
Copy link
Member Author

antonfirsov commented Jun 28, 2021

someone will depend on them and it will be nearly impossible to remove these.

In case they will become obsolete, they will be just NOP-s, so I don't think this is a problem.

How important do we think these really are?

I can't judge if there are customers who are really interested tweaking them, although in our internal discussions we assumed that there will be some. TBH, if someone really wants to tweak this stuff for faster download, they should just bump the initial window size. The question is if there's anyone interested overriding them for lower memory footprint.

Copy link
Contributor

@geoffkizer geoffkizer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Posting the feedback I have so far, more to come later.

// First check for the AppContext switch, giving it priority over the environment variable.
if (AppContext.TryGetSwitch(Http3DraftSupportAppCtxSettingName, out bool allowHttp3))
// Disallow negative values:
if (value < 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0 is valid?

Copy link
Member Author

@antonfirsov antonfirsov Jun 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no practical functional difference between 0 and double.Epsilon, these are mathematically valid values which will practically disable the scaling. Should we bother setting an arbitrary minimum?

(The question only matters if we want to keep the knob.)

internal bool _disableDynamicHttp2WindowSizing = DisableDynamicHttp2WindowSizing;
internal int _maxHttp2StreamWindowSize = MaxHttp2StreamWindowSize;
internal double _http2StreamWindowScaleThresholdMultiplier = Http2StreamWindowScaleThresholdMultiplier;
internal int _initialHttp2StreamWindowSize = Http2Connection.DefaultInitialWindowSize;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't Http2Connection.DefaultInitialWindowSize be in HttpHandlerDefaults then?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should, yeah.

I also wonder whether 64K is still the right value for this. Should we make this larger?

Copy link
Member Author

@antonfirsov antonfirsov Jun 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually it wasn't optimal in any of my runs. Data from 0ms RTT / 5Gbps (loopback communication) benchmarks suggests it is worth to set it to 256K, however the window will scale faster to larger than optimal maximum sizes in all cases then.

I'm a bit hesitant to change the defaults without having more data. Maybe we should invest a couple of days to find the right defaults for the August preview?


private int _deliveredBytes;
private int _streamWindowSize;
private long _lastWindowUpdate;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could potentially use ValueStopwatch here. It remembers _startTimestamp and has public TimeSpan GetElapsedTime(). I haven't used it extensively but @MihaZupan did for telemetry/counters.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I stole the the unit conversion logic from ValueStopWatch because it seemed to be too heavy 😆 We don't need _startTimestamp here, we only need differences compared to the last timestamp.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the _startTimestamp would be your _lastWindowUpdate...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit hesitant to change & rebenchmark at this point, may try on Monday.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I mean is that:

TimeSpan dt = _stopwatch.GetElapsedTime();
_stopwatch = ValueStopwatch.StartNew();

will call Stopwatch.GetTimestamp() (OS call) twice. Although this wouldn't be a noticeable perf issue, I was actually afraid that someone will complain about this in the code review out of perfectionism, so went with manual timestamping instead of ValueStopwatch 😄 Don't want to change this now, want to fix my test and the remaining nits, and get this merged ASAP.

@antonfirsov

This comment has been minimized.

@azure-pipelines

This comment has been minimized.


private int _deliveredBytes;
private int _streamWindowSize;
private long _lastWindowUpdate;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the _startTimestamp would be your _lastWindowUpdate...

@antonfirsov
Copy link
Member Author

One of my new tests failed on the CI because of timing issues. I will try to harden it on Monday, if it's not possible I will delete it.

Unfortunately there is no way to test this feature without extensive use of Task.Delay.

@antonfirsov

This comment has been minimized.

@azure-pipelines

This comment has been minimized.

@antonfirsov

This comment has been minimized.

@azure-pipelines

This comment has been minimized.

@antonfirsov
Copy link
Member Author

All CI failures are unrelated. I had to remove the test LowBandwidthDelayProduct_ClientStreamReceiveWindowStopsScaling, looks like it's not possible to reliably test this particular characteristic.

@geoffkizer @ManickaP if there's nothing critical left, I'll merge this within ~36 hours.

Copy link
Member

@ManickaP ManickaP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for all the research and benchmarking to do this 🚀

@antonfirsov antonfirsov merged commit c7ffa32 into dotnet:main Jul 8, 2021
@JamesNK
Copy link
Member

JamesNK commented Jul 8, 2021

Nice!

Is there a before and after benchmark of this? I'd like to talk about it as a performance improvement when using gRPC and it would be good to have numbers to back that up.

e.g. before it took X seconds to download 100 megabytes, now it takes Y seconds.

@antonfirsov
Copy link
Member Author

antonfirsov commented Jul 8, 2021

@JamesNK it's really a function of network delay and bandwidth. If the peers communicate over localhost, the difference is negligible. If there is a delay of several 10s of milliseconds with a relatively high bandwidth, it can be an order of magnitude or even more. I will share my spreadsheet with you tomorrow. Might make sense to make a GH-friendly report out of it, but I just started my vacation, may do it when I'm back :)

@JamesNK
Copy link
Member

JamesNK commented Jul 8, 2021

I get that it is variable. One or two examples to give people an idea of the kind of performance improvement it can get is what I'm looking for. Devs love this kind of stuff.

No rush.

@karelz karelz added this to the 6.0.0 milestone Jul 15, 2021
antonfirsov added a commit that referenced this pull request Aug 6, 2021
Completely redesign tests for HTTP/2 KeepAlive PING, so they:
- Work well with RTT pings introduced in Implement dynamic HTTP/2 window scaling #54755
- Run sequentially, reducing the chance of failing because of timing issues caused by parallel workloads
- Are better organized: multiple test cases for different scenarios, instead of one theory with complex branches on parameters

Instead of reading / reacting to frames inline, there is a separate Task for processing incoming frames, responding to PING immediately and pushing other frames to a Channel<Frame>.

Fixes #41929
@ghost ghost locked as resolved and limited conversation to collaborators Aug 14, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
5 participants