Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

websocket: tunneling websockets (and upgrades in general) over H2 #4188

Merged
merged 6 commits into from
Aug 28, 2018

Conversation

alyssawilk
Copy link
Contributor

This allows tunneling over H2, unfortunately only enabled via nghttp2_option_set_no_http_messaging until nghttp2/nghttp2#1181 is sorted out. See the big warnings about not using (at least without knowing you're going to have a roll-out that may break backwards-compatibility some time in the not too distant future)

Risk Level: Medium (changes are contained behind H2-with-Upgrade header which doesn't work today)
Testing: unit tests, and turned up the full H1/H2 upstream/downstream in the integration test
Docs Changes: for now, though I may take them out. I think they're useful for review.
Release Notes: not added since we don't want folks using it (outside of testbeds) yet.
#1630

Signed-off-by: Alyssa Wilk alyssar@chromium.org

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
@alyssawilk
Copy link
Contributor Author

@ggreenway @PiotrSikora @mattklein123 @rshriram may be interested in taking a look?

upgrade/downgrade mechanism will drop the original method and transform the Upgrade request to
a GET method on the final Envoy-Upstream hop.

TODO(alyssawilk) link to the config changes required to enable upgrade upstream/downstream
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

implicit TODO - fix the broken docs build :-)

(just wanted to get this out tonight, since I'm out tomorrow and I'd like a round of @mattklein123 thoughts)

Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't look at the tests but at a high level this makes sense to me. Left some comments. Will defer to @ggreenway @PiotrSikora @rshriram for further review.


void Utility::transformUpgradeResponseFromH2toH1(HeaderMap& headers, absl::string_view upgrade) {
if (getResponseStatus(headers) == 200) {
headers.insertStatus().value().setCopy("101", 3);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can just use setInteger() here. Same above.

headers.removeUpgrade();
headers.removeConnection();
if (headers.ContentLength() == nullptr) {
headers.insertTransferEncoding().value().setReference(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't looked at the spec but is this right? For an H2 conversion I think we don't want any transfer encoding? Same above? More comments if it's correct?

HeaderMapImpl modified_headers(headers);
upgrade_type_ = headers.Upgrade()->value().c_str();
if (headers.Status()) {
Http::Utility::transformUpgradeResponseFromH1toH2(modified_headers);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My high level comment above the encode/decode changes in this file is that it seems to assume that above the codec level, all upgrades are treated as H1 style upgrades. Basically, this is an invariant that we assume is true for both upstream and downstream proxy. Is that correct? If so, I think it makes sense to do the conversions in the codec, but perhaps more comments on what is going on?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this be transformUpgradeResponseFromH2toH1?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ignore..

if (Http::Utility::isUpgrade(headers)) {
HeaderMapImpl modified_headers(headers);
upgrade_type_ = headers.Upgrade()->value().c_str();
if (headers.Status()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: it feels a bit odd that we are handling both request/response at the base stream layer and I wonder if we should have request/response stream overrides to make this code more clear? I don't feel very strongly about this but throwing it out there. Same comment in the decode path.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving the header munging out makes for a lot of duplicate code which I'd prefer to avoid. I could either move the complexity (status check) to the utility or have a Server/Client wrapper for transforms. I'll start with the latter and see what folks think of it.

Handling H2 hops (implementation in progress)
---------------------------------------------

One oft requested feature for Envoy was to allow WebSocket to traverse HTTP/2 hops, where there
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:). I would remove the origin of this feature and simply state it as a fact like "Envoy currently has an alpha implementation of tunneling websockets over H2 streams for deployments that prefer a uniform H2 mesh throughout."

One oft requested feature for Envoy was to allow WebSocket to traverse HTTP/2 hops, where there
was a set-up such as

Client ---- HTTP/1.1 ---- Frontline Envoy ---- HTTP/2 ---- Second tier Envoy ---- H1 ---- Upstream
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this example, IMO, would be a lot clearer (at the risk of losing generality) if it was written as

[Client] ---- HTTP/1.1 ---- [Front Envoy] ---- HTTP/2 ---- [Sidecar Envoy ---- H1  ---- App]

@@ -241,6 +242,8 @@ struct Http2Settings {
// our default connection-level window also equals to our stream-level
static const uint32_t DEFAULT_INITIAL_CONNECTION_WINDOW_SIZE = 256 * 1024 * 1024;
static const uint32_t MAX_INITIAL_CONNECTION_WINDOW_SIZE = (1U << 31) - 1;
// By default both nghttp2 and Envoy do now allow CONNECT over H2.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/now/not/

buildHeaders(final_headers, modified_headers);
} else {
buildHeaders(final_headers, headers);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to add some indication on the direction? whether the response is being returned to downstream or request is being forwarded to upstream ? that should make this code a bit more clear.

@@ -151,6 +163,15 @@ void ConnectionImpl::StreamImpl::pendingRecvBufferLowWatermark() {
readDisable(false);
}

void ConnectionImpl::StreamImpl::decodeHeaders() {
if (Http::Utility::isH2UpgradeRequest(*headers_)) {
Http::Utility::transformUpgradeRequestFromH2toH1(*headers_);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as above. I can't figure out if this is between downstream->Envoy or Envoy->downstream, or Envoy->upstream or upstream->Envoy.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or some comments would surely help

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this logic going to preclude us from stuffing arbitrary TCP streams into H2?
It would be easier to qualify this as
if isH2ExtendedConnect()
if connectProtocolType==WebSocket then transformUpgradeintoWebsocketOverH1
else // leaving the room open for other protos that we discussed earlier

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think the refactor I'm doing for Matt (not yet uploaded) will help since the ClientStreamImpl will transformUpgradeRequestFromH1toH2 and transformUpgradeResponseFromH2toH1 and the Server will do the same in reverse.

If this is an extended connect we'll munge the headers back to websocket form, else we won't touch them, and the CONNECT will remain a CONNECT. I don't think we currently have plans for doing arbitrary TCP for H2 (we need to but we don't yet have support) so I don't know what we would do in the else today :-)

ConnectionManagerConfig& config, const Router::Config& route_config,
Runtime::RandomGenerator& random, Runtime::Loader& runtime,
const LocalInfo::LocalInfo& local_info) {
// If this is a Upgrade request, do not remove the Connection and Upgrade headers,
// as we forward them verbatim to the upstream hosts.
if (protocol == Protocol::Http11 && Utility::isUpgrade(request_headers)) {
if (Utility::isUpgrade(request_headers)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why remove the check? the extended connect protocol does not use any upgrade headers from what I see.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because as docced up, for filter chain consistency, we consistently transform upgrades to H1 style headers at the codec layer. If there's
client - H1 upgrade - Envoy - H2 hop - another Envoy - H2 hop - something else
the Envoy sending and receiving the upgrade in H2 form will still see HTTP/1.1 style upgrade headers at the http connection manager and in the HTTP filters.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously this check was excluding Http2 and Http10. If I'm understanding this correctly, you want to allow Http2 here. Do we need to still exclude Http10?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We definitely need to include HTTP/2 here given the way we're doing upgrades in Envoy. Upgrade is an HTTP/1.1 header so arguably we could protocol checks back in and explicitly disallow something we don't expect folks to do. Given that 1.0 support is off by default and I don't think anyone's trying to do 1.0 upgrades I lean towards not worrying about it but I'm happy to add it back if you'd prefer!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'm fine with it as-is.

bool buffers_overrun() const { return read_disable_count_ > 0; }

ConnectionImpl& parent_;
HeaderMapImplPtr headers_;
StreamDecoder* decoder_{};
std::string upgrade_type_;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think of qualifying this with websocket alone? Or may be an enum for other protos in future? The reason is that Websocket is the only Http-ish proto in this picture. Others are going to be just opaque TCP.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the advantage to restricting it? I guess we'd save the memory of latching one header but I don't think that's worth removing the flexibility.


void Utility::transformUpgradeRequestFromH2toH1(HeaderMap& headers) {
ASSERT(Utility::isH2UpgradeRequest(headers));

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this definitely needs to be qualified to websocket alone. Coz for other protos, we are likely to initiate outbound TCP proxy connections, where we simply strip the H2 framing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you say initiate outbound TCP proxy connections, what do you mean? you can't strip H2 framing over an H2 connection. I can definitely think of other non-websocket upgrades we're going to do over H2 extended connect, and I'd prefer a consistent mechanism for handling them, so I'd prefer to not limit this to websocket.

Keep in mind, these are configurable on a per-upgrade basis. We can always add a different type of upgrade, or configure an upgrade for different behavior, to extend Envoy functionality later without changing this code at all. Basically since the mechanism for this type of support and any new type of support are all done via config, I don't think having flexibility in this implementation will come back to bite us.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may be I am missing something here. I am envisioning being able to tunnel mySQL connections over H2 streams to the envoy on the other end, and having the server Envoy hand over a plain tcp connection to the mysql container in the same pod.

The client-side envoy would receive the inbound connection over a TCP_PROXY, and and shove the bytes into a H2 stream and send to an upstream Envoy.

Now, if there are multiple envoys between the client and server side envoys, then these intermediaries are going to be receiving a H2 stream with extended CONNECT header, and have to forward the same H2 stream to the next envoy.

Reading this code, I got the impression that with websockets, an intermediary envoy would strip the H2 framing and convert the extended connect/upgrade into standard H1 upgrade for websocket. Then when handing over the same connection to the cluster manager, the codec will re-wrap the H1 upgrade headers back into a H2 stream and forward. And my concern here (and elsewhere) was this H2-->H1 unwrapping where there is an implicit assumption that these protocols are all somehow h1 friendly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if we're tunneling upgrades, the payload has to be HTTP2/HTTP1 friendly, today.

In the long run, I think we're going to have code which both encaps TCP in an HTTP/H2 upgrade, and decaps HTTP/H2 upgrades back to TCP. We don't have that code yet. I think we're absolutely going to have it, but the point of this PR is to allow Envoy to pass along encaped data, not to do the decap

Our main use case for this is multiplexing transparent proxying. If we have a frontline envoy without certs, it can do TCP termination, forward the payload over a pre-warmed H2 hop, demux it at the far end, and then treat each H2 stream as if it were a new incoming TCP connection (which will often have TLS/H2 payload, requiring another layer of demuxing). Some second line Envoys might be configured to "foward raw_tcp upgrades untouched" and some might be configured to "demux and handle raw_tcp upgrades". I do think there's an advantage to allowing multiple types of upgrade (assuming the eventual nghttp2 implementation allows) so the config can do one thing for mysql_upgrades and a different thing for raw_tcp upgrades. It would allow the frontline Envoy which is doing the encap in HTTP1/HTTP2 to specify the upgrade type, and then the next Envoy in the chain could be configured to pass it along or terminate. But all this PR does is allow the upgrades to survive the H2 hop.

if (getResponseStatus(headers) == 200) {
headers.insertStatus().value().setCopy("101", 3);
}
// TODO(alyssawilk) what should we do for websocket responses on the failure path?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are not handling websocket upgrade rejects at all. The assumption is that the server would terminate the connection allowing us to cleanup. We could look at the response code and clean up

Copy link
Member

@rshriram rshriram left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is super nice..I didn't expect the changes to be this small.

Thinking out aloud, vis-a-vis our earlier conversation on tunneling arbitrary protos over H2, I am wondering if it makes sense for us to treat the Websocket over H1 as any other opaque TCP proto (e.g., mysql, mongo, etc.) and shove the bytes (including HTTP1 upgrade headers) into a H2 stream, strip out the H2 framing at the other end before sending to the backend server. This has the advantage that the connect support in Envoy is very generic [i.e. we get rid of all the transform business] but has the disadvantage that we have to magically transform the incoming H1 headers and such into an opaque TCP payload, before entering the tunneling logic.

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
@alyssawilk
Copy link
Contributor Author

@rshriram any further requests? @ggreenway any comments?

@ggreenway
Copy link
Contributor

Sorry, haven't had time to look over this yet. I'll try to get to it today.

@ggreenway
Copy link
Contributor

Apologies for the delay. I'm reviewing this morning.

Copy link
Contributor

@ggreenway ggreenway left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall.

// with simply enabling CONNECT for H2. This may require some tweaks to the
// headers making pre-CONNECT-support proxying not backwards compatible with
// post-CONNECT-support proxying.
google.protobuf.BoolValue allow_connect = 5;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be bool instead of g.p.BoolValue?

ConnectionManagerConfig& config, const Router::Config& route_config,
Runtime::RandomGenerator& random, Runtime::Loader& runtime,
const LocalInfo::LocalInfo& local_info) {
// If this is a Upgrade request, do not remove the Connection and Upgrade headers,
// as we forward them verbatim to the upstream hosts.
if (protocol == Protocol::Http11 && Utility::isUpgrade(request_headers)) {
if (Utility::isUpgrade(request_headers)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously this check was excluding Http2 and Http10. If I'm understanding this correctly, you want to allow Http2 here. Do we need to still exclude Http10?

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
ggreenway
ggreenway previously approved these changes Aug 27, 2018
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
@alyssawilk alyssawilk merged commit cd171d9 into envoyproxy:master Aug 28, 2018
rshriram pushed a commit to rshriram/envoy that referenced this pull request Oct 30, 2018
Pulling the following changes from github.com/envoyproxy/envoy:

f936fc6 ssl: serialize accesses to SSL socket factory contexts (envoyproxy#4345)
e34dcd6 Fix crash in tcp_proxy (envoyproxy#4323)
ae6a252 router: fix matching when all domains have wildcards (envoyproxy#4326)
aa06142 test: Stop fake_upstream methods from accidentally succeeding (envoyproxy#4232)
5d73187 rbac: update the authenticated.user to a StringMatcher. (envoyproxy#4250)
c6bfc7d time: Event::TimeSystem abstraction to make it feasible to inject time with simulated timers (envoyproxy#4257)
752483e Fixing the fix (envoyproxy#4333)
83487f6 tls: update BoringSSL to ab36a84b (3497). (envoyproxy#4338)
7bc210e test: fixing interactions between waitFor and ignore_spurious_events (envoyproxy#4309)
69474b3 admin: order stats in clusters json admin (envoyproxy#4306)
2d155f9 ppc64le build (envoyproxy#4183)
07efc6d fix static initialization fiasco problem (envoyproxy#4314)
0b7e3b5 test: Remove declared but undefined class methods (envoyproxy#4297)
1485a13 lua: make sure resetting dynamic metadata wrapper when request info is marked dead
d243cd6 test: set to zero when start_time exceeds limit (envoyproxy#4328)
0a1e92a test: fix heap use-after-free in ~IntegrationTestServer. (envoyproxy#4319)
cddc732 CONTRIBUTING: Document 'kick-ci' trick. (envoyproxy#4335)
f13ef24 docs: remove reference to deprecated value field (envoyproxy#4322)
e947a27 router: minor doc fixes in stream idle timeout (envoyproxy#4329)
0c2e998 tcp-proxy: fixing a TCP proxy bug where we attempted to readDisable a closed connection (envoyproxy#4296)
00ffe44 utility: fix strftime overflow handling. (envoyproxy#4321)
af1183c Re-enable TcpProxySslIntegrationTest and make the tests pass again. (envoyproxy#4318)
3553461 fuzz: fix H2 codec fuzzer post envoyproxy#4262. (envoyproxy#4311)
42f6048 Proto string issue fix (envoyproxy#4320)
9c492a0 Support Envoy to fetch secrets using SDS service. (envoyproxy#4256)
a857219 ratelimit: revert `revert rate limit failure mode config` and add tests (envoyproxy#4303)
1d34172 dns: fix exception unsafe behavior in c-ares callbacks. (envoyproxy#4307)
1212423 alts: add gRPC TSI socket (envoyproxy#4153)
f0363ae fuzz: detect client-side resets in H2 codec fuzzer. (envoyproxy#4300)
01aa3f8 test: hopefully deflaking echo integration test (envoyproxy#4304)
1fc0f4b ratelimit: link legacy proto when message is being used (envoyproxy#4308)
aa4481e fix rare List::remove(&target) segfault (envoyproxy#4244)
89e0f23 headers: fixing fast fail of size-validation (envoyproxy#4269)
97eba59 build: bump googletest version. (envoyproxy#4293)
0057e22 fuzz: avoid false positives in HCM fuzzer. (envoyproxy#4262)
9d094e5 Revert ac0bd74 (envoyproxy#4295)
ddb28a4 Add validation context provider (envoyproxy#4264)
3b47cba added histogram latency information to Hystrix dashboard stream (envoyproxy#3986)
cf87d50 docs: update SNI FAQ. (envoyproxy#4285)
f952033 config: fix update empty stat for eds (envoyproxy#4276)
329e591 router: Add ability of custom headers to rely on per-request data (envoyproxy#4219)
68d20b4  thrift: refactor build files and imports (envoyproxy#4271)
5fa8192 access_log: log requested_server_name in tcp proxy (envoyproxy#4144)
fa45bb4 fuzz: libc++ clocks don't like nanos. (envoyproxy#4282)
53f8944 stats: add symbol table for future stat name encoding (envoyproxy#3927)
c987b42 test infra: Remove timeSource() from the ClusterManager api (envoyproxy#4247)
cd171d9 websocket: tunneling websockets (and upgrades in general) over H2 (envoyproxy#4188)
b9dc5d9 router: disallow :path/host rewriting in request_headers_to_add. (envoyproxy#4220)
0c91011 network: skip socket options and source address for UDS client connections (envoyproxy#4252)
da1857d build: fixing a downstream compile error by noting explicit fallthrough (envoyproxy#4265)
9857cfe fuzz: cleanup per-test environment after each fuzz case. (envoyproxy#4253)
52beb06 test: Wrap proto string in std::string before comparison (envoyproxy#4238)
f5e219e extensions/thrift_proxy: Add header matching to thrift router (envoyproxy#4239)
c9ce5d2 fuzz: track read_disable_count bidirectionally in codec_impl_fuzz_test. (envoyproxy#4260)
35103b3 fuzz: use nanoseconds for SystemTime in RequestInfo. (envoyproxy#4255)
ba6ba98 fuzz: make runtime root hermetic in server_fuzz_test. (envoyproxy#4258)
b0a9014 time: Add 'format' test to ensure no one directly instantiates Prod*Time from source. (envoyproxy#4248)
8567460 access_log: support beginning of epoch in START_TIME. (envoyproxy#4254)
28d5f41 proto: unify envoy_proto_library/api_proto_library. (envoyproxy#4233)
f7d3cb6 http: fix allocation bug introduced in envoyproxy#4211. (envoyproxy#4245)

Fixes istio/istio#8310 (once pulled into istio/istio).

Signed-off-by: Piotr Sikora <piotrsikora@google.com>
@alyssawilk alyssawilk deleted the h2_websocket branch November 28, 2018 16:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants