Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved connection / stream results when wai applications throw exceptions. #740

Closed
wants to merge 4 commits into from

Conversation

iand675
Copy link
Contributor

@iand675 iand675 commented Apr 19, 2019

  • Bumped the version number

After submitting your PR:

  • Update the Changelog.md file with a link to your PR
  • Check that CI passes (or if it fails, for reasons unrelated to your change, like CI timeouts)

Overview

Our services operate behind Google load balancer instances, which assume that opened connections stay open for up to 10 minutes. In the event of uncaught exceptions being handled by Warp, the current behavior for HTTP/1 is to close the connection after sending the error response. The HTTP/2 code ignores the user-provided exception response handler from settings and closes the stream with an HTTP/2 stream INTERNAL_ERROR signal.

This caused the issue for us on HTTP/1 connections that subsequent requests would attempt to reuse the closed connection and fail (since the connection was gone). For GET requests, that wasn't a huge deal, since the load balancer would replay them (yay idempotency 🎉). For non-idempotent requests, we'd just lose the data. In HTTP/2, the issue was just the inconvenience of not getting proper 500s back.

Arguably according to the HTTP specs, Warp's current behavior is acceptable, and probably the easiest way to recover from any odd error states, but this PR attempts to make connection reuse a bit more robust so we can keep our architecture the same.

HTTP/1 Implementation

We need to distinguish between several failure modes:

Issue Connection Recoverable?
Errors thrown by an Application prior to returning a response. ✓(we have a response handler from Settings, can generate a well-formed response)
Sockets breaking while sending response ✗ (can't send any more data 🤷‍♂️)
IOExceptions from files referenced in ResponseFile. (for example: what happens if a while gets modified/deleted while it's being streamed as a result?) ✗ (Content-Length headers except the response to be a certain size, so if we don't kill the connection or stream, we'd have to feed bad data as padding)
Application errors while streaming the body via ResponseBuilder or ResponseStream. ✗ (Allowing the response to finish would indicate the request completed as intended, possibly feeding the client malformed data)
Clients sending malformed or overly long requests ✗ (malformed requests are generally irrecoverable, overly long requests require us to consume the full body, so the impasse requires a connection close.)

The general heuristic here is that once we start to flush the response and an exception is thrown any time during the flushing process, consider the connection to be broken. In order to accomplish this, we wrap exceptions thrown while responding with ExceptionInsideResponseBody. If an uncaught exception has this wrapper, then we can close the connection, otherwise we can use the provided exception handler from Settings and keep the connection alive since we're not wedged in an irrecoverable state.

HTTP/2 Implementation

HTTP/2 doesn't reuse streams, so if a stream breaks, new requests on the same connection are fine. However, we use similar logic to differentiate between exceptions thrown in the flushing process with the same ExceptionInsideResponseBody.

Before:

curl https://app.lvh.me:3000/error/wai https://app.lvh.me:3000/error/wai -v          
...
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x558f2a82f900)
> GET /error/wai HTTP/2
> Host: app.lvh.me:3000
> User-Agent: curl/7.58.0
> Accept: */*
> 
* Connection state changed (MAX_CONCURRENT_STREAMS updated)!
* HTTP/2 stream 1 was not closed cleanly: INTERNAL_ERROR (err 2)
* Connection #0 to host app.lvh.me left intact
curl: (92) HTTP/2 stream 1 was not closed cleanly: INTERNAL_ERROR (err 2)
* Found bundle for host app.lvh.me: 0x558f2a82f6a0 [can multiplex]
* Re-using existing connection! (#0) with host app.lvh.me
* Connected to app.lvh.me (127.0.0.1) port 3000 (#0)
* Using Stream ID: 3 (easy handle 0x558f2a82f900)
> GET /error/wai HTTP/2
> Host: app.lvh.me:3000
> User-Agent: curl/7.58.0
> Accept: */*
> 
* HTTP/2 stream 3 was not closed cleanly: INTERNAL_ERROR (err 2)
* Connection #0 to host app.lvh.me left intact
curl: (92) HTTP/2 stream 3 was not closed cleanly: INTERNAL_ERROR (err 2)

After:

curl https://app.lvh.me:3000/error/wai https://app.lvh.me:3000/error/wai -v
...
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x5591c789c900)
> GET /error/wai HTTP/2
> Host: app.lvh.me:3000
> User-Agent: curl/7.58.0
> Accept: */*
> 
* Connection state changed (MAX_CONCURRENT_STREAMS updated)!
< HTTP/2 500 
< date: Fri, 19 Apr 2019 06:30:31 GMT
< server: Warp/3.2.26
< content-type: text/plain; charset=utf-8
< 
* Connection #0 to host app.lvh.me left intact
Something went wrong* Found bundle for host app.lvh.me: 0x5591c789c6a0 [can multiplex]
* Re-using existing connection! (#0) with host app.lvh.me
* Connected to app.lvh.me (127.0.0.1) port 3000 (#0)
* Using Stream ID: 3 (easy handle 0x5591c789c900)
> GET /error/wai HTTP/2
> Host: app.lvh.me:3000
> User-Agent: curl/7.58.0
> Accept: */*
> 
< HTTP/2 500 
< date: Fri, 19 Apr 2019 06:30:31 GMT
< server: Warp/3.2.26
< content-type: text/plain; charset=utf-8
< 
* Connection #0 to host app.lvh.me left intact
Something went wrong% 

@iand675
Copy link
Contributor Author

iand675 commented Apr 19, 2019

I don't think the Windows test failure is due to this change? If so, would appreciate some help fixing as I don't have a Windows dev environment.

@kazu-yamamoto
Copy link
Contributor

CI test runs again.

Copy link
Contributor

@kazu-yamamoto kazu-yamamoto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks terrific!

kazu-yamamoto added a commit to kazu-yamamoto/wai that referenced this pull request Apr 25, 2019
@kazu-yamamoto
Copy link
Contributor

Rebased and merged. Thank you for your contribution!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants