Summary of changes in this PR:
  * Finally achieving my dream of moving the connection pool out of HTTP; it's going to live in the [ConcurrentUtilities.jl](JuliaServices/ConcurrentUtilities.jl#18)
    package instead. In short, it had no unit tests, scant documentation, and was generally a pain to deal with in HTTP. We also discovered at least 2 major issues with the current
    implementation during a deep dive into performance and issue diagnosis, including:
      * If a ConnectError was thrown while creating a connection, a "permit" to the pool was permanently lost; get enough of them and the entire connection pool grinds to a halt :grimace:
      * Being a threadsafe structure, the pool made the poor choice of holding the pool lock both while trying to get a permit _and while making new connections_. This meant that when
        a connection took a long time to be made, we were essentially holding the rest of the pool hostage. That's totally unnecessary, and it can cause big performance issues in
        really heavy workloads where there's lots of contention on the pool.
    The new implementation in ConcurrentUtilities.jl solves both these problems while being about 1/4th the LOC of the previous implementation. And it has unit tests! yay!
    All in all, I think #1033 and #1032 should both be mostly resolved by these changes/updates.
  * Relatedly, we're adjusting the API for connection pools to allow the user to pass in their _own_ connection pool to be used for that request (to check for a connection to reuse and to
    return the connection to after completion). A pool can be constructed like `HTTP.Pool(; max::Int)` and passed to any of the `HTTP.request` methods like `HTTP.get(...; pool=pool)`;
    see the sketch after this list. HTTP has its own global default pool `HTTP.Connections.POOL` that it uses by default to manage connection reuse. The `HTTP.set_default_connection_limit!`
    function will still work as long as it is called before any requests are made; calling it _after_ requests have been made is a no-op. The `connection_limit` keyword arg is now formally
    deprecated and will issue a warning if passed. I'm comfortable with a full deprecation here because it turns out it wasn't really working before anyway (unless it was passed on _every_
    request and never changed). So instead of "changing" things, we're really just doing a proper implementation that now actually works, has better behavior, and is actually controllable by the user.
  * Added a try-finally in `keepalive!` around our global IO lock usage, just for good housekeeping
  * Refactored `try_with_timeout` to use a `Channel` instead of the non-threaded `@async`; it's much simpler and seems cleaner (a rough sketch of the approach follows this list)
  * Refactored a few of the stream IO functions so that we always know the number of bytes downloaded, whether kept in memory or written to
    an IO, so verbose logging can report them and give bit-rate calculations
  * Added a new `logerrors::Bool=false` keyword arg that enables `@error` logs on errors that may otherwise be "swallowed" when doing retries;
    it can be helpful to at least see what kinds of errors are happening. Also cleaned up our error handling in general so we don't lose backtraces, which fixes #1003.
  * Added lots of metrics around time spent in the various layers, read vs. write durations, etc. These can be enabled, and stored in the request context, by passing `observelayers=true`.
    This mostly resolves #1025 and #1019.
  * Fixed some keyword args that either weren't documented correctly in the inline docs or weren't included in the client.md docs
  * Removed a few client-side layers (BasicAuth, DebugRequest, Canonicalize, etc.) since their implementations were _really_ trivial, and moved their functionality into a single HeadersRequest
    layer where we now do all the header-related work during the request. This has the effect of _drastically_ reducing the exploding backtraces we now see when doing requests, caused by
    our "functional style" client-side layering.
  * Fixed #612 by ensuring `keepalive` is included in our `connectionkey` computation, so a connection is only reused when subsequent requests pass the same `keepalive` value
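
As a rough illustration of the new pool API described above (the URL and `max` value are just placeholders):

```julia
using HTTP

# Construct a dedicated pool allowing at most 10 concurrent connections;
# requests beyond that limit will wait until a permit frees up.
pool = HTTP.Pool(; max=10)

# Pass the pool to any request method; the connection for this request is
# checked out of (and returned to) this pool instead of the global default
# pool (HTTP.Connections.POOL).
resp = HTTP.get("https://example.com"; pool=pool)
```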
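
And here's a minimal sketch of the `Channel`-based `try_with_timeout` approach mentioned above. This is not the exact implementation, and the `TimeoutError` type here is just a stand-in for HTTP.jl's own error types:

```julia
struct TimeoutError <: Exception  # stand-in; HTTP.jl defines its own error types
    timeout::Float64
end

# Run f() on a spawned task and wait at most `timeout` seconds for its result.
# Closing the channel with an exception makes the blocked take! below throw it.
function try_with_timeout(f, timeout)
    ch = Channel{Any}(1)
    timer = Timer(timeout) do _
        close(ch, TimeoutError(timeout))
    end
    Threads.@spawn begin
        try
            put!(ch, f())
        catch e
            # preserve the backtrace; simplified vs. the real implementation
            close(ch, CapturedException(e, catch_backtrace()))
        end
    end
    try
        return take!(ch)  # throws TimeoutError or the captured exception on failure
    finally
        close(timer)
    end
end
```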
quinnj committed Apr 20, 2023
1 parent 75c1b25 commit 639469b
Showing 32 changed files with 429 additions and 644 deletions.
1 change: 1 addition & 0 deletions Project.toml
@@ -6,6 +6,7 @@ version = "1.7.4"
 [deps]
 Base64 = "2a0f44e3-6c83-55bd-87e4-b1978d98bd5f"
 CodecZlib = "944b1d66-785c-5afd-91f1-9de20f533193"
+ConcurrentUtilities = "f0e56b4a-5159-44fe-b623-3e5288b988bb"
 Dates = "ade2ca70-3891-5945-98fb-dc099432e06a"
 Logging = "56ddb016-857b-54e1-b83d-db4d58db5568"
 LoggingExtras = "e6f89c97-d47a-5376-807f-9c37f3926c36"
24 changes: 21 additions & 3 deletions docs/src/client.md
@@ -91,18 +91,26 @@ If you're running into a real head-scratcher, don't hesitate to [open an issue](
 
 When a connection is attempted to a remote host, sometimes the connection is unable to be established for whatever reason. Passing a non-zero `connect_timeout` value will cause `HTTP.request` to wait that many seconds before giving up and throwing an error.
 
-#### `connection_limit`
+#### `pool`
 
-Many remote web services/APIs have rate limits or throttling in place to avoid bad actors from abusing their service. They may prevent too many requests over a time period or they may prevent too many connections being simultaneously open from the same client. By default, when `HTTP.request` opens a remote connection, it remembers the exact host:port combination and will keep the connection open to be reused by subsequent requests to the same host:port. The `connection_limit` keyword argument controls how many concurrent connections are allowed to a single remote host.
+Many remote web services/APIs have rate limits or throttling in place to keep bad actors from abusing their service. They may prevent too many requests over a time period or they may prevent too many connections being simultaneously open from the same client. By default, when `HTTP.request` opens a remote connection, it remembers the exact host:port combination and will keep the connection open to be reused by subsequent requests to the same host:port. The `pool` keyword argument specifies a specific `HTTP.Pool` object to be used for controlling the maximum number of concurrent connections allowed across the pool. It's constructed via `HTTP.Pool(; max::Int)`. Requests attempted when the maximum is already reached will block until previous requests finish. The `idle_timeout` keyword argument can be passed to `HTTP.request` to control how long a connection may sit unused before it is no longer considered valid for reuse; such "stale" connections are discarded.
 
 #### `readtimeout`
 
-After a connection is established and a request is sent, a response is expected. If a non-zero value is passed to the `readtimeout` keyword argument, `HTTP.request` will wait to receive a response that many seconds before throwing an error. Note that for chunked or streaming responses, each chunk/packet of bytes received causes the timeout to reset. Passing `readtimeout = 0` disables any timeout checking and is the default.
+After a connection is established and a request is sent, a response is expected. If a non-zero value is passed to the `readtimeout` keyword argument, `HTTP.request` will wait that many seconds to receive a response before throwing an error. Passing `readtimeout = 0` disables any timeout checking and is the default.
 
 ### `status_exception`
 
 When a non-2XX HTTP status code is received in a response, this is meant to convey some error condition. 3XX responses typically deal with "redirects" where the request should actually try a different url (these are followed automatically by default in `HTTP.request`, though up to a limit; see [`redirect`](@ref)). 4XX status codes typically mean the remote server thinks something is wrong in how the request is made. 5XX typically mean something went wrong on the server-side when responding. By default, as mentioned previously, `HTTP.request` will attempt to follow redirect responses, and retry "retryable" requests (where the status code and original request method allow). If, after redirects/retries, a response still has a non-2XX response code, the default behavior is to throw an `HTTP.StatusError` exception to signal that the request didn't succeed. This behavior can be disabled by passing `status_exception=false`, where the `HTTP.Response` object will be returned with the non-2XX status code intact.
 
+### `logerrors`
+
+If `true`, `HTTP.StatusError`, `HTTP.TimeoutError`, `HTTP.IOError`, and `HTTP.ConnectError` will be logged via `@error` as they happen, regardless of whether the request is then retried or not. Useful for debugging or monitoring requests where certain errors are a concern but would otherwise be hidden by retries.
+
+### `observelayers`
+
+If `true`, enables `HTTP.observe_layer` to wrap each client-side "layer" and track the amount of time spent in each layer as a request is processed. This can be useful for debugging performance issues. Note that when retries or redirects happen, the time spent in each layer is cumulative, as noted by the `[layer]_count` entries. The metrics are stored in the `Request.context` dictionary and can be accessed like `HTTP.get(...).request.context`.
+
 ### `basicauth`
 
 By default, if "user info" is detected in the request url, like `http://user:password@host`, the `Authorization: Basic` header will be added to the request headers before the request is sent. While not very common, some APIs use this form of authentication to verify requests. This automatic adding of the header can be disabled by passing `basicauth=false`.
@@ -150,6 +158,16 @@ Controls the total number of retries that will be attempted. Can also disable al
 
 By default, this keyword argument is `false`, which controls whether non-idempotent requests will be retried (POST or PATCH requests).
 
+#### `retry_delays`
+
+Allows providing a custom `ExponentialBackOff` object to control the delay between retries.
+Default is `ExponentialBackOff(n = retries)`.
+
+#### `retry_check`
+
+Allows providing a custom function to control whether a retry should be attempted.
+The function should accept 5 arguments: the delay state, the exception, the request, the response (an `HTTP.Response` object *if* a request was successfully made, otherwise `nothing`), and the response body (a `Vector{UInt8}`, or `nothing` if there is no response yet), and should return `true` if a retry should be attempted. In traditional nomenclature, the function has the form `f(s, ex, req, resp, resp_body) -> Bool`.
+
 ### Redirect Arguments
 
 #### `redirect`
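
For illustration, here's a rough sketch combining several of the client keyword arguments documented above (the URL and values are placeholders):

```julia
using HTTP

resp = HTTP.get("https://example.com";
    connect_timeout = 10,      # give up if a connection can't be made in 10s
    readtimeout = 30,          # give up if no response arrives within 30s
    status_exception = false,  # return non-2XX responses instead of throwing
    logerrors = true,          # @error-log errors even if the request is retried
    observelayers = true)      # record per-layer timing metrics

resp.status           # inspect the status code directly, since no exception is thrown
resp.request.context  # per-layer metrics recorded by observelayers
```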
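
And a rough sketch of the retry knobs documented above; the URL and the custom check are illustrative (`ExponentialBackOff` is the one from Base Julia):

```julia
using HTTP

# Only retry when no response was received at all, or the server returned a 503.
should_retry(s, ex, req, resp, resp_body) = resp === nothing || resp.status == 503

resp = HTTP.get("https://example.com";
    retries = 4,
    retry_delays = ExponentialBackOff(n = 4, first_delay = 0.5, factor = 2.0),
    retry_check = should_retry)
```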
8 changes: 2 additions & 6 deletions docs/src/reference.md
@@ -138,7 +138,7 @@ HTTP.Messages.isidempotent
 HTTP.Messages.retryable
 HTTP.Messages.defaultheader!
 HTTP.Messages.readheaders
-HTTP.DefaultHeadersRequest.setuseragent!
+HTTP.HeadersRequest.setuseragent!
 HTTP.Messages.readchunksize
 HTTP.Messages.headerscomplete(::HTTP.Messages.Response)
 HTTP.Messages.writestartline
@@ -167,17 +167,13 @@ HTTP.poplayer!
 HTTP.popfirstlayer!
 HTTP.MessageRequest.messagelayer
 HTTP.RedirectRequest.redirectlayer
-HTTP.DefaultHeadersRequest.defaultheaderslayer
-HTTP.BasicAuthRequest.basicauthlayer
+HTTP.HeadersRequest.headerslayer
 HTTP.CookieRequest.cookielayer
-HTTP.CanonicalizeRequest.canonicalizelayer
 HTTP.TimeoutRequest.timeoutlayer
 HTTP.ExceptionRequest.exceptionlayer
 HTTP.RetryRequest.retrylayer
 HTTP.ConnectionRequest.connectionlayer
-HTTP.DebugRequest.debuglayer
 HTTP.StreamRequest.streamlayer
-HTTP.ContentTypeDetection.contenttypedetectionlayer
 ```
 
 ### Raw Request Connection
