router: add new ratelimited retry backoff strategy #12202

Merged
merged 37 commits into from
Aug 14, 2020
Changes from 36 commits
Commits
0f0916b
router: add new ratelimited retry backoff strategy
numerodix Jul 20, 2020
c188936
Update docs to mention the default value.
numerodix Jul 21, 2020
dcc4e14
Make the spell checker happy
numerodix Jul 21, 2020
6908897
Update version history
numerodix Jul 22, 2020
5cabd3c
Make sure the new headers are consumed
numerodix Jul 22, 2020
902f17c
Remove unnecessary import
numerodix Jul 22, 2020
40e7b0a
Add missing break statement
numerodix Jul 22, 2020
c926c9c
Document that first matching header is used
numerodix Jul 22, 2020
8946044
Make sure there is at least one header
numerodix Jul 22, 2020
0f5eb62
Update test to agree with protobuf
numerodix Jul 22, 2020
c965008
Update version history
numerodix Jul 22, 2020
d2ae1a0
Clang tidy prefers I use empty()
numerodix Jul 23, 2020
96e5364
Move default to RetryPolicy
numerodix Jul 23, 2020
97fcc0b
Remove new stats from virtual cluster
numerodix Jul 29, 2020
52b39b7
Improve the docs
numerodix Jul 31, 2020
cc6d146
Remove header x-envoy-ratelimited-reset-max-interval-ms
numerodix Jul 31, 2020
e4dd05c
Fix api CI build
numerodix Aug 1, 2020
2edaf26
Fix merge after rebase
numerodix Aug 1, 2020
fad4cee
Implement new ResetHeaderParser
numerodix Aug 1, 2020
fbd7ae1
Update stats integration test
numerodix Aug 1, 2020
2fc0bec
Remove unnecessary virtual keyword
numerodix Aug 1, 2020
d5d314d
Fix stats integration test (mac)
numerodix Aug 1, 2020
b285474
Move ResetHeaderParser interface into router.h
numerodix Aug 5, 2020
775aa7e
Remove reference to redis which was removed on master
numerodix Aug 5, 2020
37d667f
Fix bad merge
numerodix Aug 10, 2020
6b82264
PR feedback
numerodix Aug 10, 2020
59b06d4
Update stats integration test again after rebase
numerodix Aug 11, 2020
1bef83e
Roll back bad partial merge to stats integration test
numerodix Aug 11, 2020
1fd484a
Merge branch 'master' into ratelimited-backoff-strategy
numerodix Aug 11, 2020
4abc240
Fix merge with master
numerodix Aug 11, 2020
b9c33f1
Update stats integration test again
numerodix Aug 11, 2020
77e0ff9
Fix stats integration test on mac again
numerodix Aug 11, 2020
6180297
Merge branch 'master' into ratelimited-backoff-strategy
numerodix Aug 12, 2020
d7d798c
Fix bad merge in release notes
numerodix Aug 12, 2020
78b319e
Kick CI
numerodix Aug 13, 2020
f7fc6b4
Kick CI
numerodix Aug 13, 2020
8521e02
Fix line that got lost in bad merge
numerodix Aug 13, 2020
81 changes: 79 additions & 2 deletions api/envoy/config/route/v3/route_components.proto
@@ -1050,10 +1050,15 @@ message RouteAction {
}

// HTTP retry :ref:`architecture overview <arch_overview_http_routing_retry>`.
-// [#next-free-field: 11]
+// [#next-free-field: 12]
message RetryPolicy {
option (udpa.annotations.versioning).previous_message_type = "envoy.api.v2.route.RetryPolicy";

enum ResetHeaderFormat {
SECONDS = 0;
UNIX_TIMESTAMP = 1;
}

message RetryPriority {
option (udpa.annotations.versioning).previous_message_type =
"envoy.api.v2.route.RetryPolicy.RetryPriority";
@@ -1104,6 +1109,69 @@ message RetryPolicy {
google.protobuf.Duration max_interval = 2 [(validate.rules).duration = {gt {}}];
}

message ResetHeader {
string name = 1
[(validate.rules).string = {min_bytes: 1 well_known_regex: HTTP_HEADER_NAME strict: false}];

ResetHeaderFormat format = 2 [(validate.rules).enum = {defined_only: true}];
}

// A retry back-off strategy that applies when the upstream server rate limits
// the request.
//
// Given this configuration:
//
// .. code-block:: yaml
//
// rate_limited_retry_back_off:
// reset_headers:
// - name: Retry-After
// format: SECONDS
// - name: X-RateLimit-Reset
// format: UNIX_TIMESTAMP
// max_interval: "300s"
//
// The following algorithm will apply:
//
// 1. If the response contains the header ``Retry-After``, its value must be of
// the form ``120`` (an integer that represents the number of seconds to
// wait before retrying). If so, this value is used as the back-off interval.
// 2. Otherwise, if the response contains the header ``X-RateLimit-Reset``, its
// value must be of the form ``1595320702`` (an integer that represents the
// point in time at which to retry, as a Unix timestamp in seconds). If so,
// the current time is subtracted from this value and the result is used as
// the back-off interval.
// 3. Otherwise, Envoy will use the default
// :ref:`exponential back-off <envoy_v3_api_field_config.route.v3.RetryPolicy.retry_back_off>`
// strategy.
//
// No matter which format is used, if the resulting back-off interval exceeds
// ``max_interval`` it is discarded and the next header in ``reset_headers``
// is tried. If a request timeout is configured for the route it will further
// limit how long the request will be allowed to run.
//
// To prevent many clients from retrying at the same point in time, jitter is
// added to the back-off interval, so the resulting interval is chosen as
// ``random(interval, interval * 1.5)``.
//
// .. attention::
//
// Configuring ``rate_limited_retry_back_off`` will not by itself cause a request
// to be retried. You will still need to configure the right retry policy to match
// the responses from the upstream server.
message RateLimitedRetryBackOff {
// Specifies the reset headers (like ``Retry-After`` or ``X-RateLimit-Reset``)
// to match against the response. Headers are tried in order and matched
// case-insensitively. The first header to be parsed successfully is used. If no
// headers match, the default exponential back-off is used instead.
repeated ResetHeader reset_headers = 1 [(validate.rules).repeated = {min_items: 1}];

// Specifies the maximum back off interval that Envoy will allow. If a reset
// header contains an interval longer than this then it will be discarded and
// the next header will be tried. Defaults to 300 seconds.
google.protobuf.Duration max_interval = 2 [(validate.rules).duration = {gt {}}];
}

// Specifies the conditions under which retry takes place. These are the same
// conditions documented for :ref:`config_http_filters_router_x-envoy-retry-on` and
// :ref:`config_http_filters_router_x-envoy-retry-grpc-on`.
@@ -1147,13 +1215,22 @@ message RetryPolicy {
// HTTP status codes that should trigger a retry in addition to those specified by retry_on.
repeated uint32 retriable_status_codes = 7;

-// Specifies parameters that control retry back off. This parameter is optional, in which case the
+// Specifies parameters that control exponential retry back off. This parameter is optional, in which case the
// default base interval is 25 milliseconds or, if set, the current value of the
// `upstream.base_retry_backoff_ms` runtime parameter. The default maximum interval is 10 times
// the base interval. The documentation for :ref:`config_http_filters_router_x-envoy-max-retries`
// describes Envoy's back-off algorithm.
RetryBackOff retry_back_off = 8;

// Specifies parameters that control a retry back-off strategy that is used
// when the request is rate limited by the upstream server. The server may
// return a response header like ``Retry-After`` or ``X-RateLimit-Reset`` to
// provide feedback to the client on how long to wait before retrying. If
// configured, this back-off strategy will be used instead of the
// default exponential back off strategy (configured using `retry_back_off`)
// whenever a response includes the matching headers.
RateLimitedRetryBackOff rate_limited_retry_back_off = 11;
Member

What do you think about folding this into the RetryBackOff message? It seems logically related, since it will also govern what the final backoff time is.

Contributor Author

That's definitely possible. I figured it made more sense to make it clear that it's a separate code path with its own parameters and the new stat upstream_rq_retry_backoff_ratelimited to reflect whether it's being used or not.

In the doc update I pushed, the mechanism does take a fair bit of explaining, so I think that might be another reason to keep them separate.


// HTTP response headers that trigger a retry if present in the response. A retry will be
// triggered if any of the header matches match the upstream response headers.
// The field is only consulted if 'retriable-headers' retry policy is active.
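
The algorithm documented in the `RateLimitedRetryBackOff` comment above can be sketched in Python as follows. This is an illustrative sketch only, not Envoy's C++ implementation: the names `ResetHeader`, `parse_interval`, `rate_limited_interval` and `add_jitter` are hypothetical, and rejecting timestamps that already lie in the past is an assumption that the proto comment does not spell out.

```python
import random
import re
import time
from dataclasses import dataclass
from typing import Mapping, Optional, Sequence


@dataclass(frozen=True)
class ResetHeader:
    """Hypothetical mirror of the ResetHeader proto message."""
    name: str
    format: str  # "SECONDS" or "UNIX_TIMESTAMP", as in ResetHeaderFormat


def parse_interval(value: str, fmt: str, now: float) -> Optional[float]:
    """Return the back-off interval in seconds for one header value, or None."""
    value = value.strip()
    if re.fullmatch(r"[0-9]+", value) is None:
        return None  # only plain non-negative integers are accepted
    number = int(value)
    if fmt == "SECONDS":
        return float(number)
    if fmt == "UNIX_TIMESTAMP":
        interval = number - now
        # Assumption: a timestamp in the past does not yield a usable interval.
        return interval if interval > 0 else None
    return None


def rate_limited_interval(
    response_headers: Mapping[str, str],
    reset_headers: Sequence[ResetHeader],
    max_interval: float = 300.0,  # documented default of 300 seconds
    now: Optional[float] = None,
) -> Optional[float]:
    """Return the first usable interval, or None to fall back to exponential back-off."""
    if now is None:
        now = time.time()
    # Header names are matched case-insensitively.
    lowered = {name.lower(): value for name, value in response_headers.items()}
    for header in reset_headers:  # headers are tried in configured order
        value = lowered.get(header.name.lower())
        if value is None:
            continue
        interval = parse_interval(value, header.format, now)
        if interval is None or interval > max_interval:
            continue  # unparsable or too long: discard and try the next header
        return interval
    return None


def add_jitter(interval: float) -> float:
    """Apply the documented jitter: random(interval, interval * 1.5)."""
    return random.uniform(interval, interval * 1.5)
```

A `None` result corresponds to step 3 of the documented algorithm, where the default exponential back-off takes over.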
87 changes: 85 additions & 2 deletions api/envoy/config/route/v4alpha/route_components.proto

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

9 changes: 8 additions & 1 deletion docs/root/configuration/http/http_filters/router_filter.rst
@@ -42,7 +42,7 @@ A few notes on how Envoy does retries:
retries. Thus if the request timeout is set to 3s, and the first request attempt takes 2.7s, the
retry (including back-off) has .3s to complete. This is by design to avoid an exponential
retry/timeout explosion.
-* Envoy uses a fully jittered exponential back-off algorithm for retries with a default base
+* By default, Envoy uses a fully jittered exponential back-off algorithm for retries with a default base
interval of 25ms. Given a base interval B and retry number N, the back-off for the retry is in
the range :math:`\big[0, (2^N-1)B\big)`. For example, given the default interval, the first retry
will be delayed randomly by 0-24ms, the 2nd by 0-74ms, the 3rd by 0-174ms, and so on. The
@@ -51,6 +51,13 @@ A few notes on how Envoy does retries:
upstream.base_retry_backoff_ms runtime parameter. The back-off intervals can also be modified
by configuring the retry policy's
:ref:`retry back-off <envoy_v3_api_field_config.route.v3.RetryPolicy.retry_back_off>`.
* Envoy can also be configured to use feedback from the upstream server to decide the interval between
retries. Response headers like ``Retry-After`` or ``X-RateLimit-Reset`` tell the client how long
to wait before retrying. The retry policy's
:ref:`rate limited retry back off <envoy_v3_api_field_config.route.v3.RetryPolicy.rate_limited_retry_back_off>`
strategy can be configured to expect a particular header, and if that header is present in the response Envoy
will use its value to decide the back-off. If the header is not present, or if it cannot be parsed
successfully, Envoy will use the default exponential back-off algorithm instead.

.. _config_http_filters_router_x-envoy-retry-on:
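
The default exponential back-off range described in the notes above, :math:`\big[0, (2^N-1)B\big)`, can be checked with a short Python sketch. The function names are hypothetical. The default maximum of 10 times the base interval is taken from the `retry_back_off` field comment; applying that cap to the upper bound before sampling is an assumption about ordering, not something this diff states.

```python
import random
from typing import Optional


def backoff_upper_bound_ms(retry_number: int, base_ms: int = 25,
                           max_ms: Optional[int] = None) -> int:
    """Exclusive upper bound of the back-off range for the N-th retry (N >= 1)."""
    if max_ms is None:
        max_ms = 10 * base_ms  # documented default maximum interval
    # Assumption: the cap is applied to the bound before a value is sampled.
    return min((2 ** retry_number - 1) * base_ms, max_ms)


def jittered_backoff_ms(retry_number: int, base_ms: int = 25,
                        max_ms: Optional[int] = None) -> int:
    """Pick a fully jittered delay in [0, upper bound) milliseconds."""
    return random.randrange(backoff_upper_bound_ms(retry_number, base_ms, max_ms))


# Bounds for the first three retries with the default 25 ms base interval.
bounds = [backoff_upper_bound_ms(n) for n in (1, 2, 3)]
```

With the default base interval, `bounds` is `[25, 75, 175]`, which matches the 0-24ms, 0-74ms and 0-174ms ranges quoted in the text.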

@@ -74,6 +74,8 @@ Every cluster has a statistics tree rooted at *cluster.<name>.* with the followi
upstream_rq_rx_reset, Counter, Total requests that were reset remotely
upstream_rq_tx_reset, Counter, Total requests that were reset locally
upstream_rq_retry, Counter, Total request retries
upstream_rq_retry_backoff_exponential, Counter, Total retries using the exponential backoff strategy
upstream_rq_retry_backoff_ratelimited, Counter, Total retries using the ratelimited backoff strategy
upstream_rq_retry_limit_exceeded, Counter, Total requests not retried due to exceeding :ref:`the configured number of maximum retries <config_http_filters_router_x-envoy-max-retries>`
upstream_rq_retry_success, Counter, Total request retry successes
upstream_rq_retry_overflow, Counter, Total requests not retried due to circuit breaking or exceeding the :ref:`retry budget <envoy_v3_api_field_config.cluster.v3.CircuitBreakers.Thresholds.retry_budget>`
5 changes: 3 additions & 2 deletions docs/root/intro/arch_overview/http/http_routing.rst
@@ -106,8 +106,9 @@ Envoy allows retries to be configured both in the :ref:`route configuration
<envoy_v3_api_field_config.route.v3.RouteAction.retry_policy>` as well as for specific requests via :ref:`request
headers <config_http_filters_router_headers_consumed>`. The following configurations are possible:

-* **Maximum number of retries**: Envoy will continue to retry any number of times. An exponential
-backoff algorithm is used between each retry. Additionally, *all retries are contained within the
+* **Maximum number of retries**: Envoy will continue to retry any number of times. The intervals between
+retries are decided either by an exponential backoff algorithm (the default), or based on feedback
+from the upstream server via headers (if present). Additionally, *all retries are contained within the
overall request timeout*. This avoids long request times due to a large number of retries.
* **Retry conditions**: Envoy can retry on different types of conditions depending on application
requirements. For example, network failure, all 5xx response codes, idempotent 4xx response codes,
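
Because all retries are contained within the overall request timeout, a back-off interval taken from a reset header only helps if it fits in the time that remains. The arithmetic can be sketched in Python as below. The helper names are hypothetical and the sketch ignores the time the retried attempt itself needs. The numbers reuse the 3s timeout and 2.7s first attempt from the router filter notes.

```python
def remaining_budget_ms(request_timeout_ms: int, elapsed_ms: int) -> int:
    """Time left for the retry, including its back-off, in milliseconds."""
    return max(request_timeout_ms - elapsed_ms, 0)


def backoff_fits(request_timeout_ms: int, elapsed_ms: int, backoff_ms: int) -> bool:
    """True if the back-off alone ends before the overall request timeout."""
    return backoff_ms < remaining_budget_ms(request_timeout_ms, elapsed_ms)


# A 3 s timeout with a first attempt that took 2.7 s leaves 0.3 s.
budget = remaining_budget_ms(3000, 2700)
```

Under these assumptions a `Retry-After: 120` response cannot be honoured within a 3s route timeout, so the route timeout has to be sized with the expected reset intervals in mind.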
2 changes: 1 addition & 1 deletion docs/root/version_history/current.rst
@@ -62,7 +62,7 @@ New Features
* postgres network filter: :ref:`metadata <config_network_filters_postgres_proxy_dynamic_metadata>` is produced based on SQL query.
* ratelimit: added :ref:`enable_x_ratelimit_headers <envoy_v3_api_msg_extensions.filters.http.ratelimit.v3.RateLimit>` option to enable `X-RateLimit-*` headers as defined in `draft RFC <https://tools.ietf.org/id/draft-polli-ratelimit-headers-03.html>`_.
* rbac filter: added a log action to the :ref:`RBAC filter <envoy_v3_api_msg_config.rbac.v3.RBAC>` which sets dynamic metadata to inform access loggers whether to log.
* redis: added fault injection support :ref:`fault injection for redis proxy <envoy_v3_api_field_extensions.filters.network.redis_proxy.v3.RedisProxy.faults>`, described further in :ref:`configuration documentation <config_network_filters_redis_proxy>`.
Member

I think you had a merge issue here.

Contributor Author

Thank you! Fixed now.

* router: added a new :ref:`rate limited retry back off <envoy_v3_api_msg_config.route.v3.RetryPolicy.RateLimitedRetryBackOff>` strategy that uses headers like `Retry-After` or `X-RateLimit-Reset` to decide the back off interval.
* router: added new
:ref:`envoy-ratelimited<config_http_filters_router_retry_policy-envoy-ratelimited>`
retry policy, which allows retrying envoy's own rate limited responses.