- Author(s): Sergii Tkachenko (@sergiitk)
- Approver: Mark Roth (@markdroth)
- Status: In Review
- Implemented in:
- Last updated: 2024-11-04
- Discussion at: TODO(sergiitk): <google group thread>
We're adding support for global rate limiting to xDS-enabled gRPC servers. Users will be able to configure per-time-unit quotas based on request metadata. Rate Limit Quota Service will fairly distribute request quotas across participating servers.
Global rate limiting allows mesh users to manage fair consumption of their services and to prevent misbehaving clients from overloading the services. We will implement quota-based rate limiting, where rate-limiting decisions are asynchronously offloaded to the Rate Limit Quota Service (RLQS). Requests are grouped into buckets based on their metadata, and gRPC servers periodically report bucket usage. RLQS aggregates the data from different gRPC servers and fairly distributes the quota among them. This approach is best suited for high-request-per-second applications, where a certain margin of error is acceptable as long as the expected average QPS is achieved.
To support RLQS, we'll need to implement several other xDS-related features, which are covered in this proposal:
- xDS Control Plane will provide RLQS connection details in the filter config via the `GrpcService` message.
- Quota assignments will be configured via the `TokenBucket` message.
- RPCs will be matched into buckets using the Unified Matcher API.
- One of the matching mechanisms will be CEL (Common Expression Language).
- RLQS filter state will persist across LDS/RDS updates using cache retention mechanism similar to the one implemented for gRFC A83.
- A27: xDS-Based Global Load Balancing
- A36: xDS-Enabled Servers
- A39: xDS HTTP Filter Support
- A41: xDS RBAC Support
- A83: xDS GCP Authentication Filter
The diagram below shows the conceptual components of the RLQS Filter. Note that the actual implementation may vary depending on the language.
```mermaid
---
config:
  theme: base
  themeVariables:
    clusterBorder: "#777"
---
graph TD
%% RLQS Components Flowchart v8
%% == nodes ==
subgraph grpc_client_box [gRPC Client]
request{{RPC}}
end
subgraph rlqs_server_box [RLQS Server]
rlqs[(RLQS)]
end
subgraph grpc_server_box [gRPC Server]
rlqs_filter(RLQS HTTP Filter)
rlqs_cache(RLQS Cache)
subgraph rlqs_filter_state_box [RLQS Filter State]
rlqs_client(RLQS Client)
rlqs_filter_state(RLQS Filter State)
report_timers(Report Timers)
matcher_tree(Matcher Tree)
rlqs_bucket_cache(RLQS Bucket Cache)
rlqs_bucket(RLQS Bucket)
end
rpc_handler("Filter's onClientCall handler")
end
%% == edges ==
rlqs_filter -- " Get RLQS Filter State<br />per unique config " --> rlqs_cache -- " getOrCreate(config) " --> rlqs_filter_state
rlqs_filter -- " Pass RLQS Filter State<br />for the route " --> rpc_handler -- " rateLimit(call) " --> rlqs_filter_state
request --> rpc_handler
rlqs_filter_state --o matcher_tree & report_timers
rlqs_filter_state -- sendUsageReports --> rlqs_client
rlqs_filter_state -- CRUD --> rlqs_bucket_cache
rlqs_client -- onBucketsUpdate --> rlqs_filter_state
rlqs_bucket_cache -- " getOrCreate(bucketId)<br />Atomic Updates " --> rlqs_bucket
rlqs_client <-. gRPC Stream .-> rlqs
style request stroke: RoyalBlue, stroke-width: 2px;
linkStyle 3,4 stroke: RoyalBlue, stroke-width: 2px;
linkStyle 11 stroke: Teal, stroke-width: 3px;
```
The filter parses the config, combines LDS filter config with RDS overrides, and generates the `onClientCall` handlers (aka interceptors in Java and Go, and filters in C++).
RLQS Cache persists across LDS/RDS updates. It maps unique filter configs to RLQS Filter State instances, and provides thread safety for creating and accessing them. Each unique filter config generates a unique RLQS Filter State, a 1:1 mapping.
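To illustrate, here's a minimal Java sketch of such a cache; `RlqsFilterConfig` and `RlqsFilterState` stand in for whatever types an implementation actually uses, and none of these names are part of the proposed API:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Illustrative only: maps each unique filter config to a single RLQS Filter State. */
final class RlqsCache {
  private final ConcurrentMap<RlqsFilterConfig, RlqsFilterState> filterStates =
      new ConcurrentHashMap<>();

  /** Thread-safe: concurrent callers with the same config observe the same instance. */
  RlqsFilterState getOrCreateFilterState(RlqsFilterConfig config) {
    return filterStates.computeIfAbsent(config, RlqsFilterState::new);
  }

  /** Called once no onCallHandler references the filter state for this config. */
  void shutdownFilterState(RlqsFilterConfig config) {
    RlqsFilterState state = filterStates.remove(config);
    if (state != null) {
      state.shutdown();  // stop report timers, close the RLQS stream
    }
  }
}
```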
RLQS Filter State contains the business logic for rate limiting, and the current state of rate limit assignments per bucket. RLQS Filter State is what's passed to the `onCallHandler`. It exposes the public `rateLimit()` method, which takes request metadata as an argument.
RLQS Filter State evaluates the metadata against the matcher tree to match the request into a bucket. The Bucket holds the Rate Limit Quota assigned by the RLQS server (e.g. 100 requests per minute), and aggregates the number of requests it allowed/denied. This information is used to make the rate limiting decision.
The aggregated number of requests is reported to the RLQS server at configured intervals. The report action is triggered by the Report Timers. RLQS Client manages a gRPC stream to the RLQS server. It's used by the filter state to send periodic bucket usage reports, and to receive new rate limit quota assignments for the buckets.
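As a rough illustration, a bucket could be sketched in Java as below. `RlqsBucketId`, `QuotaAssignment`, and `UsageReport` are hypothetical helpers; in particular, `QuotaAssignment` stands in for whatever structure an implementation uses to apply the `TokenBucket` strategy assigned by RLQS:

```java
import java.util.concurrent.atomic.AtomicLong;

/** Illustrative only: per-bucket rate limiting state. */
final class RlqsBucket {
  private final RlqsBucketId bucketId;
  // Quota assigned by the RLQS server, e.g. 100 requests per minute.
  private volatile QuotaAssignment quotaAssignment;
  // Usage aggregated since the last report.
  private final AtomicLong numRequestsAllowed = new AtomicLong();
  private final AtomicLong numRequestsDenied = new AtomicLong();

  RlqsBucket(RlqsBucketId bucketId, QuotaAssignment initialAssignment) {
    this.bucketId = bucketId;
    this.quotaAssignment = initialAssignment;
  }

  /** Decides the fate of a single RPC and records it for the next usage report. */
  boolean rateLimit() {
    boolean allowed = quotaAssignment.tryAcquireToken();
    if (allowed) {
      numRequestsAllowed.incrementAndGet();
    } else {
      numRequestsDenied.incrementAndGet();
    }
    return allowed;
  }

  /** Applies a new quota assignment received over the RLQS stream. */
  void updateQuotaAssignment(QuotaAssignment newAssignment) {
    this.quotaAssignment = newAssignment;
  }

  /** Snapshots and resets the usage counters; invoked by the report timer. */
  UsageReport snapshotAndResetUsage() {
    return new UsageReport(
        bucketId, numRequestsAllowed.getAndSet(0), numRequestsDenied.getAndSet(0));
  }
}
```

The snapshot-and-reset step is what the report timer would use to build the next usage report sent over the RLQS stream.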
xDS Control Plane provides RLQS connection details in the `GrpcService` message (already supported by Envoy). `GrpcService` supports two modes:

- `GrpcService.EnvoyGrpc`, Envoy's minimal custom gRPC client implementation.
- `GrpcService.GoogleGrpc`, the regular gRPC-cpp client.

For obvious reasons, we'll only support the `GoogleGrpc` mode.
In `GrpcService.GoogleGrpc`, the xDS Control Plane provides the target URI, `channel_credentials`, and `call_credentials`. If the xDS Control Plane is compromised, the attacker could configure the xDS clients to talk to another malicious Control Plane, leading to potential exploits such as:

- Leaking the customer's Application Default Credentials OAuth token.
- Causing MalOut/DDoS by sending bad data from the compromised RLQS (e.g. setting the rate limit policy to `ALLOW_ALL`/`DENY_ALL`).
To prevent that, we'll add an allow-list to the bootstrap file introduced in gRFC A27. This allow-list will be a map from the fully-qualified server target URI to an object containing the channel credentials to use.
```json
// The allowlist of Control Planes allowed to be configured via xDS.
"allowed_grpc_services": {
  // The key is the fully-qualified server URI.
  "dns:///xds.example.org:443": {
    // The value is an object containing "channel_creds".
    "channel_creds": [
      // The format is identical to xds_servers.channel_creds.
      {
        "type": "string containing channel cred type",
        "config": "optional JSON object containing config for the type"
      }
    ]
  }
}
```
When the xDS Control Plane configures a connection to another control plane via the `GrpcService.GoogleGrpc` message, we'll inspect the allow-list for the matching target URI.

- If the target URI is not present, we don't create the connection to the requested Control Plane, and NACK the xDS resource.
- If the target URI is present, we create the connection to the requested Control Plane using the channel credentials provided in the bootstrap file; the transport security configuration provided by the TD is ignored (see the sketch below).
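A minimal Java sketch of this lookup, assuming the bootstrap allow-list has already been parsed into a map from target URI to `io.grpc.ChannelCredentials`; the `GrpcServiceAllowlist` class and its methods are illustrative, not a proposed API:

```java
import io.grpc.ChannelCredentials;
import io.grpc.Grpc;
import io.grpc.ManagedChannel;
import java.util.Map;

/** Illustrative only: resolves control plane channels against the bootstrap allow-list. */
final class GrpcServiceAllowlist {
  // Parsed from "allowed_grpc_services" in the bootstrap:
  // fully-qualified target URI -> credentials built from "channel_creds".
  private final Map<String, ChannelCredentials> allowedGrpcServices;

  GrpcServiceAllowlist(Map<String, ChannelCredentials> allowedGrpcServices) {
    this.allowedGrpcServices = allowedGrpcServices;
  }

  /**
   * Returns a channel to the control plane at targetUri, or throws so the caller can
   * NACK the xDS resource referencing a target that is not allow-listed.
   */
  ManagedChannel createChannel(String targetUri) {
    ChannelCredentials creds = allowedGrpcServices.get(targetUri);
    if (creds == null) {
      throw new IllegalArgumentException(
          "GrpcService target not allow-listed in bootstrap: " + targetUri);
    }
    // Channel credentials always come from the bootstrap file; transport security
    // configuration provided by the control plane is ignored.
    return Grpc.newChannelBuilder(targetUri, creds).build();
  }
}
```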
> [!IMPORTANT]
> This solution is not specific to RLQS, and should be used with any other Control Plane configured via the `GrpcService` message.
RPCs will be matched into buckets using Unified Matcher API — an adaptable framework that can be used in any xDS component that needs matching features.
Envoy provides two syntactically equivalent Unified Matcher definitions: `envoy.config.common.matcher.v3.Matcher` and `xds.type.matcher.v3.Matcher`. We will only support the latter, which is the preferred version for all new APIs using Unified Matcher.
For RLQS, the Unified Matcher tree will be provided in the filter config. Evaluating the tree against RPC metadata will yield `RateLimitQuotaBucketSettings`, which contains the information needed to associate the RPC with a `bucket_id` and the default rate limiting configuration.
In this iteration, the following Unified Matcher extensions will be supported:
- Inputs:
- Custom Matchers:
We will support request metadata matching via CEL expressions. Only Canonical CEL and only checked expressions will be supported (`cel.expr.CheckedExpr`).
The CEL evaluation environment is the set of variables and extension functions available to a CEL program. We will match the Envoy CEL environment.
Similar to Envoy, we will support standard CEL functions except comprehension-style macros.
| CEL Method | Description |
|---|---|
| `size(x)` | Returns the length of a container `x` (string, bytes, list, map). |
| `x.matches(y)` | Returns true if the string `x` is partially matched by the specified RE2 pattern `y`. |
| `x.contains(y)` | Returns true if the string `x` contains the substring `y`. |
| `x.startsWith(y)` | Returns true if the string `x` begins with the substring `y`. |
| `x.endsWith(y)` | Returns true if the string `x` ends with the substring `y`. |
| `timestamp(x)`, `timestamp.get*(x)`, `duration` | Date/time functions. |
| `in`, `[]` | Map/list indexing. |
| `has(m.x)` | (macro) Returns true if the map `m` has the string `"x"` as a key. |
| `int`, `uint`, `double`, `string`, `bytes`, `bool` | Conversions and identities. |
| `==`, `!=`, `>`, `<`, `<=`, `>=` | Comparisons. |
| `\|\|`, `&&`, `+`, `-`, `/`, `*`, `%`, `!` | Basic functions. |
For RLQS, only the `request` variable is supported in CEL expressions. We will adapt Envoy's Request Attributes for gRPC.
| Attribute | Type | gRPC source | Envoy Description |
|---|---|---|---|
| `request.path` | `string` | Full method name<sup>1</sup> | The path portion of the URL. |
| `request.url_path` | `string` | Same as `request.path` | The path portion of the URL without the query string. |
| `request.host` | `string` | Authority<sup>2</sup> | The host portion of the URL. |
| `request.scheme` | `string` | Not set | The scheme portion of the URL. |
| `request.method` | `string` | `POST`<sup>3</sup> | Request method. |
| `request.headers` | `map<string, string>` | `metadata`<sup>4</sup> | All request headers indexed by the lower-cased header name. |
| `request.referer` | `string` | `metadata["referer"]` | Referer request header. |
| `request.useragent` | `string` | `metadata["user-agent"]` | User agent request header. |
| `request.time` | `timestamp` | Not set | Time of the first byte received. |
| `request.id` | `string` | `metadata["x-request-id"]` | Request ID corresponding to the `x-request-id` header value. |
| `request.protocol` | `string` | Not set | Request protocol. |
| `request.query` | `string` | `""` | The query portion of the URL. |
<sup>1</sup> `request.path`:
- CPP: `metadata[":path"]`
- Go: `grpc.Method(ctx)`
- Java: `"/" + serverCall.getMethodDescriptor().getFullMethodName()`

<sup>2</sup> `request.host`:
- CPP, Go: `metadata[":authority"]`
- Java: `serverCall.getAuthority()`

<sup>3</sup> `request.method`: Hard-coded to `"POST"` if unavailable and a code audit confirms the server denies requests for all other method types.

<sup>4</sup> `request.headers`: As defined in gRFC A41, the "header" field.
For performance reasons, CEL variables should be resolved on demand. The CEL runtime in each language provides its own approach to resolving variables lazily.
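As an illustration, the sketch below resolves the gRPC sources from the table above only when an attribute is actually referenced. The `RequestAttributeResolver` shape is hypothetical; each language would plug into whatever lazy-activation hook its CEL runtime provides. The gRPC calls (`ServerCall`, `Metadata`) are the real Java APIs:

```java
import io.grpc.Metadata;
import io.grpc.ServerCall;
import java.util.Optional;

/** Illustrative only: resolves CEL "request" attributes on demand. */
final class RequestAttributeResolver {
  private final ServerCall<?, ?> call;
  private final Metadata headers;

  RequestAttributeResolver(ServerCall<?, ?> call, Metadata headers) {
    this.call = call;
    this.headers = headers;
  }

  /** Invoked by the CEL evaluator only when the expression touches the attribute. */
  Optional<Object> resolve(String attribute) {
    switch (attribute) {
      case "request.path":
      case "request.url_path":
        return Optional.of("/" + call.getMethodDescriptor().getFullMethodName());
      case "request.host":
        return Optional.ofNullable(call.getAuthority());
      case "request.method":
        return Optional.of("POST");
      case "request.referer":
        return header("referer");
      case "request.useragent":
        return header("user-agent");
      case "request.id":
        return header("x-request-id");
      case "request.query":
        return Optional.of("");
      default:
        return Optional.empty();  // "Not set" attributes stay unresolved.
    }
  }

  private Optional<Object> header(String name) {
    return Optional.ofNullable(
        headers.get(Metadata.Key.of(name, Metadata.ASCII_STRING_MARSHALLER)));
  }
}
```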
RLQS Filter State holds the bucket usage data, report timers, and the bidirectional stream to the RLQS server. To prevent the loss of state across LDS/RDS updates, the RLQS filter will require a cache retention mechanism similar to the one implemented for gRFC A83.
The scope of each RLQS Filter Cache instance will be per server instance (same scope as the filter chain) and per filter name.
RLQS implementations will provide a mechanism for new instances of the filter to retain the cache from previous instances. There may be multiple instances of the RLQS Filter State, each one mapped to a unique filter config generated from LDS config, and RDS overrides. Consider the following example that demonstrates the lifecycle of RLQS Filter Cache.
```mermaid
---
config:
  sequence:
    showSequenceNumbers: true
    height: 46
    diagramMarginX: 40
    diagramMarginY: 40
---
sequenceDiagram
%% gRFC: RLQS Filter Cache Lifecycle v1.1
participant xds as Control Plane
participant filter as RLQS HTTP Filter
participant cache as RLQS Cache
participant e1 as RlqsFilterState(c1)
participant e2 as RlqsFilterState(c2)
%% Notes
Note right of xds: r1-4: routes <br />c1-2: unique filter configs
%% LDS 1
xds ->> filter: LDS1<br />RLQS{r1=c1, r2=c2, r3=c2}
filter ->> cache: r1: getOrCreate(c1)
cache ->>+ e1: new RlqsFilterState(c1)
filter ->> cache: r2: getOrCreate(c2)
cache ->>+ e2: new RlqsFilterState(c2)
filter ->> cache: r3: getOrCreate(c2)
Note over filter: r1: RlqsFilterState(c1)<br/>r2: RlqsFilterState(c2)<br/>r3: RlqsFilterState(c2)
%% RDS 1
xds ->> filter: RDS1<br />RLQS{r1=c2}
filter ->> cache: r1: getOrCreate(c2)
filter ->> cache: shutdownFilterState(c1)
cache -x e1: RlqsFilterState(c1).shutdown()
deactivate e1
Note over filter: r1: RlqsFilterState(c2)<br/>r2: RlqsFilterState(c2)<br/>r3: RlqsFilterState(c2)
%% LDS 2
xds ->> filter: LDS2<br />RLQS{r3=c2, r4=c2}
filter ->> cache: r4: getOrCreate(c2)
Note over filter: r3: RlqsFilterState(c2)<br/>r4: RlqsFilterState(c2)
%% End
deactivate e2
```
**LDS 1**
In this example, the RLQS filter is configured for three routes: `r1`, `r2`, and `r3`. Each unique config generates a unique RLQS Filter State: `RlqsFilterState(c1)` for the config `c1`, and `RlqsFilterState(c2)` for the config `c2`. After processing the first LDS update, we've generated onCallHandlers for three routes:

- `r1`, referencing `RlqsFilterState(c1)`.
- `r2`, referencing `RlqsFilterState(c2)`.
- `r3`, also referencing `RlqsFilterState(c2)`.
**RDS 1**
RDS 1 updates the RLQS config for the route `r1` so it's identical to config `c2`. We retrieve `RlqsFilterState(c2)` from the RLQS Cache and generate a new onCallHandler for route `r1`. `RlqsFilterState(c1)` is no longer referenced by any onCallHandler, and can be destroyed with all associated resources.
**LDS 2**
The LDS 2 update removes `r1` and `r2`, and adds a new route `r4` with the config identical to `c2`. While the onCallHandlers for routes `r1` and `r2` are destroyed, `RlqsFilterState(c2)` is still used by two onCallHandlers, so it's preserved in the RLQS Cache.
With this proposal, the filter state is lost if a change is made to the filter config, including updates to inconsequential fields such as the deny response status. Additional logic can be introduced to handle updates to such fields while preserving the filter state.
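One possible refinement, sketched with hypothetical accessors in Java: derive the cache key only from the fields that actually define the filter state, so that the `getOrCreate` lookup reuses the existing state when only inconsequential fields change:

```java
// Hypothetical sketch: key the RLQS Cache by the state-defining fields only, so that
// an update to e.g. the deny response status maps to the same RlqsFilterState and
// preserves its buckets, usage counters, and RLQS stream.
RlqsCacheKey key = RlqsCacheKey.create(
    config.rlqsServer(),          // RLQS target and credentials (GrpcService)
    config.bucketMatchers(),      // matcher tree defining the buckets
    config.reportingInterval());  // usage report frequency
// The cache would then map keys (rather than full configs) to filter states.
RlqsFilterState filterState = rlqsCache.getOrCreateFilterState(key, config);
```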
There are several mutex-synchronized operations executed in the latency-sensitive `onCallHandler`:

- Inserting/reading a bucket from the bucket cache using `bucket_id`. Note that `bucket_id` is represented as a `Map<String, String>`, which may introduce complexities in efficient cache sharding for certain programming languages.
- Incrementing the `num_requests_allowed`/`num_requests_denied` bucket counters.
Each gRPC implementation needs to consider what synchronization primitives are available in their language to minimize the thread lock time.
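For example, in Java the bucket lookup can avoid long-held locks by flattening `bucket_id` into a canonical string key and relying on `ConcurrentHashMap`; the `RlqsBucketCache` class below is illustrative only, with `RlqsBucketId` and `RlqsBucketSettings` as hypothetical types:

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Consumer;

/** Illustrative only: bucket cache tuned to keep onCallHandler lock time minimal. */
final class RlqsBucketCache {
  private final ConcurrentMap<String, RlqsBucket> buckets = new ConcurrentHashMap<>();

  RlqsBucket getOrCreate(
      RlqsBucketId bucketId, RlqsBucketSettings settings, Consumer<RlqsBucket> onNewBucket) {
    String key = canonicalKey(bucketId.asMap());
    RlqsBucket bucket = buckets.get(key);  // fast path: no insertion lock for known buckets
    if (bucket != null) {
      return bucket;
    }
    RlqsBucket created = new RlqsBucket(bucketId, settings.defaultAssignment());
    bucket = buckets.putIfAbsent(key, created);
    if (bucket != null) {
      return bucket;  // another handler created the bucket first
    }
    onNewBucket.accept(created);  // e.g. schedule the immediate first report
    return created;
  }

  private static String canonicalKey(Map<String, String> bucketIdMap) {
    // bucket_id is a Map<String, String>; a TreeMap orders entries by key, so equal
    // maps always produce the same string and the cache key has cheap equals/hashCode.
    return new TreeMap<>(bucketIdMap).toString();
  }
}
```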
> [!NOTE]
> Not a reference implementation. Only for flow illustration purposes.
```java
final RlqsFilterState filterState = rlqsCache.getOrCreateFilterState(config);
return new ServerInterceptor() {
@Override
public <ReqT, RespT> Listener<ReqT> interceptCall(
ServerCall<ReqT, RespT> call,
Metadata headers, ServerCallHandler<ReqT, RespT> next) {
// TODO: handle filter_enabled and filter_enforced
// RlqsClient matches the request into a bucket,
// and returns the rate limiting result.
RlqsRateLimitResult result =
filterState.rateLimit(HttpMatchInput.create(headers, call));
// Allowed.
if (result.isAllowed()) {
return next.startCall(call, headers);
}
// Denied: fail the call with given Status.
call.close(
result.denyResponse().status(),
result.denyResponse().headersToAdd());
return new ServerCall.Listener<ReqT>(){};
}
};
```
The filter state's `rateLimit()` method matches the request into a bucket and returns the rate limiting result:

```java
public RlqsRateLimitResult rateLimit(HttpMatchInput input) {
// Perform request matching. The result is RlqsBucketSettings.
RlqsBucketSettings bucketSettings = bucketMatchers.match(input);
// BucketId may be dynamic (f.e. based on headers).
RlqsBucketId bucketId = bucketSettings.bucketIdForRequest(input);
RlqsBucket bucket = bucketCache.getOrCreate(bucketId, bucketSettings, this::onNewBucket);
return bucket.rateLimit();
}
private void onNewBucket(RlqsBucket newBucket) {
// The report for the first RPC is sent immediately.
scheduleImmediateReport(newBucket);
registerReportTimer(newBucket.getReportingInterval());
}
```
During initial development, this feature will be enabled via the `GRPC_EXPERIMENTAL_XDS_ENABLE_RLQS` environment variable. This environment variable protection will be removed once the feature has proven stable.
Rate Limiting Service (RLS) is another Global Rate Limiting solution supported by Envoy. While it's best suited for precise cases, where even a single request over the limit must be throttled, this approach performs a synchronous (blocking) per-HTTP-request rate limit check.
The problem with a compromised xDS Control Plane configuring a connection to a malicious RLQS server may be solved holistically by cryptographically signing xDS messages. This feature would solve multiple problems in the same class, but it's out of scope for RLQS.
The proposed solution with the change to the bootstrap file was designed to be compatible with such a future protocol extension.
- The initial implementation will be in Java.
- C-core, Python, Go are TBD.