Add MaxTotalQueryLength and fill it from MaxQueryLength if unset #3058

Merged
merged 9 commits into from
Sep 29, 2022
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -64,6 +64,7 @@
* `-store-gateway.sharding-ring.etcd.tls-min-version`
* [ENHANCEMENT] Store-gateway: Add `-blocks-storage.bucket-store.max-concurrent-reject-over-limit` option to allow requests that exceed the max number of inflight object storage requests to be rejected. #2999
* [ENHANCEMENT] Ingester: improved the performance of label value cardinality endpoint. #3048
* [ENHANCEMENT] Query-frontend: allow setting a separate limit on the total (before splitting/sharding) query length of range queries with the new experimental `-query-frontend.max-total-query-length` flag, which defaults to `-store.max-query-length` if unset or set to 0. #3058
* [BUGFIX] Querier: Fix 400 response while handling streaming remote read. #2963
* [BUGFIX] Fix a bug causing query-frontend, query-scheduler, and querier not failing if one of their internal components fail. #2978
* [BUGFIX] Querier: re-balance the querier worker connections when a query-frontend or query-scheduler is terminated. #3005
13 changes: 12 additions & 1 deletion cmd/mimir/config-descriptor.json
@@ -3076,7 +3076,7 @@
"kind": "field",
"name": "max_query_length",
"required": false,
"desc": "Limit the query time range (end - start time). This limit is enforced in the query-frontend (on the received query), in the querier (on the query possibly split by the query-frontend) and ruler. 0 to disable.",
"desc": "Limit the query time range (end - start time). This limit is enforced in the querier (on the query possibly split by the query-frontend) and ruler. 0 to disable.",
"fieldValue": null,
"fieldDefaultValue": 0,
"fieldFlag": "store.max-query-length",
@@ -3154,6 +3154,17 @@
"fieldType": "duration",
"fieldCategory": "experimental"
},
{
"kind": "field",
"name": "max_total_query_length",
"required": false,
"desc": "Limit the total query time range (end - start time). This limit is enforced in the query-frontend on the received query. Defaults to the value of -store.max-query-length if set to 0.",
"fieldValue": null,
"fieldDefaultValue": 0,
"fieldFlag": "query-frontend.max-total-query-length",
"fieldType": "duration",
"fieldCategory": "experimental"
},
{
"kind": "field",
"name": "cardinality_analysis_enabled",
4 changes: 3 additions & 1 deletion cmd/mimir/help-all.txt.tmpl
@@ -1357,6 +1357,8 @@ Usage of ./cmd/mimir/mimir:
Maximum number of queriers that can handle requests for a single tenant. If set to 0 or value higher than number of available queriers, *all* queriers will handle requests for the tenant. Each frontend (or query-scheduler, if used) will select the same set of queriers for the same tenant (given that all queriers are connected to all frontends / query-schedulers). This option only works with queriers connecting to the query-frontend / query-scheduler, not when using downstream URL.
-query-frontend.max-retries-per-request int
Maximum number of retries for a single request; beyond this, the downstream error is returned. (default 5)
-query-frontend.max-total-query-length duration
[experimental] Limit the total query time range (end - start time). This limit is enforced in the query-frontend on the received query. Defaults to the value of -store.max-query-length if set to 0.
-query-frontend.parallelize-shardable-queries
True to enable query sharding.
-query-frontend.querier-forget-delay duration
@@ -1954,7 +1956,7 @@ Usage of ./cmd/mimir/mimir:
-store.max-labels-query-length duration
Limit the time range (end - start time) of series, label names and values queries. This limit is enforced in the querier. If the requested time range is outside the allowed range, the request will not fail but will be manipulated to only query data within the allowed time range. 0 to disable.
-store.max-query-length duration
Limit the query time range (end - start time). This limit is enforced in the query-frontend (on the received query), in the querier (on the query possibly split by the query-frontend) and ruler. 0 to disable.
Limit the query time range (end - start time). This limit is enforced in the querier (on the query possibly split by the query-frontend) and ruler. 0 to disable.
-target comma-separated-list-of-strings
Comma-separated list of components to include in the instantiated process. The default value 'all' includes all components that are required to form a functional Grafana Mimir instance in single-binary mode. Use the '-modules' command line flag to get a list of available components, and to see which components are included with 'all'. (default all)
-tenant-federation.enabled
2 changes: 1 addition & 1 deletion cmd/mimir/help.txt.tmpl
@@ -574,7 +574,7 @@ Usage of ./cmd/mimir/mimir:
-store.max-labels-query-length duration
Limit the time range (end - start time) of series, label names and values queries. This limit is enforced in the querier. If the requested time range is outside the allowed range, the request will not fail but will be manipulated to only query data within the allowed time range. 0 to disable.
-store.max-query-length duration
Limit the query time range (end - start time). This limit is enforced in the query-frontend (on the received query), in the querier (on the query possibly split by the query-frontend) and ruler. 0 to disable.
Limit the query time range (end - start time). This limit is enforced in the querier (on the query possibly split by the query-frontend) and ruler. 0 to disable.
-target comma-separated-list-of-strings
Comma-separated list of components to include in the instantiated process. The default value 'all' includes all components that are required to form a functional Grafana Mimir instance in single-binary mode. Use the '-modules' command line flag to get a list of available components, and to see which components are included with 'all'. (default all)
-tenant-federation.enabled
Original file line number Diff line number Diff line change
@@ -84,6 +84,7 @@ The following features are currently experimental:
- Snapshotting of in-memory TSDB data on disk when shutting down (`-blocks-storage.tsdb.memory-snapshot-on-shutdown`)
- Out-of-order samples ingestion (`-ingester.out-of-order-allowance`)
- Query-frontend
- `-query-frontend.max-total-query-length`
- `-query-frontend.querier-forget-delay`
- Instant query splitting (`-query-frontend.split-instant-queries-by-interval`)
- Query-scheduler
Original file line number Diff line number Diff line change
@@ -2478,8 +2478,8 @@ The `limits` block configures default and per-tenant limits imposed by component
[max_query_lookback: <duration> | default = 0s]

# Limit the query time range (end - start time). This limit is enforced in the
# query-frontend (on the received query), in the querier (on the query possibly
# split by the query-frontend) and ruler. 0 to disable.
# querier (on the query possibly split by the query-frontend) and ruler. 0 to
# disable.
# CLI flag: -store.max-query-length
[max_query_length: <duration> | default = 0s]

@@ -2530,6 +2530,12 @@ The `limits` block configures default and per-tenant limits imposed by component
# CLI flag: -query-frontend.split-instant-queries-by-interval
[split_instant_queries_by_interval: <duration> | default = 0s]

# (experimental) Limit the total query time range (end - start time). This limit
# is enforced in the query-frontend on the received query. Defaults to the value
# of -store.max-query-length if set to 0.
# CLI flag: -query-frontend.max-total-query-length
[max_total_query_length: <duration> | default = 0s]

# Enables endpoints used for cardinality analysis.
# CLI flag: -querier.cardinality-analysis-enabled
[cardinality_analysis_enabled: <boolean> | default = false]
14 changes: 13 additions & 1 deletion docs/sources/operators-guide/mimir-runbooks/_index.md
@@ -1376,7 +1376,7 @@ How to **fix** it:

### err-mimir-max-query-length

This error occurs when the time range of a query exceeds the configured maximum length.
This error occurs when the time range of a partial query (that is, a query after possible splitting and sharding by the query-frontend) exceeds the configured maximum length. For a limit on the total query length, see [err-mimir-max-total-query-length](#err-mimir-max-total-query-length).

Both PromQL instant and range queries can fetch metrics data over a period of time.
A [range query](https://prometheus.io/docs/prometheus/latest/querying/api/#range-queries) requires a `start` and `end` timestamp, so the difference of `end` minus `start` is the time range length of the query.
@@ -1389,6 +1389,18 @@ Mimir has a limit on the query length.
This limit is applied to partial queries, after they've been split (according to time) by the query-frontend. This limit protects the system’s stability from potential abuse or mistakes.
To configure the limit on a per-tenant basis, use the `-store.max-query-length` option (or `max_query_length` in the runtime configuration).

### err-mimir-max-total-query-length

This error occurs when the total time range of a range query, as received by the query-frontend, exceeds the configured maximum total length. For a limit on the partial query length (after query splitting by interval and/or sharding), see [err-mimir-max-query-length](#err-mimir-max-query-length).

PromQL range queries can fetch metrics data over a period of time.
A [range query](https://prometheus.io/docs/prometheus/latest/querying/api/#range-queries) requires a `start` and `end` timestamp, so the difference of `end` minus `start` is the time range length of the query.

Mimir has a limit on the total query length.
This limit is applied to range queries before they are split (according to time) or sharded by the query-frontend. This limit protects the system’s stability from potential abuse or mistakes.
To configure the limit on a per-tenant basis, use the `-query-frontend.max-total-query-length` option (or `max_total_query_length` in the runtime configuration).
If this limit is set to 0, it takes its value from `-store.max-query-length`.
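
The fallback and the length check described above can be sketched as follows. This is a minimal, self-contained illustration, not the code from this change: the helper names `effectiveTotalLimit` and `exceedsLimit` are hypothetical, and the sketch assumes that an effective limit of 0 disables the check.

```go
package main

import (
	"fmt"
	"time"
)

// effectiveTotalLimit is a hypothetical helper. It returns the total query
// length limit, falling back to the partial query length limit when the
// total limit is unset (0).
func effectiveTotalLimit(maxTotal, maxPartial time.Duration) time.Duration {
	if maxTotal == 0 {
		return maxPartial
	}
	return maxTotal
}

// exceedsLimit is a hypothetical helper. It reports whether a query of the
// given length (end - start) is over the limit. A limit of 0 is assumed to
// disable the check.
func exceedsLimit(queryLen, limit time.Duration) bool {
	return limit > 0 && queryLen > limit
}

func main() {
	thirtyDays := 30 * 24 * time.Hour
	queryLen := 2 * thirtyDays // a range query spanning 60 days

	// Only -store.max-query-length is set: the total limit inherits 30 days,
	// so the 60-day query is rejected.
	fmt.Println(exceedsLimit(queryLen, effectiveTotalLimit(0, thirtyDays))) // true

	// -query-frontend.max-total-query-length is set to 240 days: the same
	// query is accepted by the query-frontend.
	fmt.Println(exceedsLimit(queryLen, effectiveTotalLimit(8*thirtyDays, thirtyDays))) // false
}
```

The second case mirrors the test case added in this change, where a total limit higher than the partial limit lets a longer query through the query-frontend.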

### err-mimir-tenant-max-request-rate

This error occurs when the rate of write requests per second is exceeded for this tenant.
6 changes: 3 additions & 3 deletions pkg/frontend/querymiddleware/limits.go
@@ -32,7 +32,7 @@ type Limits interface {
MaxQueryLookback(userID string) time.Duration

// MaxTotalQueryLength returns the limit of the total length (in time) of a query, before it is split or sharded.
MaxQueryLength(userID string) time.Duration
MaxTotalQueryLength(userID string) time.Duration

// MaxQueryParallelism returns the limit to the number of split queries the
// frontend will process in parallel.
@@ -112,10 +112,10 @@ func (l limitsMiddleware) Do(ctx context.Context, r Request) (Response, error) {
}

// Enforce the max query length.
if maxQueryLength := validation.SmallestPositiveNonZeroDurationPerTenant(tenantIDs, l.MaxQueryLength); maxQueryLength > 0 {
if maxQueryLength := validation.SmallestPositiveNonZeroDurationPerTenant(tenantIDs, l.MaxTotalQueryLength); maxQueryLength > 0 {
queryLen := timestamp.Time(r.GetEnd()).Sub(timestamp.Time(r.GetStart()))
if queryLen > maxQueryLength {
return nil, apierror.New(apierror.TypeBadData, validation.NewMaxQueryLengthError(queryLen, maxQueryLength).Error())
return nil, apierror.New(apierror.TypeBadData, validation.NewMaxTotalQueryLengthError(queryLen, maxQueryLength).Error())
}
}
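
When a request spans several tenants, the check above uses the smallest positive limit across them. The following is a minimal sketch of that selection under stated assumptions: `smallestPositiveLimit` and `tenantLimits` are hypothetical names for illustration, not the `validation` package's implementation, and a result of 0 is assumed to mean that no tenant has a limit configured.

```go
package main

import (
	"fmt"
	"time"
)

// tenantLimits maps a tenant ID to its configured limit. Hypothetical type
// used only for this sketch.
type tenantLimits map[string]time.Duration

// smallestPositiveLimit returns the smallest limit greater than zero among
// the given tenants, or 0 if none of them has a positive limit.
func smallestPositiveLimit(tenantIDs []string, limits tenantLimits) time.Duration {
	var smallest time.Duration
	for _, id := range tenantIDs {
		if v := limits[id]; v > 0 && (smallest == 0 || v < smallest) {
			smallest = v
		}
	}
	return smallest
}

func main() {
	limits := tenantLimits{
		"team-a": 0, // no limit configured
		"team-b": 30 * 24 * time.Hour,
		"team-c": 7 * 24 * time.Hour,
	}

	// The strictest configured limit applies to the whole request.
	fmt.Println(smallestPositiveLimit([]string{"team-a", "team-b", "team-c"}, limits)) // 168h0m0s

	// No positive limit: the caller skips the check.
	fmt.Println(smallestPositiveLimit([]string{"team-a"}, limits)) // 0s
}
```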

29 changes: 22 additions & 7 deletions pkg/frontend/querymiddleware/limits_test.go
@@ -119,10 +119,11 @@ func TestLimitsMiddleware_MaxQueryLength(t *testing.T) {
now := time.Now()

tests := map[string]struct {
maxQueryLength time.Duration
reqStartTime time.Time
reqEndTime time.Time
expectedErr string
maxQueryLength time.Duration
maxTotalQueryLength time.Duration
reqStartTime time.Time
reqEndTime time.Time
expectedErr string
}{
"should skip validation if max length is disabled": {
maxQueryLength: 0,
@@ -148,13 +149,19 @@ func TestLimitsMiddleware_MaxQueryLength(t *testing.T) {
maxQueryLength: thirtyDays,
reqStartTime: now.Add(-thirtyDays).Add(-100 * time.Hour),
reqEndTime: now,
expectedErr: "the query time range exceeds the limit",
expectedErr: "the total query time range exceeds the limit",
},
"should fail on a query on large time range over the limit, ending in the past": {
maxQueryLength: thirtyDays,
reqStartTime: now.Add(-4 * thirtyDays),
reqEndTime: now.Add(-2 * thirtyDays),
expectedErr: "the query time range exceeds the limit",
expectedErr: "the total query time range exceeds the limit",
},
"should succeed if total query length is higher than query length limit": {
maxQueryLength: thirtyDays,
maxTotalQueryLength: 8 * thirtyDays,
reqStartTime: now.Add(-4 * thirtyDays),
reqEndTime: now.Add(-2 * thirtyDays),
},
}

@@ -165,7 +172,7 @@ func TestLimitsMiddleware_MaxQueryLength(t *testing.T) {
End: util.TimeToMillis(testData.reqEndTime),
}

limits := mockLimits{maxQueryLength: testData.maxQueryLength}
limits := mockLimits{maxQueryLength: testData.maxQueryLength, maxTotalQueryLength: testData.maxTotalQueryLength}
middleware := newLimitsMiddleware(limits, log.NewNopLogger())

innerRes := newEmptyPrometheusResponse()
@@ -198,6 +205,7 @@ func TestLimitsMiddleware_MaxQueryLength(t *testing.T) {
type mockLimits struct {
maxQueryLookback time.Duration
maxQueryLength time.Duration
maxTotalQueryLength time.Duration
maxCacheFreshness time.Duration
maxQueryParallelism int
maxShardedQueries int
@@ -214,6 +222,13 @@ func (m mockLimits) MaxQueryLength(string) time.Duration {
return m.maxQueryLength
}

func (m mockLimits) MaxTotalQueryLength(string) time.Duration {
if m.maxTotalQueryLength == time.Duration(0) {
return m.maxQueryLength
}
return m.maxTotalQueryLength
}

func (m mockLimits) MaxQueryParallelism(string) int {
if m.maxQueryParallelism == 0 {
return 14 // Flag default.
1 change: 1 addition & 0 deletions pkg/util/globalerror/errors.go
@@ -50,6 +50,7 @@ const (
MetricMetadataUnitTooLong ID = "unit-too-long"

MaxQueryLength ID = "max-query-length"
MaxTotalQueryLength ID = "max-total-query-length"
RequestRateLimited ID = "tenant-max-request-rate"
IngestionRateLimited ID = "tenant-max-ingestion-rate"
TooManyHAClusters ID = "tenant-too-many-ha-clusters"
6 changes: 6 additions & 0 deletions pkg/util/validation/errors.go
@@ -263,6 +263,12 @@ func NewMaxQueryLengthError(actualQueryLen, maxQueryLength time.Duration) LimitE
maxQueryLengthFlag))
}

func NewMaxTotalQueryLengthError(actualQueryLen, maxTotalQueryLength time.Duration) LimitError {
return LimitError(globalerror.MaxTotalQueryLength.MessageWithPerTenantLimitConfig(
fmt.Sprintf("the total query time range exceeds the limit (query length: %s, limit: %s)", actualQueryLen, maxTotalQueryLength),
maxTotalQueryLengthFlag))
}
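
To show the shape of the resulting message without pulling in the `globalerror` package, here is a minimal standalone sketch. The function `maxTotalQueryLengthMessage` is hypothetical. The message layout (error ID in parentheses, followed by the per-tenant flag hint) is copied from the expected string in the accompanying test, not from the `MessageWithPerTenantLimitConfig` implementation.

```go
package main

import (
	"fmt"
	"time"
)

// maxTotalQueryLengthMessage is a hypothetical helper that builds the
// user-facing text for a query over the total length limit. The error ID and
// flag name are the ones introduced by this change.
func maxTotalQueryLengthMessage(actual, limit time.Duration) string {
	const (
		errorID = "max-total-query-length"
		flag    = "query-frontend.max-total-query-length"
	)
	return fmt.Sprintf(
		"the total query time range exceeds the limit (query length: %s, limit: %s) (err-mimir-%s). To adjust the related per-tenant limit, configure -%s, or contact your service administrator.",
		actual, limit, errorID, flag,
	)
}

func main() {
	fmt.Println(maxTotalQueryLengthMessage(time.Hour, time.Minute))
}
```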

func NewRequestRateLimitedError(limit float64, burst int) LimitError {
return LimitError(globalerror.RequestRateLimited.MessageWithPerTenantLimitConfig(
fmt.Sprintf("the request has been rejected because the tenant exceeded the request rate limit, set to %v requests/s across all distributors with a maximum allowed burst of %d", limit, burst),
5 changes: 5 additions & 0 deletions pkg/util/validation/errors_test.go
@@ -31,6 +31,11 @@ func TestNewMaxQueryLengthError(t *testing.T) {
assert.Equal(t, "the query time range exceeds the limit (query length: 1h0m0s, limit: 1m0s) (err-mimir-max-query-length). To adjust the related per-tenant limit, configure -store.max-query-length, or contact your service administrator.", err.Error())
}

func TestNewMaxTotalQueryLengthError(t *testing.T) {
err := NewMaxTotalQueryLengthError(time.Hour, time.Minute)
assert.Equal(t, "the total query time range exceeds the limit (query length: 1h0m0s, limit: 1m0s) (err-mimir-max-total-query-length). To adjust the related per-tenant limit, configure -query-frontend.max-total-query-length, or contact your service administrator.", err.Error())
}

func TestNewRequestRateLimitedError(t *testing.T) {
err := NewRequestRateLimitedError(10, 5)
assert.Equal(t, "the request has been rejected because the tenant exceeded the request rate limit, set to 10 requests/s across all distributors with a maximum allowed burst of 5 (err-mimir-tenant-max-request-rate). To adjust the related per-tenant limits, configure -distributor.request-rate-limit and -distributor.request-burst-size, or contact your service administrator.", err.Error())