Request limiting in Gitaly

In the GitLab ecosystem, Gitaly is the service at the bottom of the stack for Git data access. When a surge of requests retrieves or changes Git data, the resulting I/O happens in Gitaly. Because all Git access goes through Gitaly, such a surge can exhaust system resources and overwhelm it.

If traffic surges beyond what Gitaly can handle, Gitaly should be able to push back on the calling client. Gitaly shouldn't subserviently agree to process more than it can handle.

Gitaly provides several knobs, each of which limits a different kind of traffic pattern.

Concurrency queue

Use the [[concurrency]] configuration to limit, for each RPC, the number of concurrent in-flight calls per repository on each Gitaly node:

[[concurrency]]
rpc = "/gitaly.SmartHTTPService/PostUploadPackWithSidechannel"
max_per_repo = 1

For example:

One clone request comes in for repository "A" (a largish repository).
While this RPC is executing, another request comes in for repository "A". Because max_per_repo is 1 in this case, the second request blocks until the first request is finished.

Requests that are waiting their turn build up in an in-memory queue in Gitaly. Because an unbounded queue could consume memory without limit, two other values in the [[concurrency]] configuration keep the queue bounded:

max_queue_wait is the maximum amount of time a request can wait in the concurrency queue. When a request waits longer than this time, it returns an error to the client.
max_queue_size is the maximum size the concurrency queue can grow to for a given RPC. If a concurrency queue is at its maximum size, subsequent requests return with an error.

For example:

[[concurrency]]
rpc = "/gitaly.SmartHTTPService/PostUploadPackWithSidechannel"
max_per_repo = 1
max_queue_wait = "1m"
max_queue_size = 5
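
To make the interaction of the three settings concrete, here is a minimal Python sketch of a bounded concurrency queue for a single RPC. It is not Gitaly's implementation (Gitaly is written in Go). The names ConcurrencyLimiter and LimitError are hypothetical, max_queue_wait is assumed to be given in seconds, and the queue size is assumed to count waiting requests across all repositories for that RPC.

```python
import threading


class LimitError(Exception):
    """Hypothetical error type for this sketch; raised when a request is rejected."""


class ConcurrencyLimiter:
    """Illustrative limiter for a single RPC (not Gitaly's actual code)."""

    def __init__(self, max_per_repo, max_queue_size, max_queue_wait):
        self.max_per_repo = max_per_repo
        self.max_queue_size = max_queue_size
        self.max_queue_wait = max_queue_wait  # seconds (assumption)
        self._lock = threading.Lock()
        self._slots = {}   # repository -> semaphore with max_per_repo permits
        self._queued = 0   # requests currently waiting for a slot

    def _slot(self, repo):
        # Create the per-repository semaphore lazily, under the lock.
        with self._lock:
            if repo not in self._slots:
                self._slots[repo] = threading.Semaphore(self.max_per_repo)
            return self._slots[repo]

    def acquire(self, repo):
        slot = self._slot(repo)
        # Fast path: a slot is free, so the request runs immediately.
        if slot.acquire(blocking=False):
            return
        # Slow path: the request must queue, but only if the queue has room.
        with self._lock:
            if self._queued >= self.max_queue_size:
                raise LimitError("concurrency queue is full")
            self._queued += 1
        try:
            # Wait for a slot, but no longer than max_queue_wait.
            if not slot.acquire(timeout=self.max_queue_wait):
                raise LimitError("waited too long in the concurrency queue")
        finally:
            with self._lock:
                self._queued -= 1

    def release(self, repo):
        # Called when the RPC finishes; frees a slot for the next waiter.
        self._slot(repo).release()
```

With max_per_repo set to 1, a second acquire("A") waits until the first caller releases, while acquire("B") proceeds immediately because slots are tracked per repository.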

Rate limiting

To allow Gitaly to put back pressure on its clients, administrators can set a rate limit per repository for each RPC:

[[rate_limiting]]
rpc = "/gitaly.RepositoryService/RepackFull"
interval = "1m"
burst = 1

The rate limiter is implemented as a token bucket. A token bucket has a capacity of burst and is refilled every interval. Each request that comes into Gitaly takes one token from the bucket. When the bucket is empty, Gitaly accepts no more requests for that RPC on that repository until the bucket is refilled. There is one token bucket for each RPC for each repository.

In the above configuration, the token bucket has a capacity of 1 and gets refilled every minute. This means that Gitaly only accepts 1 RepackFull request per repository each minute.

Requests that come in after the token bucket is empty (and before it is replenished) are rejected with an error.
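
The token bucket behavior described above can be sketched in a few lines of Python. This is an illustration, not Gitaly's code. The class names TokenBucket and RateLimiter are hypothetical, interval is assumed to be given in seconds, and the sketch assumes that one token is added per interval up to a maximum of burst tokens.

```python
import time


class TokenBucket:
    """Illustrative token bucket; refill policy is an assumption (see lead-in)."""

    def __init__(self, burst, interval, clock=time.monotonic):
        self.burst = burst          # bucket capacity
        self.interval = interval    # seconds needed to add one token
        self._clock = clock         # injectable clock, useful for testing
        self._tokens = float(burst)
        self._last = clock()

    def allow(self):
        # Add the tokens earned since the last call, capped at the capacity.
        now = self._clock()
        earned = (now - self._last) / self.interval
        self._tokens = min(self.burst, self._tokens + earned)
        self._last = now
        # Take one token if available; otherwise reject the request.
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False


class RateLimiter:
    """Keeps one bucket for each (RPC, repository) pair."""

    def __init__(self, burst, interval, clock=time.monotonic):
        self._burst = burst
        self._interval = interval
        self._clock = clock
        self._buckets = {}

    def allow(self, rpc, repo):
        key = (rpc, repo)
        if key not in self._buckets:
            self._buckets[key] = TokenBucket(self._burst, self._interval, self._clock)
        return self._buckets[key].allow()
```

With burst set to 1 and interval set to 60, the first RepackFull request for a repository is accepted, and further requests for the same repository are rejected until a minute has passed. Requests for other repositories use their own buckets and are unaffected.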

Errors

When a request is rejected by concurrency limiting or rate limiting, Gitaly responds with a structured gRPC gitalypb.LimitError error that contains:

A Message field that describes the error.
A BackoffDuration field that tells the client how long to wait before it is safe to retry. A value of 0 means the client should never retry.

Gitaly clients (gitlab-shell, workhorse, Rails) must parse this error and return a sensible error message to the end user. The end user can be, for example:

Something trying to clone using HTTP or SSH.
The GitLab application.
Something calling the API.
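
As a rough illustration of how a client might honor the error contract described in this section, the following Python sketch retries a call after the suggested backoff and gives up when the backoff is 0. Everything here is hypothetical: the real error is a protobuf message carried in the gRPC status, whereas this sketch models it as a plain exception with message and backoff_duration attributes (in seconds), and the helper call_with_backoff and its max_attempts parameter are invented for the example.

```python
import time


class LimitError(Exception):
    """Hypothetical stand-in for gitalypb.LimitError in this sketch."""

    def __init__(self, message, backoff_duration):
        super().__init__(message)
        self.message = message
        self.backoff_duration = backoff_duration  # seconds; 0 means never retry


def call_with_backoff(call, max_attempts=3, sleep=time.sleep):
    """Run call(), retrying after the server-suggested backoff when limited."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except LimitError as err:
            # A backoff of 0 means the request should never be retried.
            # Also stop once the retry budget is used up.
            if err.backoff_duration == 0 or attempt == max_attempts:
                raise
            sleep(err.backoff_duration)
```

A caller would catch the LimitError that eventually propagates and turn its message into something meaningful for the end user, such as a "try again later" response.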

Metrics

Metrics are available that provide visibility into how these limits are being applied. See the GitLab Documentation for details.
