| status | title | creation-date | last-updated | authors | collaborators |
|---|---|---|---|---|---|
| proposed | Queueing Concurrent Runs | 2023-02-24 | 2023-03-20 | | |
- Summary
- Motivation
- Proposal
- Design Details
- Design Evaluation
- Alternatives
- Implementation Plan
- References
This TEP proposes allowing users to control the number of PipelineRuns and TaskRuns (and maybe CustomRuns) that can run concurrently. This proposal also covers controlling the Tekton runtime resources (`pipelineRun`, `taskRun`, and `customRun`) in a cluster.
The focus of this TEP is different from TEP-0120: Canceling Concurrent PipelineRuns, which focuses on PipelineRuns that may have ordering dependencies, but we may choose to develop a solution that addresses both TEPs.
It is very common to build and execute many different pipelines in any CI/CD deployment. When such workloads run in parallel, the cluster can become overloaded and, in the worst case, unresponsive.
The motivation of this proposal is to support controlling concurrent runs of a single `Pipeline` or `Task`, as well as groups of unrelated `PipelineRuns` or `TaskRuns`, in a given cluster.
In addition, TEP-0092: TaskRun Timeouts provided motivation for this proposal. TEP-0092 proposed capping the amount of time a TaskRun could be queued for. However, the use cases specified in TEP-0092 (running Tasks in a resource constrained environment, or "fire and forget") would also be met by queueing PipelineRuns or TaskRuns for execution. Queueing may also be easier to understand and more flexible.
- Can "fire and forget" runs by creating many runs of one or more
Pipelines
but preventing all of them from executing at once. - Can "fire and forget" runs by creating many runs of one or more
Tasks
but preventing all of them from executing at once. - Can control the number of matrixed Runs that can execute at once
- Priority and preemption of queued Runs, including prioritizing based on compute resources
- Load testing or performance testing
Only allow executing a single instance of a `TaskRun` or a `PipelineRun` at any given time, for example:
- An integration test communicates with a stateful external service (like a database), and a developer wants to ensure that integration testing `TaskRuns` within their CI `PipelineRun` don't run concurrently with other integration testing `TaskRuns` of the same CI `Pipeline` (based on this comment).
Some of these use cases require being able to limit concurrent `PipelineRuns` for a given `Pipeline`, or concurrent `TaskRuns` for a given `Task`.
Others require being able to limit the total number of `PipelineRuns` and `TaskRuns`, regardless of whether they are associated with the same `Pipeline` or `Task`.
- An organization has multiple teams working on a mobile application with a limited number of test devices. They want to limit the number of concurrent CI runs per team, to prevent one team from using all the available devices and crowding out CI runs from other teams.
- A cluster operator wants to cap the number of matrixed TaskRuns (alpha) that can run at a given time.
  - Currently, we have the feature flag "default-maximum-matrix-fan-out", which restricts the total number of TaskRuns that can be created from one Pipeline Task. However, we would like to support capping the number of matrixed TaskRuns that can run concurrently, instead of statically capping the number of matrixed TaskRuns that can be created at all.
- A `PipelineRun` or `TaskRun` communicates with a rate-limited external service, as described in this issue. Another example of such a requirement is an API call to package registries to retrieve package metadata for SBOMs. The package registries block the issuer if the number of requests exceeds their allowed quota. These requests could be generated from a single `Pipeline`/`Task` or unrelated `Pipelines`/`Tasks`.
- Tekton previously used GKE clusters allocated by Boskos for our Pipelines integration tests, and Boskos caps the number of clusters that can be used at a time. It would have been useful to queue builds so that they could not launch until a cluster was available. (We now use KinD for our Pipelines integration tests.)
- A Pipeline performs multiple parallelizable tasks with different concurrency requirements, as described in this comment.
  - Configuring different concurrency limits for multiple `pipelineTasks` of the same `Pipeline` can be part of future work for this proposal.
- A large number of resource-intensive `pipelineTasks` are running in parallel, causing a huge load on a node. This load is causing other unrelated `TaskRuns` (not part of the same `Pipeline`) to time out.
- A large number of `PipelineRuns` and `TaskRuns` are running concurrently, resulting in an overloaded cluster. These `PipelineRuns` could be thousands of runs of the same `Pipeline` or a combination of N different `Pipelines`. These `Pipelines` could be related or unrelated; for example, they may access the same remote resources. The cluster operator would like to configure a queue of `PipelineRuns`/`TaskRuns` for fire-and-forget operations such as these.
Use an object count quota to restrict the number of Runs that can exist in a namespace. This doesn't account for Runs' state (e.g. completed and pending PipelineRuns count towards this total) and doesn't support queueing or more advanced concurrency strategies.
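As a rough illustration of this workaround, the sketch below uses a Kubernetes object count quota to cap how many `PipelineRun` and `TaskRun` objects can exist in a namespace. The namespace name and limit values are placeholders, not part of this proposal:

```yaml
# Illustrative only: caps how many PipelineRun/TaskRun objects can exist
# in the "ci" namespace, counting completed and pending runs alike.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: tekton-run-count   # hypothetical name
  namespace: ci            # hypothetical namespace
spec:
  hard:
    count/pipelineruns.tekton.dev: "50"
    count/taskruns.tekton.dev: "200"
```

Once a limit is reached, new Runs are rejected at creation time rather than queued for later execution.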
- Must be able to cap the amount of time a Run can be queued for
- Must be able to clear the queue manually without having to cancel Runs individually
Feature Requests
- Concurrency limiter controller
- Tekton Queue. Concurrency
- Ability to throttle concurrent TaskRuns
- Controlling max parallel jobs per Pipeline
- Provide a Pipeline concurrency limit
Design Proposals
Similar features in other CI/CD systems
- GitHub Actions concurrency controls
- GitLab
  - Global concurrency
  - Request concurrency
- Pipelines as Code concurrency limit per repository