[TEP-0120] Add proposal for concurrency controls #818

lbernick · 2022-09-09T20:44:23Z

This commit adds design details for canceling concurrent PipelineRuns.

/kind tep

lbernick · 2022-09-09T20:53:23Z

teps/0120-canceling-concurrent-pipelineruns.md

+The appeal of a solution involving a separate CRD is that any higher-level controller (e.g. Triggers, Workflows, Pipelines as Code) could get concurrency controls
+for "free" by creating a ConcurrencyControl, and handing the logic off to a separate controller.
+However, in practice, it's difficult to do this in a way that achieves good separation of concerns between reconcilers. Because the PipelineRun controller
+(or ConcurrencyControl controller) is not responsible for creating PipelineRuns, it has to rely on other components starting PipelineRuns as pending, and matching
+these PipelineRuns to the ConcurrencyControl via some strategy such as labels, ownerReferences, or modifying PipelineRuns themselves.
+This controller's logic is tied to the logic of these other components, requiring updates to be coordinated and essentially creating an
+unwritten contract between reconcilers. This solution adds significant complexity compared to the proposed solution while still requiring some concurrency code to be
+added to each higher-level controller that wants to support concurrency controls.


I really tried to get this solution working but I struggled to do so in a way that wasn't super complex-- this is my best attempt at explaining why.

lbernick · 2022-09-09T20:55:13Z

@dibyom @vdemeester @williamlfish @RafaeLeal @jerop PTAL

lbernick · 2022-09-09T21:01:56Z

Proof of concept: tektoncd/triggers#1446

vdemeester

I think there is a "slight" difference of approach in what I had and yours @lbernick.
My initial thought on concurrency was to provide the concurrency control primitives in tektoncd/pipeline so that other tools (be it upstream or elsewhere) could use them (hence having it on PipelineRun, or at least in tektoncd/pipeline).

What is proposed here, however, is an "end-user" proposal, for Triggers. The two approaches are not mutually exclusive, and the first might use the second. I do think the proposal probably makes sense for Triggers. But I do think the "primitives for concurrency control" shouldn't be in Triggers (the Triggers "description" is "… that allows you to create Kubernetes resources based on information it extracts from event payloads.").

As proposed here, any tool that wants to implement concurrency control and doesn't rely on Triggers will need to "copy" what's proposed (or come up with its own approach), which is already the case today.

vdemeester · 2022-09-12T08:04:14Z

teps/0120-canceling-concurrent-pipelineruns.md

+### Reconciler logic
+
+When a Trigger with a concurrency spec creates a new PipelineRun, it will substitute the parameters in the concurrency key and apply the concurrency key as a label
+with key "triggers.tekton.dev/concurrency". It will use an informer to find all PipelineRuns with the same concurrency key from the same Trigger, using label selectors.
+The reconciler will patch any matching PipelineRuns as canceled before creating the new PipelineRun, but will not wait for cancelation to complete.


From my point of view, the only "job" of triggers should be to create that label and most likely create the PipelineRun as pending, and have something else handle that label (aka not the triggers controller).
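
To make the quoted reconciler logic concrete, here is a minimal sketch in Python that models PipelineRuns as plain dicts and the cluster as an in-memory list. It is not the Triggers implementation. The label key `triggers.tekton.dev/concurrency` comes from the TEP text; the `triggers.tekton.dev/trigger` label, the `$(params.NAME)` placeholder syntax, and the function names are assumptions made for illustration.

```python
import re

CONCURRENCY_LABEL = "triggers.tekton.dev/concurrency"  # key named in the TEP text
TRIGGER_LABEL = "triggers.tekton.dev/trigger"  # assumed: label identifying the creating Trigger
CANCELLED = "Cancelled"  # PipelineRun spec.status value that requests cancelation


def substitute_params(key_template, params):
    """Replace each $(params.NAME) placeholder with its value (assumed syntax)."""
    return re.sub(
        r"\$\(params\.([\w.-]+)\)",
        lambda m: str(params[m.group(1)]),
        key_template,
    )


def create_with_concurrency(cluster, trigger_name, key_template, params, new_run):
    """Cancel matching runs, then create the new run with the concurrency label."""
    key = substitute_params(key_template, params)
    selector = {TRIGGER_LABEL: trigger_name, CONCURRENCY_LABEL: key}

    # Stand-in for an informer lookup with a label selector.
    for run in cluster:
        labels = run["metadata"].get("labels", {})
        if all(labels.get(k) == v for k, v in selector.items()):
            # Patch as canceled; do not wait for cancelation to complete.
            run["spec"]["status"] = CANCELLED

    new_run["metadata"].setdefault("labels", {}).update(selector)
    cluster.append(new_run)
    return new_run
```

In the alternative described in the comment above, Triggers would only apply the label (and perhaps create the run as pending), and the cancelation loop would live in a different controller.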

vdemeester · 2022-09-12T08:08:47Z

teps/0120-canceling-concurrent-pipelineruns.md

+for "free" by creating a ConcurrencyControl, and handing the logic off to a separate controller.
+However, in practice, it's difficult to do this in a way that achieves good separation of concerns between reconcilers. Because the PipelineRun controller
+(or ConcurrencyControl controller) is not responsible for creating PipelineRuns, it has to rely on other components starting PipelineRuns as pending, and matching
+these PipelineRuns to the ConcurrencyControl via some strategy such as labels, ownerReferences, or modifying PipelineRuns themselves.


Because the PipelineRun controller
(or ConcurrencyControl controller) is not responsible for creating PipelineRuns, it has to rely on other components starting PipelineRuns as pending, and matching
these PipelineRuns to the ConcurrencyControl via some strategy such as labels, ownerReferences, or modifying PipelineRuns themselves

Although I slightly agree, the PipelineRun reconciler is free to do whatever it wants with the newly created PipelineRun. We could decide that, if a "concurrency" label is set on a PipelineRun, we make it pending. This is a choice we can make (or not).
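
As a rough illustration of that choice, the sketch below holds a newly created PipelineRun as pending when it carries a concurrency label. The label key `tekton.dev/concurrency` and the function name are hypothetical; `PipelineRunPending` is the existing `spec.status` value for pending runs. PipelineRuns are modelled as plain dicts.

```python
PENDING = "PipelineRunPending"  # existing spec.status value for a pending run
CONCURRENCY_LABEL = "tekton.dev/concurrency"  # hypothetical label key


def hold_if_concurrency_labeled(pipeline_run):
    """Mark a new PipelineRun as pending if it carries the concurrency label."""
    labels = pipeline_run.get("metadata", {}).get("labels", {})
    spec = pipeline_run.setdefault("spec", {})
    # Leave runs alone if the creator already set spec.status explicitly.
    if CONCURRENCY_LABEL in labels and "status" not in spec:
        spec["status"] = PENDING
    return pipeline_run
```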

vdemeester · 2022-09-12T08:16:04Z

teps/0120-canceling-concurrent-pipelineruns.md

+This solution assumes that PipelineRuns using concurrency will typically be created by tooling such as Pipelines as Code, a Workflow, or similar,
+and would likely not have different concurrency strategies.


Which "is" kind of true, as this solution adds it in Triggers.

abayer · 2022-09-12T13:30:40Z

Yeah, I agree - Triggers is upstream of Pipelines, and isn't necessarily the only place users would want to be able to control concurrency.

lbernick · 2022-09-12T13:40:40Z

@vdemeester could you elaborate on what a "concurrency primitive" might look like?
I like the idea of building something that higher-level systems can use for concurrency controls, but I think concurrency controls belong at a higher level than Pipelines. It doesn't make sense to talk about concurrency controls for a single PipelineRun, so I'm not sure what a primitive in Pipelines would look like-- do you have an example in mind?

lbernick · 2022-09-12T16:06:59Z

/assign @vdemeester @abayer

lbernick · 2022-09-15T15:27:16Z

@abayer @vdemeester @chmouel @dibyom I've opened an alternative POC in the Pipelines repo for "concurrency primitives". PTAL and let me know if this is in line with what you are thinking: tektoncd/pipeline#5501

dibyom · 2022-09-15T20:40:02Z

So, I agree with @vdemeester's comment about the primitive-based approach. To me, that means that the solution has to be loosely coupled, i.e. we should not require changes to Pipelines/Triggers code and ideally no changes to the existing CRDs either. This does not mean we cannot add those changes in later if we decide this is a very core component of, say, a Trigger. Also, this does not mean we cannot provide a more opinionated approach in another higher-level CRD such as Workflows. But the most basic implementation should be flexible and loosely coupled.

Here is roughly what I had in mind (let me know if some of this is infeasible)

1. A way for a user to specify the concurrency configuration, which contains (a) the concurrency key, (b) the strategy, and (c) a way to match this concurrency config to some subset of PipelineRuns. I think we can start with a label selector here and optimize later (e.g. for the Triggers PoC scenario, since each run created by a Trigger gets a label with the name of the Trigger, the label selector can just match against that label).
2. On Run creation, modify the run to use PipelineRunPending. (This can happen in a mutating webhook so that we don't need to change code in Pipelines or Triggers.)
3. For all runs in a pending state, a new reconciler first sees if a valid concurrency label is present. If not, it matches the run against all label selectors for all relevant concurrency configs. If nothing matches, remove the PipelineRun pending state. If there is a match, add a new concurrency key label to the object, run the concurrency strategy (cancel any relevant runs), and then remove the PipelineRun pending state.
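
The flow described above could look roughly like the following sketch. It is an in-memory model, not a proposed API: `ConcurrencyConfig`, `reconcile_pending`, and the label key `tekton.dev/concurrency-key` are hypothetical names, PipelineRuns are plain dicts, and only the cancel strategy is modelled, so the `strategy` field is carried but not branched on.

```python
from dataclasses import dataclass

PENDING = "PipelineRunPending"  # existing spec.status value for a pending run
CANCELLED = "Cancelled"  # existing spec.status value that requests cancelation
KEY_LABEL = "tekton.dev/concurrency-key"  # hypothetical label key


@dataclass
class ConcurrencyConfig:
    key: str        # concurrency key applied to matching runs
    strategy: str   # only "Cancel" is modelled in this sketch
    selector: dict  # matchLabels-style selector; an empty dict matches every run


def matches(selector, labels):
    return all(labels.get(k) == v for k, v in selector.items())


def reconcile_pending(run, configs, all_runs):
    """Resolve the concurrency key for a pending run, cancel its peers, release it."""
    if run["spec"].get("status") != PENDING:
        return
    labels = run["metadata"].setdefault("labels", {})

    # Match against the configs only if no concurrency label is present yet.
    if KEY_LABEL not in labels:
        config = next((c for c in configs if matches(c.selector, labels)), None)
        if config is not None:
            labels[KEY_LABEL] = config.key

    key = labels.get(KEY_LABEL)
    if key is not None:
        # Run the strategy: cancel every other run sharing the key.
        for other in all_runs:
            if other is not run and other["metadata"].get("labels", {}).get(KEY_LABEL) == key:
                other["spec"]["status"] = CANCELLED

    # Remove the pending state so the run can start.
    del run["spec"]["status"]
```

A run that matches no config falls straight through to the last line and is released unchanged.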

Some caveats/drawbacks of this approach I can think of

- The matching might get complex/slow. Maybe a higher-level thing can set the concurrencyLabel directly.
- Label manipulation. Ideally the selection and the concurrency key should be immutable fields, not an easily changeable label.
- Multiple additional components. For Workflows, since we generate the trigger template, we can always set the status to pending at that point instead of relying on a webhook to do so.

lbernick · 2022-09-19T12:29:34Z

thanks @dibyom for the feedback! I am experimenting with a solution in a separate reconciler, similar to what you've described. The one thing I might not end up implementing is the reconciler changing the pipelinerun status from pending -> running, partially because it's a bit of a code smell for the reconciler to edit the spec of the pipelinerun it's reconciling ~~and partially because I haven't figured out yet how to get this to actually work 😅~~ edit: dibyo and I figured this out!

Would be great to hear others' thoughts as well!

lbernick · 2022-09-19T13:16:26Z

There's one other issue with the mutating admission webhook strategy that @RafaeLeal mentioned to me. Labels and annotations aren't available during the create request (they're patched after creation), so we'd have to mark the pipelinerun as pending during an update request. However I'm not sure how to know during an update request whether it's appropriate to mark a pipelinerun as pending, and our webhook doesn't currently allow running pipelineruns to be changed to pending (although ofc this is something we can change)

lbernick · 2022-09-22T15:52:20Z

POC number 3 as requested! I haven't yet implemented a mutating admission webhook but that's next on the list.

lbernick · 2022-09-23T19:43:45Z

@ all -- I've updated the TEP to list out alternatives, and the proposed solution to basically state that we're going to experiment with a separate reconciler and use it in dogfooding, and we'll design a higher level api for concurrency in workflows once the workflows api exists and after getting experience testing out the experimental project. Also added links to existing POCs. Hoping this can be merged without major changes, please lmk what you think.

lbernick · 2022-09-29T13:55:56Z

@abayer @vdemeester PTAL, hoping to merge during API WG on Monday

abayer · 2022-09-30T17:15:59Z

/approve

Looks like a good start to me - looking forward to seeing where we end up going with this!

vdemeester · 2022-10-03T14:21:00Z

teps/0120-canceling-concurrent-pipelineruns.md

+- Do we want to allow a PipelineRun to be part of multiple concurrency groups?
+  - For example, when we later support queueing, we might want to allow users to configure a rule like "Cancel all but the last PR per pull request, and only allow 5 CI runs per repo at a time."
+- What concurrency strategies should we support?
+  - For the initial version of this proposal, we will only support cancelation. However, should we support all possible strategies for canceling PipelineRuns,


I think the default should be "graceful cancellation", aka cancel but run finally tasks.

lbernick · 2022-10-03

Sounds good. The initial version I'm working on uses regular cancelation as the default, but this should be super easy to change.
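
For reference, the three cancelation strategies discussed here correspond to existing PipelineRun `spec.status` values. The sketch below maps strategy names to those values and builds a merge-patch body, with graceful cancelation as the default as suggested above. The strategy names and the `cancel_patch` helper are hypothetical; only the `spec.status` strings are existing Tekton values.

```python
# Hypothetical strategy names mapped to existing PipelineRun spec.status values.
SPEC_STATUS_BY_STRATEGY = {
    "Cancel": "Cancelled",                    # cancel running tasks, skip finally tasks
    "GracefulCancel": "CancelledRunFinally",  # cancel running tasks, then run finally tasks
    "GracefulStop": "StoppedRunFinally",      # let running tasks finish, then run finally tasks
}
DEFAULT_STRATEGY = "GracefulCancel"


def cancel_patch(strategy=None):
    """Return a merge-patch body that applies the given cancelation strategy."""
    name = strategy or DEFAULT_STRATEGY
    if name not in SPEC_STATUS_BY_STRATEGY:
        raise ValueError(f"unsupported concurrency strategy: {name}")
    return {"spec": {"status": SPEC_STATUS_BY_STRATEGY[name]}}
```

Switching the default is then a one-line change to `DEFAULT_STRATEGY`.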

vdemeester · 2022-10-03T14:22:32Z

teps/0120-canceling-concurrent-pipelineruns.md

+  - For the initial version of this proposal, we will only support cancelation. However, should we support all possible strategies for canceling PipelineRuns,
+    including cancelation, graceful cancelation, and graceful stopping?
+- Should we attempt to prevent users from interfering with concurrency controls?
+  - Many of the proposed solutions rely on labels, and a user editing labels could change PipelineRun behavior. Is this desired?


I think so, yes. A user can change labels on a Pod from a TaskRun and provoke weird behavior, for example. This is very similar: we cannot completely stop users from shooting themselves in the foot, and that's fine 🙃

tekton-robot · 2022-10-03T14:22:45Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: abayer, vdemeester

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~teps/OWNERS~~ [abayer,vdemeester]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

afrittoli · 2022-10-03T16:04:34Z

LGTM in the working group
/lgtm

tekton-robot added the kind/tep Categorizes issue or PR as related to a TEP (or needs a TEP). label Sep 9, 2022

tekton-robot requested review from pradeepitm12 and vtereso September 9, 2022 20:44

tekton-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Sep 9, 2022

lbernick force-pushed the concurrency branch from 00a8977 to 1d102cb Compare September 9, 2022 20:52

lbernick force-pushed the concurrency branch from 1d102cb to 699747c Compare September 9, 2022 20:56

vdemeester reviewed Sep 12, 2022

tekton-robot assigned abayer and vdemeester Sep 12, 2022

dibyom mentioned this pull request Sep 14, 2022

how do I set pipelinerun wait for the previous pipelinerun execution to complete tektoncd/pipeline#5485

Closed

lbernick force-pushed the concurrency branch from 699747c to 9f357b8 Compare September 15, 2022 15:04

lbernick force-pushed the concurrency branch from 9f357b8 to 4a6a28e Compare September 23, 2022 19:40

[TEP-0120] Add proposal for concurrency controls

afb5ed6

This commit adds design details for canceling concurrent PipelineRuns.

lbernick force-pushed the concurrency branch from 4a6a28e to afb5ed6 Compare September 29, 2022 13:55

tekton-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 30, 2022

vdemeester approved these changes Oct 3, 2022

tekton-robot assigned afrittoli Oct 3, 2022

tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Oct 3, 2022

tekton-robot merged commit 3c33b95 into tektoncd:main Oct 3, 2022
