Proposal: Support task-level parallelism #71
Labels: area/workloads, component/execution, kind/proposal
Motivation
Users may wish to shard their periodic jobs into multiple Pods. For example, every day at 12am we need to process a large batch of work; the volume of this work may grow significantly over time, and it cannot be started before 12am (i.e. on the previous day). As it stands, our only option is vertical scaling of a single Pod, which is clearly impractical beyond a certain point. As such, we want to evaluate how to support horizontal scaling of Job pods (i.e. task-level parallelism).
The Job object in K8s currently supports parallel execution: https://kubernetes.io/docs/concepts/workloads/controllers/job/#parallel-jobs. However, in my personal opinion, the K8s JobController API for controlling the parallelism of multiple Pods is neither very clear nor well-designed. We will outline the use cases, and attempt to propose an API design that supports them and potentially improves on the existing one in K8s.
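For reference, the snippet below shows roughly how this looks with the upstream API today: a minimal `batch/v1` Indexed Job (the values here are arbitrary), where completion is controlled via the `completions`, `parallelism` and `completionMode` fields.

```yaml
# A plain batch/v1 Indexed Job for comparison; the values here are arbitrary.
apiVersion: batch/v1
kind: Job
metadata:
  name: process-shards
spec:
  completions: 5           # total number of successful completions required
  parallelism: 5           # how many Pods may run concurrently
  completionMode: Indexed  # each Pod receives JOB_COMPLETION_INDEX (0..4)
  backoffLimit: 3
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: worker
          image: alpine:3.16
          command: ["sh", "-c", "echo processing shard $JOB_COMPLETION_INDEX"]
```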
It is also important to avoid over-designing this feature. A good principle to keep in mind is that Furiko's main focus is on automatic timed/periodic tasks, not so much user-invoked or complex task workflows; we are better off utilizing more complete workflow engines (e.g. Argo Workflows) to support those.
Use Cases
We will outline some use cases, highlighting those we have encountered internally:
I personally don't think there is a need for "fixed completion count" jobs like in K8s; at least, I have never encountered a use case that depends on this. Perhaps the closest equivalent of "fixed completion count" is to start N Jobs at the same time with a fixed concurrency factor, which is slightly different from the topic we are currently discussing.
Requirements
- Unlike the `batch/v1` JobController, which depends on a Pod's restartPolicy to retry containers, we explicitly support pod-level retries.
- Currently, a single task result (`Success`) is sufficient to indicate a successful Job, but this becomes less clear once we have parallel Pods.

Proposed Implementation 1
We are not going forward with this proposal.
We will implement a new CRD, `ShardedSet` (NOTE: the name is currently TBC). This CRD is most similar to a ReplicaSet, except that it controls running to completion (which ironically makes it similar to the `batch/v1` Job itself).

The implementation of a ShardedSet will closely follow the Indexed Job pattern in `batch/v1` (https://kubernetes.io/docs/tasks/job/indexed-parallel-processing-static/), but it defines the completion policy in a much more explicit manner than is currently supported by the `batch/v1` Job API, and avoids the confusion of having to define `completions`. See kubernetes/kubernetes#28486 for some related discussion about how completions are handled.

CRD Design
Example of a proposed `ShardedSet` custom resource object, with all possible API fields (including future expansion):
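A minimal sketch of what such an object might look like; the API group and field names (`shards`, `completion.condition`, `maxAttemptsPerShard`) are illustrative assumptions, not a finalized schema.

```yaml
# Illustrative sketch only; the ShardedSet schema (and its name) is still TBC.
apiVersion: execution.furiko.io/v1alpha1
kind: ShardedSet
metadata:
  name: process-daily-work
spec:
  shards: 5                        # number of parallel shards to run to completion
  completion:
    condition: AllShardsSucceeded  # explicit completion policy, unlike batch/v1 completions
  maxAttemptsPerShard: 3           # hypothetical pod-level retries per shard
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: worker
          image: alpine:3.16
          command: ["sh", "-c", "echo processing one shard"]
```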
Complete breakdown for `completion.condition` success cases:

Complete breakdown for `completion.condition` failure cases:

Note that:
- `OnAllShardsSuccess` == `OnAnyShardFailure`
- `OnAnyShardSuccess` == `OnAllShardsFailure`
Therefore, we can simplify it to just `AllShardsSucceeded` and `AnyShardSucceeded`. (Help me verify this claim??)

Inside a JobConfig, the user will define it like this:
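A minimal sketch of how that might look, assuming a hypothetical `shardedSetTemplate` field nested inside the JobConfig's template (the exact field name and nesting are assumptions, since this proposal was dropped):

```yaml
# Illustrative sketch only; shardedSetTemplate and its nesting are assumptions.
apiVersion: execution.furiko.io/v1alpha1
kind: JobConfig
metadata:
  name: process-daily-work
spec:
  schedule:
    cron:
      expression: "0 0 * * *"   # every day at 12am
  template:
    spec:
      # The Job would create a ShardedSet instead of managing Pods directly.
      shardedSetTemplate:
        spec:
          shards: 5
          completion:
            condition: AllShardsSucceeded
          template:
            spec:
              restartPolicy: Never
              containers:
                - name: worker
                  image: alpine:3.16
                  command: ["sh", "-c", "echo processing one shard"]
```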
Pros and Cons

… `batch/v1` Job.

Proposed Implementation 2
Another way is to avoid the use of a CRD, but implement it directly in the JobController and update the Job API.
We will add the following fields to the spec (a sketch of how they might look is shown at the end of this section).

Some possible parallelism types:

- `withCount`: Specify an absolute number; the index number will be generated from `0` to `N-1` in `${task.index_num}`.
- `withKeys`: Specify a list of string keys; the key will be made available in `${task.index_key}`.
- `withMatrix`: Specify a map of keys to lists of string values; each key will be available in `${task.index_matrix.<matrix_key>}`. This is to support common parallel patterns (e.g. CI on multiple platform and version combinations).

Some considerations:

- Retries are counted per index: with a `withCount` of `3`, each index (`0`, `1`, `2`) has a maximum of 3 retries each.
- `completionStrategy` is similar to Proposal (1).

The main reason we are not using Proposal (1) is the complexity introduced by having nested retries; it is actually clearer to inline the implementation into the same CRD/controller.
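A minimal sketch of how these fields might sit on the Job template, assuming a hypothetical `parallelism` block alongside the task template (field placement, the `completionStrategy` value, and the `maxAttempts` wiring are assumptions for illustration):

```yaml
# Illustrative sketch only; field placement and names are assumptions.
apiVersion: execution.furiko.io/v1alpha1
kind: Job
metadata:
  name: build-matrix
spec:
  template:
    parallelism:
      # Exactly one of withCount / withKeys / withMatrix would be set.
      withMatrix:
        platform: ["linux", "darwin"]
        version: ["1.17", "1.18"]
      completionStrategy: AllSuccessful   # hypothetical value, similar to Proposal (1)
    maxAttempts: 3                        # retries apply per index, not across the whole Job
    taskTemplate:
      pod:
        spec:
          restartPolicy: Never
          containers:
            - name: build
              image: golang:1.18
              command:
                - sh
                - -c
                - echo "building ${task.index_matrix.platform} go${task.index_matrix.version}"
```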
Alternatives
There are some alternatives to the above design to achieve the same requirements.
TODO List