
[Feature Request] Add parallelism for dsl.ParallelFor #4089

Closed
hlu09 opened this issue Jun 26, 2020 · 11 comments
Labels: area/sdk/dsl, kind/feature, status/triaged

Comments

@hlu09
Contributor

hlu09 commented Jun 26, 2020

Use Case

Large data mining jobs are often split into many small jobs. Given the limited shared resources of external services (e.g., DataFlow), we can only run a few small jobs simultaneously.

Global parallelism works to some degree but lacks flexibility; e.g., with global parallelism set to 1, any in-cluster task can block launching jobs to external services.

Argo supports template-level parallelism
https://github.com/argoproj/argo/blob/master/examples/parallelism-nested-workflow.yaml#L19
https://github.com/argoproj/argo/blob/master/examples/parallelism-nested-dag.yaml#L15

Feature request

    with dsl.ParallelFor(loopidy_doop, parallelism=2) as item:
        # DAG in loop here

There will be at most 2 loop DAGs running in parallel.
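
For concreteness, here is a minimal sketch of how the requested parameter might read in a full pipeline (a sketch only: the `parallelism` argument is the proposed addition, and `echo_op` plus the loop items are made up for this example):

    from kfp import dsl

    def echo_op(text):
        # Illustrative helper (not part of the SDK): a trivial op that echoes its input.
        return dsl.ContainerOp(
            name="echo",
            image="library/bash:4.4.23",
            command=["sh", "-c"],
            arguments=["echo %s" % text],
        )

    @dsl.pipeline(name="parallelfor-parallelism-sketch")
    def my_pipeline():
        loopidy_doop = [{"a": 1}, {"a": 2}, {"a": 3}, {"a": 4}]
        # Proposed behavior: at most 2 loop iterations (sub-DAGs) run concurrently.
        with dsl.ParallelFor(loopidy_doop, parallelism=2) as item:
            echo_op(item.a)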

@Bobgy
Contributor

Bobgy commented Jun 28, 2020

/assign @Ark-kun
/area sdk/dsl

@Bobgy Bobgy added the kind/feature and status/triaged labels Jun 28, 2020
@NikeNano
Member

I would be happy to help out as well!
/assign

@Ark-kun
Contributor

Ark-kun commented Jun 29, 2020

This is an interesting feature request. It's not hard to implement, but I wonder whether this kind of parallelism control is common in orchestrators.
It looks like in Argo the parallelism option can be applied to any DAG. I wonder whether we should do the same and make max_parallel_executions a property of the OpsGroup.
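
To make the OpsGroup idea concrete, here is a rough sketch of how it might read from the pipeline author's side; the `parallelism` attribute set on the group object below is hypothetical and is only meant to mirror Argo's template-level `parallelism` field:

    from kfp import dsl

    @dsl.pipeline(name="opsgroup-parallelism-sketch")
    def my_pipeline():
        loop = dsl.ParallelFor([{"a": 1}, {"a": 2}, {"a": 3}])
        # Hypothetical: if max_parallel_executions were a generic OpsGroup property,
        # any group (not just ParallelFor) could carry its own concurrency cap.
        loop.parallelism = 2
        with loop as item:
            dsl.ContainerOp(
                name="echo",
                image="library/bash:4.4.23",
                command=["sh", "-c"],
                arguments=["echo item.a: %s" % item.a],
            )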

@hlu09
Contributor Author

hlu09 commented Jun 29, 2020

A more generic way might be:

    with dsl.Parallelism(2):
        with dsl.ParallelFor(loopidy_doop) as item:
            ...

Such a block could be applied to any DAG outside a parallel-for.
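
For example, under this proposal the same block could also throttle a plain fan-out of independent ops that is not written as a ParallelFor (a sketch only: `dsl.Parallelism` does not exist in the SDK and is exactly the construct being proposed):

    from kfp import dsl

    @dsl.pipeline(name="parallelism-block-sketch")
    def my_pipeline():
        # Hypothetical dsl.Parallelism block: at most 2 of the independent ops
        # below would run at the same time, even though nothing depends on them.
        with dsl.Parallelism(2):
            for i in range(5):
                dsl.ContainerOp(
                    name="echo-%d" % i,
                    image="library/bash:4.4.23",
                    command=["sh", "-c"],
                    arguments=["echo task %d" % i],
                )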

@hlu09
Contributor Author

hlu09 commented Jun 29, 2020

The tricky part is to design a way to define the unit of max_parallel_executions/parallelism. The Argo YAML below will enforce the parallelism limit at the task level, rather than at the sub-DAG level (in this case, the 2-step sequential DAG under ParallelFor), because the limit is attached to the loop-body template itself rather than to the template that fans out the loop items.

  - name: for-loop-for-loop-0535d69b-1
    parallelism: 2
    inputs:
      parameters:
      - {name: loopidy_doop-loop-item-subvar-a}
    dag:
      tasks:
      - name: my-in-cop1
        template: my-in-cop1
        dependencies: [sleep-10-seconds]
        arguments:
          parameters:
          - {name: loopidy_doop-loop-item-subvar-a, value: '{{inputs.parameters.loopidy_doop-loop-item-subvar-a}}'}
      - {name: sleep-10-seconds, template: sleep-10-seconds}

The corresponding DSL code:

    with dsl.ParallelFor(loopidy_doop, parallelism=2) as item:
        sleep = sleep_op(10)
        op1 = dsl.ContainerOp(
            name="my-in-cop1",
            image="library/bash",
            command=["sh", "-c"],
            arguments=["echo no output global op1, item.a: %s" % item.a],
        ).after(sleep)

@NikeNano
Member

> The tricky part is to design a way to define the unit of max_parallel_executions/parallelism. The Argo YAML below will enforce the parallelism limit at the task level, rather than at the sub-DAG level (in this case, the 2-step sequential DAG under ParallelFor).

What do you mean by task level vs sub-DAG level? Do you mean that for each DAG within the for loop there should be a limit, not on the task itself? @hlu09

> I wonder whether we should do the same and make the max_parallel_executions the property of the OpsGroup.

I think it makes sense for any ops that share resources to have the option and thus set it in the OpsGroup @Ark-kun.

@hlu09
Contributor Author

hlu09 commented Jun 30, 2020

Suppose the ParallelFor above generates 100 sub-DAGs, each containing 2 ops: sleep_op followed by echo.

One way to enforce parallelism (2) is: 2 of these 100 sub-DAGs are executed first, followed by the next 2, and so on.

A different way: put the 100 sleep_ops and 100 echo ops together in one group; 2 of these 200 ops are executed first, followed by the next 2, and so on. Of course, echo.after(sleep_op) is still enforced within each single sub-DAG.

@NikeNano
Member

NikeNano commented Jul 1, 2020

Thanks for the clarification @hlu09. As I understand it, you suggest keeping the parallelism at the sub-DAG level, thus allowing X sub-DAGs to run in parallel. I also think this makes the most sense from a user's perspective.

@NikeNano
Member

NikeNano commented Jul 1, 2020

I think this example uses parallelism as you suggest @hlu09: https://github.com/argoproj/argo/blob/master/examples/parallelism-nested-dag.yaml

@hlu09
Contributor Author

hlu09 commented Jul 1, 2020

@NikeNano right, it makes sense to keep the parallelism at the sub-DAG level, since ParallelFor generates many sub-DAGs.

@NikeNano
Member

NikeNano commented Jul 6, 2020

Will start this work today/tomorrow so we get it rolling!

Jeffwan pushed a commit to Jeffwan/pipelines that referenced this issue Dec 9, 2020
…low#4149)

* Added parallism at sub-dag level

* updated the parallism

* remove yaml file

* reformatting

* Update sdk/python/kfp/compiler/compiler.py

* Update sdk/python/kfp/compiler/compiler.py

* Update samples/core/loop_parallelism/loop_parallelism.py

Co-authored-by: Alexey Volkov <alexey.volkov@ark-kun.com>
