feat(sdk): Support dsl.ParallelFor over list of Artifacts #10437

Closed
KevinGrantLee wants to merge 6 commits from the parallelfor-artifact branch


KevinGrantLee
Contributor

@KevinGrantLee KevinGrantLee commented Jan 27, 2024

Description of your changes:

This PR adds support for dsl.ParallelFor over tasks that output lists of artifacts.

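Example:

from typing import List

from kfp import dsl
from kfp.dsl import Artifact

@dsl.component
def make_artifacts(...) -> List[Artifact]:
    ...

@dsl.pipeline
def my_pipeline():
    make_artifacts_task = make_artifacts()
    # Fan out over the list of artifacts produced by the upstream task.
    with dsl.ParallelFor(items=make_artifacts_task.output) as item:
        print_artifact_name(var_artifact=item)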

This PR does not support dsl.ParallelFor over a raw list of Artifacts.

Checklist:

that are output by previous components.

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign connor-mccarthy for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@KevinGrantLee KevinGrantLee changed the title from "Support dsl.ParallelFor over list of Artifacts" to "feat(sdk): Support dsl.ParallelFor over list of Artifacts" on Jan 27, 2024
Member

@connor-mccarthy connor-mccarthy left a comment

Thank you, @KevinGrantLee! This is a great feature.

sdk/python/kfp/dsl/types/type_annotations.py
return LoopArgument(
    items=channel,
    name_override=channel.name + '-' + cls.LOOP_ITEM_NAME_BASE,
    task_name=channel.task_name,
    channel_type=_get_loop_item_type(channel.channel_type) or 'String',
    is_artifact_list=is_artifact_list,
    value=None,
Member

I commented out lines 176 and 193 and the test still passed. Do we need an additional test case for this?

sdk/python/kfp/dsl/for_loop.py
@@ -64,7 +73,7 @@ def _get_subvar_type(type_name: str) -> Optional[str]:
     return match['value_type'].lstrip().rstrip() if match else None


-class LoopArgument(pipeline_channel.PipelineParameterChannel):
+class LoopArgument(pipeline_channel.PipelineChannel):
Member

Would these implementations be simpler (even if longer) if we split LoopArgument into class LoopArgumentParameter(pipeline_channel.PipelineParameterChannel) and class LoopArgumentArtifact(pipeline_channel.PipelineArtifactChannel)?

I think this would:
(a) Mirror the logical organization of PipelineChannels
(b) Simplify the interfaces of LoopArgumentParameter and LoopArgumentArtifact compared to LoopArgument. No mixing of parameter and artifact concepts (as is the case with the is_artifact_list and value parameters).
(c) Eliminate changes needed in pipeline_channel.py
(d) Allow us to piggyback on more of the existing compilation logic in pipeline_spec_builder.py

I do agree with the intent of the current changes: a uniform mechanism for passing data, irrespective of data type, would be very nice to have. I think the complexity costs of this are somewhat high, however, since the rest of the KFP SDK internals don't follow this pattern beyond having a shared PipelineChannel ABC. As you've noticed, PipelineChannel and its subclasses don't have a great polymorphic relationship: PipelineChannel subclasses disregard the constructor of the PipelineChannel ABC, the Liskov Substitution Principle does not hold, etc. It's best to think of PipelineChannel as a form of "sentinel" parent class that merely informs code that an object that subclasses it is one of our several pipeline channel instances.

If we want to fix this pre-existing issue, I think it would make sense to address the inconsistency starting within pipeline_channel.py, though that's probably a bit ambitious for this one feature.

WDYT?
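
The split proposed above can be sketched in isolation. This is a minimal illustration, not the KFP implementation: the channel classes below are simplified stand-ins rather than the real `pipeline_channel` classes, and `_item_type`, `from_channel`, and `make_loop_argument` are hypothetical names introduced here. It only shows how two subclasses keep artifact-only fields such as `is_artifact_list` out of the parameter path, with the shared parent used purely as a type marker.

```python
import re
from typing import Optional

LOOP_ITEM_NAME_BASE = 'loop-item'


class PipelineChannel:
    """Stand-in marker parent: it only signals 'this object is a channel'."""


class PipelineParameterChannel(PipelineChannel):
    """Stand-in for a channel that carries a parameter value."""

    def __init__(self, name: str, channel_type: str,
                 task_name: Optional[str] = None) -> None:
        self.name = name
        self.channel_type = channel_type
        self.task_name = task_name


class PipelineArtifactChannel(PipelineChannel):
    """Stand-in for a channel that carries an artifact or a list of artifacts."""

    def __init__(self, name: str, channel_type: str, task_name: str,
                 is_artifact_list: bool = False) -> None:
        self.name = name
        self.channel_type = channel_type
        self.task_name = task_name
        self.is_artifact_list = is_artifact_list


def _item_type(list_type: str) -> Optional[str]:
    """Hypothetical helper: 'List[str]' -> 'str'; None if not a List[...]."""
    match = re.match(r'^List\[(?P<item>.+)\]$', list_type.strip())
    return match['item'].strip() if match else None


class LoopArgumentParameter(PipelineParameterChannel):
    """Loop item drawn from a parameter channel; no artifact-only fields."""

    @classmethod
    def from_channel(
            cls, channel: PipelineParameterChannel) -> 'LoopArgumentParameter':
        return cls(
            name=f'{channel.name}-{LOOP_ITEM_NAME_BASE}',
            channel_type=_item_type(channel.channel_type) or 'String',
            task_name=channel.task_name,
        )


class LoopArgumentArtifact(PipelineArtifactChannel):
    """Loop item drawn from a channel that holds a list of artifacts."""

    @classmethod
    def from_channel(
            cls, channel: PipelineArtifactChannel) -> 'LoopArgumentArtifact':
        if not channel.is_artifact_list:
            raise ValueError(
                f'Cannot loop over {channel.name!r}: not a list of artifacts.')
        return cls(
            name=f'{channel.name}-{LOOP_ITEM_NAME_BASE}',
            channel_type=channel.channel_type,
            task_name=channel.task_name,
            is_artifact_list=False,  # each loop item is a single artifact
        )


def make_loop_argument(channel: PipelineChannel) -> PipelineChannel:
    """Dispatch on the concrete channel kind instead of passing flags around."""
    if isinstance(channel, PipelineArtifactChannel):
        return LoopArgumentArtifact.from_channel(channel)
    if isinstance(channel, PipelineParameterChannel):
        return LoopArgumentParameter.from_channel(channel)
    raise TypeError(f'Unsupported channel type: {type(channel).__name__}')
```

In this sketch the dispatch happens once, in `make_loop_argument`, so neither subclass needs a constructor that accepts both parameter and artifact arguments.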

Contributor Author

I've made the changes in the other PR. Yes, I agree with (a) and (b): the code is a little simpler when it distinguishes between LoopParameterArgument and LoopArtifactArgument.

(c) and (d) are roughly the same

Member

@connor-mccarthy connor-mccarthy left a comment

Could you also update the release notes to let others know about this feature?


@KevinGrantLee: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| kubeflow-pipelines-sdk-isort | d1701ec | link | true | /test kubeflow-pipelines-sdk-isort |
| kubeflow-pipelines-sdk-yapf | d1701ec | link | true | /test kubeflow-pipelines-sdk-yapf |

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Contributor Author

KevinGrantLee commented Jan 29, 2024

Please ignore this pull request; I opened another at #10441.

I'm leaving this one open temporarily to preserve the review comments, and I will address them on the other PR. I will close this PR afterward.

@KevinGrantLee KevinGrantLee marked this pull request as draft January 29, 2024 22:12
@KevinGrantLee KevinGrantLee deleted the parallelfor-artifact branch February 15, 2024 23:48