Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-32918][SHUFFLE] RPC implementation to support control plane coordination for push-based shuffle #30163

Closed
wants to merge 8 commits into from

Conversation

zhouyejoe
Copy link
Contributor

@zhouyejoe zhouyejoe commented Oct 27, 2020

What changes were proposed in this pull request?

This is one of the patches for SPIP SPARK-30602 which is needed for push-based shuffle.
Summary of changes:
This PR introduces a new RPC to be called within Driver. When the expected shuffle push wait time reaches, Driver will call this RPC to facilitate coordination of shuffle map/reduce stages and notify external shuffle services to finalize shuffle block merge for a given shuffle. Shuffle services also respond back the metadata about a merged shuffle partition back to the caller.

Why are the changes needed?

Refer to the SPIP in SPARK-30602.

Does this PR introduce any user-facing change?

No

How was this patch tested?

This code snippets won't be called by any existing code and will be tested after the coordinated driver changes gets merged in SPARK-32920.

Lead-authored-by: Min Shen mshen@linkedin.com

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

Copy link
Contributor

@mridulm mridulm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A minor comment. Rest looks good to me.

Copy link
Contributor

@attilapiros attilapiros left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is very hard (even impossible) to reason about code changes which are not used (as well as not even tested) so please consider cutting the push based shuffle feature as whole into PRs which covers a bit more meaningful parts on its own.

@Victsm
Copy link
Contributor

Victsm commented Nov 16, 2020

@attilapiros thanks for that feedback, and we acknowledge the sentiment for the issues with small patches in a large feature.
This is an oversight on my end when initially breaking SPARK-30602 into sub-tasks.
I thought SPARK-32918 could be a reasonable sized patch but it turned out to be not the case.
Both SPARK-32918 and SPARK-32919 are meant to be getting some of the prerequisites in place before getting to SPARK-32920 and SPARK-32921, which are relative major changes to key Spark driver components i.e. DAGScheduler and MapOutputTracker.
The original intent was to make patches for SPARK-32920 and SPARK-32921 focusing on changes to these corresponding components only, given the potentially vigorous reviews we would receive for changing these components.

@mridulm
Copy link
Contributor

mridulm commented Nov 20, 2020

@tgravescs, @attilapiros, @Ngone51 are there any other comments ?
Thanks.

@tgravescs
Copy link
Contributor

kicked the builds again as the error should have been fixed by 6da8ade

@mridulm
Copy link
Contributor

mridulm commented Nov 23, 2020

Thanks for the +1 @tgravescs ! The scala 2.13 error is unrelated.
Merging to master.

@asfgit asfgit closed this in 1bd897c Nov 23, 2020
@mridulm
Copy link
Contributor

mridulm commented Nov 23, 2020

Thanks for working on this @zhouyejoe !
And thanks for the reviews @tgravescs, @attilapiros, @Ngone51, @otterc and @Victsm !

@zhouyejoe
Copy link
Contributor Author

Thanks everyone on reviewing this PR.

wangyum pushed a commit that referenced this pull request May 26, 2023
…ordination for push-based shuffle

### What changes were proposed in this pull request?
This is one of the patches for SPIP SPARK-30602 which is needed for push-based shuffle.
Summary of changes:
This PR introduces a new RPC to be called within Driver. When the expected shuffle push wait time reaches, Driver will call this RPC to facilitate coordination of shuffle map/reduce stages and notify external shuffle services to finalize shuffle block merge for a given shuffle. Shuffle services also respond back the metadata about a merged shuffle partition back to the caller.

### Why are the changes needed?
Refer to the SPIP in SPARK-30602.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
This code snippets won't be called by any existing code and will be tested after the coordinated driver changes gets merged in SPARK-32920.

Lead-authored-by: Min Shen mshenlinkedin.com

Closes #30163 from zhouyejoe/SPARK-32918.

Lead-authored-by: Ye Zhou <yezhou@linkedin.com>
Co-authored-by: Min Shen <mshen@linkedin.com>
Signed-off-by: Mridul Muralidharan <mridul<at>gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants