
A set of shuffle map output related changes #587

Merged
11 commits merged into mesos:master on May 4, 2013
Conversation

@rxin (Member) commented Apr 30, 2013

  1. Added a BlockObjectWriter interface for writing out a series of jvm objects to disk.
  2. Added a getBlockWriter interface to DiskStore.
  3. Added a new method to BlockManager for writing out shuffle files. This method provides a short-circuited way to write shuffle data out directly to disk.
  4. Added ShuffleBlockManager and updated ShuffleMapTask to use the manager to write map outputs out to disk (ultimately via the BlockObjectWriter interface).
  5. Allow specifying a shuffle serializer on a per-shuffle basis.
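(A rough sketch of how these pieces might fit together is below; the trait and the getWriters call are illustrative and follow the description above rather than the exact code in this PR.)

// Illustrative sketch only -- names mirror the PR description, signatures may differ.
private[spark] trait BlockObjectWriter {
  def open(): BlockObjectWriter          // open the underlying file for writing
  def write(value: Any): Unit            // append one JVM object to the stream
  def commit(): Unit                     // make writes since the last commit durable
  def revertPartialWrites(): Unit        // discard writes made since the last commit
  def close(): Unit
}

// A ShuffleMapTask could then stream each record straight to the writer for its
// reduce bucket instead of materializing whole partitions in memory, e.g.:
//   val writers = shuffleBlockManager.getWriters(shuffleId, mapId, numBuckets, serializer)
//   for ((key, value) <- records) writers(partitioner.getPartition(key)).write((key, value))
//   writers.foreach { w => w.commit(); w.close() }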

@AmplabJenkins

All tests passed for this pull request. For more details, visit http://amplab.cs.berkeley.edu/jenkins/.
Refer to this link for build results: http://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20/

private[spark]
class ShuffleBlockManager(blockManager: BlockManager) {

  val shuffles = new ConcurrentHashMap[Int, Shuffle]
Contributor

This map doesn't appear to be used in this class?

Member Author

Ah, let me remove it. It was for a separate design that I decided to pull out at the last minute.

@AmplabJenkins

Tests failed for this pull request. For more details, visit http://amplab.cs.berkeley.edu/jenkins/.
Refer to this link for build results: http://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21/

@rxin (Member, Author) commented May 1, 2013

I think the above Jenkins test failure was due to a hang (fixed in #586, already merged), not caused by this pull request.

* class name. If a previous instance of the serializer object has been created, the get
* method returns that instead of creating a new one.
*/
object Serializer {
Member

I'd prefer to avoid singleton objects so that we can eventually run multiple instances of Spark in the same JVM (e.g. for multithreaded test execution). Can you either make these methods be part of SparkEnv, or possibly add a "serializer manager" in SparkEnv?
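(A minimal sketch of the suggested alternative, assuming a small manager instance held by SparkEnv; the class name SerializerManager and its get method are illustrative, not necessarily what was merged.)

import java.util.concurrent.ConcurrentHashMap

// Illustrative sketch: an instance-scoped cache instead of a global singleton.
private[spark] class SerializerManager {
  private val serializers = new ConcurrentHashMap[String, Serializer]

  // Return the serializer for this class name, instantiating it on first use.
  def get(clsName: String): Serializer = {
    val cached = serializers.get(clsName)
    if (cached != null) {
      cached
    } else {
      val created = Class.forName(clsName).newInstance().asInstanceOf[Serializer]
      val raced = serializers.putIfAbsent(clsName, created)
      if (raced != null) raced else created
    }
  }
}

// Because the cache hangs off a SparkEnv instance rather than a singleton object,
// two Spark instances in the same JVM (e.g. multithreaded tests) stay independent.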

Member Author

Done.

@mateiz (Member) commented May 3, 2013

Hey Reynold,

This looks good except for the singleton object and the semantics of revertPartialWrites on disk. For the latter, do you want to keep this interface to support merging shuffle files in the future? If so, we should at least document that it can cause previously committed writes to be deleted, but it might also be good to just make it resize the file to right before it started writing stuff.

@mateiz (Member) commented May 3, 2013

By the way, this is how you can truncate the file: http://docs.oracle.com/javase/1.4.2/docs/api/java/nio/channels/FileChannel.html#truncate%28long%29. You can probably use FileOutputStream.getChannel.truncate() for example.
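(For concreteness, a small sketch of that revert-by-truncate idea, assuming the writer remembers the channel position at its last commit; the file name and variable names are made up.)

import java.io.{File, FileOutputStream}

// Open the shuffle file in append mode so previously committed map output is preserved.
val file = new File("shuffle_0_0_0")            // hypothetical block file
val out = new FileOutputStream(file, true)
val channel = out.getChannel
val lastCommittedPosition = channel.position()  // where this writer started appending

// ... serialize records to `out` ...

// revertPartialWrites: drop everything written since the last commit without
// touching bytes committed earlier in the same file.
channel.truncate(lastCommittedPosition)
out.close()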

@rxin (Member, Author) commented May 3, 2013

Done. PTAL.

I actually added the append feature fully back to DiskStore (although the higher-level shuffle block manager doesn't yet do consolidation).
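(A hedged sketch of what that future consolidation could look like: several map outputs appended into one file per reduce partition, with an (offset, length) segment recorded per map so a reducer can still fetch only its slice. Purely illustrative of the design direction; this PR does not implement it, and none of these names exist in the code.)

import java.io.File
import scala.collection.mutable

case class FileSegment(file: File, offset: Long, length: Long)

// One consolidated file per reduce partition, shared by many map tasks.
class ConsolidatedShuffleFile(file: File) {
  private var nextOffset = 0L
  private val segmentsByMap = mutable.Map[Int, FileSegment]()

  // Record the byte range a map task just appended and committed to this file.
  def recordMapOutput(mapId: Int, bytesWritten: Long): Unit = synchronized {
    segmentsByMap(mapId) = FileSegment(file, nextOffset, bytesWritten)
    nextOffset += bytesWritten
  }

  // Look up the byte range belonging to one map output.
  def segmentFor(mapId: Int): Option[FileSegment] = synchronized {
    segmentsByMap.get(mapId)
  }
}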

@AmplabJenkins

Tests failed for this pull request. For more details, visit http://amplab.cs.berkeley.edu/jenkins/.
Refer to this link for build results: http://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27/

@rxin (Member, Author) commented May 3, 2013

Again, I don't think the above test failures were caused by this PR. It is a little annoying that Jenkins just keeps saying PRs are failing tests ...

@mateiz (Member) commented May 3, 2013

Yeah, we should probably disable that for now.

mateiz added a commit that referenced this pull request May 4, 2013
A set of shuffle map output related changes
mateiz merged commit 2484ad7 into mesos:master on May 4, 2013
@mateiz (Member) commented May 4, 2013

Thanks Reynold, I've merged this.

andyk pushed a commit to andyk/mesos-spark that referenced this pull request May 5, 2014
Args for worker rather than master

Author: Chen Chao <crazyjvm@gmail.com>

Closes mesos#587 from CrazyJvm/patch-6 and squashes the following commits:

b54b89f [Chen Chao] Args for worker rather than master