A set of shuffle map output related changes #587
Conversation
rxin commented Apr 30, 2013
- Added a BlockObjectWriter interface for writing out a series of jvm objects to disk.
- Added a getBlockWriter interface to DiskStore.
- Added a new method to BlockManager for writing out shuffle files. This method provides a short-circuited way to write shuffle data out directly to disk.
- Added ShuffleBlockManager and updated ShuffleMapTask to use the manager to write map outputs out to disk (ultimately via the BlockObjectWriter interface).
- Allow specifying a shuffle serializer on a per-shuffle basis.
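The description above mentions a BlockObjectWriter interface for streaming JVM objects to disk with support for reverting partial writes. As a rough sketch of that idea (the actual signatures in the patch may differ; the class and method names here beyond those quoted in the description are assumptions):

```scala
import java.io.{BufferedOutputStream, File, FileOutputStream, ObjectOutputStream}

// Hypothetical sketch of a BlockObjectWriter-style interface: writes a series
// of JVM objects to a file and can either commit the writes or revert back to
// the state the file was in when the writer was opened.
trait BlockObjectWriter {
  def write(value: Any): Unit
  def commit(): Long             // returns bytes written since open (assumed semantics)
  def revertPartialWrites(): Unit
  def close(): Unit
}

class FileBlockObjectWriter(file: File) extends BlockObjectWriter {
  private val initialLength = file.length()
  private val fileOut = new FileOutputStream(file, true)  // append mode
  private val objOut = new ObjectOutputStream(new BufferedOutputStream(fileOut))

  override def write(value: Any): Unit = objOut.writeObject(value)

  override def commit(): Long = {
    objOut.flush()
    file.length() - initialLength
  }

  // Discard everything written since open by truncating back to the
  // original length (the technique discussed later in this thread).
  override def revertPartialWrites(): Unit = {
    objOut.flush()
    fileOut.getChannel.truncate(initialLength)
  }

  override def close(): Unit = objOut.close()
}
```

This short-circuits the usual block-store path: shuffle data goes straight to a file instead of being buffered per bucket in memory first.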
- doesn't need to build up an array buffer for each shuffle bucket.
- local block reads).
- consolidate shuffle output files.
- size is 8KB in FastBufferedOutputStream, which is too small and would cause a lot of disk seeks.
All tests passed for this pull request. For more details, visit http://amplab.cs.berkeley.edu/jenkins/.
private[spark]
class ShuffleBlockManager(blockManager: BlockManager) {

  val shuffles = new ConcurrentHashMap[Int, Shuffle]
This map doesn't appear to be used in this class?
Ah let me remove it. It was for a separate design that I decided to pull out last minute.
Tests failed for this pull request. For more details, visit http://amplab.cs.berkeley.edu/jenkins/.
I think the above Jenkins test failure was due to a hang (fixed in #586, already merged), not caused by this pull request.
 * class name. If a previous instance of the serializer object has been created, the get
 * method returns that instead of creating a new one.
 */
object Serializer {
I'd prefer to avoid singleton objects so that we can eventually run multiple instances of Spark in the same JVM (e.g. for multithreaded test execution). Can you either make these methods be part of SparkEnv, or possibly add a "serializer manager" in SparkEnv?
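A "serializer manager" of the kind suggested here could look roughly like the following. This is a sketch under assumed names, not the code that was merged: the point is caching one serializer instance per class name inside an object owned by SparkEnv, rather than in a JVM-wide singleton.

```scala
import java.util.concurrent.ConcurrentHashMap

// Hypothetical SerializerManager: one instance per SparkEnv, so multiple
// Spark instances can coexist in the same JVM. Returns a cached instance
// for a class name if one was already created, else instantiates it.
class SerializerManager {
  private val serializers = new ConcurrentHashMap[String, AnyRef]

  def get(clsName: String, loader: ClassLoader): AnyRef = {
    val cached = serializers.get(clsName)
    if (cached != null) cached
    else {
      val created = Class.forName(clsName, true, loader)
        .getDeclaredConstructor().newInstance().asInstanceOf[AnyRef]
      // putIfAbsent makes concurrent callers converge on one instance.
      val prev = serializers.putIfAbsent(clsName, created)
      if (prev != null) prev else created
    }
  }
}
```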
Done.
Hey Reynold, this looks good except for the singleton object and the semantics of revertPartialWrites on disk. For the latter, do you want to keep this interface to support merging shuffle files in the future? If so, we should at least document that it can cause previously committed writes to be deleted, but it might also be better to just make it resize the file to right before it started writing.
By the way, this is how you can truncate the file: http://docs.oracle.com/javase/1.4.2/docs/api/java/nio/channels/FileChannel.html#truncate%28long%29. You can probably use FileOutputStream.getChannel.truncate(), for example.
Done. PTAL. I actually added the append feature fully back to DiskStore (although the higher-level shuffle block manager doesn't yet do consolidation).
Tests failed for this pull request. For more details, visit http://amplab.cs.berkeley.edu/jenkins/.
Again, I don't think the above test failures were caused by this PR. It is a little annoying that Jenkins just keeps saying PRs are failing tests.
Yeah, we should probably disable that for now.
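The truncation trick from the comment above can be sketched like this: record the file length before writing, then truncate back to it to discard a partial write. The file name is illustrative.

```scala
import java.io.{File, FileOutputStream}

// Remember where the committed data ends before writing anything new.
val file = File.createTempFile("shuffle", ".dat")
val committedLength = file.length()

val out = new FileOutputStream(file, true)  // append mode exposes a channel at EOF
out.write("partial data that we decide to revert".getBytes("UTF-8"))
out.flush()

// Revert: resize the file to right before we started writing.
out.getChannel.truncate(committedLength)
out.close()

assert(file.length() == committedLength)
```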
Thanks Reynold, I've merged this.