default task number misleading in several places #766
Conversation
```scala
private[streaming] def defaultPartitioner(numPartitions: Int = self.ssc.sc.defaultParallelism) = {
  new HashPartitioner(numPartitions)
}
```

This shows that the default number of tasks in Spark Streaming relies on the variable `defaultParallelism` in `SparkContext`, which in turn is decided by the config property `spark.default.parallelism`. For background on that property, see apache#389.
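For context, here is a minimal sketch (not part of this patch; the app name, master, and the value `4` are made up) of how `spark.default.parallelism` flows into `SparkContext.defaultParallelism`, the fallback that `defaultPartitioner()` picks up when no task count is passed:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical configuration: spark.default.parallelism decides
// SparkContext.defaultParallelism, which defaultPartitioner() uses
// as its default when no explicit task count is given.
val conf = new SparkConf()
  .setMaster("local[2]")
  .setAppName("DefaultParallelismSketch")
  .set("spark.default.parallelism", "4")
val sc = new SparkContext(conf)

println(sc.defaultParallelism) // prints 4, taken from the property
```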
Merged build triggered.
Merged build started.
Merged build finished. All automated tests passed.
docs/streaming-programming-guide.md:

```diff
- this uses Spark's default number of parallel tasks (2 for local machine, 8 for a cluster) to
- do the grouping. You can pass an optional <code>numTasks</code> argument to set a different
- number of tasks.</td>
+ this uses Spark's default number of parallel tasks (local mode is 2, while cluster mode is
```
how about "2 for local mode, and in cluster mode the number is determined by ..."
it's good i think : )
Re-organized the wording.
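To make the optional `numTasks` argument from the doc text above concrete, here is a hedged sketch (the socket source, host, and port are placeholders; the extra `StreamingContext._` import supplies the pair-DStream implicits on Spark 1.x):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._ // pair-DStream implicits on Spark 1.x

val conf = new SparkConf().setMaster("local[2]").setAppName("NumTasksSketch")
val ssc = new StreamingContext(conf, Seconds(1))

val lines = ssc.socketTextStream("localhost", 9999) // placeholder input stream
val pairs = lines.flatMap(_.split(" ")).map(word => (word, 1))

// No argument: the grouping runs with Spark's default number of parallel tasks.
val countsDefault = pairs.reduceByKey(_ + _)

// Optional numTasks argument: use 16 tasks for this shuffle instead.
val countsExplicit = pairs.reduceByKey(_ + _, 16)

countsExplicit.print()
ssc.start()
ssc.awaitTermination()
```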
Merged build triggered.
Merged build started.
Thanks. I've merged this.
private[streaming] def defaultPartitioner(numPartitions: Int = self.ssc.sc.defaultParallelism){
  new HashPartitioner(numPartitions)
}

it represents that the default task number in Spark Streaming relies on the variable defaultParallelism in SparkContext, which is decided by the config property spark.default.parallelism

the property "spark.default.parallelism" refers to #389

Author: Chen Chao <crazyjvm@gmail.com>

Closes #766 from CrazyJvm/patch-7 and squashes the following commits:

0b7efba [Chen Chao] Update streaming-programming-guide.md
cc5b66c [Chen Chao] default task number misleading in several places

(cherry picked from commit 2f63995)
Signed-off-by: Reynold Xin <rxin@apache.org>
Merged build finished. All automated tests passed.