SPARK-1727. Correct small compile errors, typos, and markdown issues in (primarily) MLlib docs #653
Conversation
Merged build triggered.
Merged build started.
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

val data = sc.textFile("mllib/data/sample_naive_bayes_data.txt")
PS: here I fleshed out the example, since the project contains a data file for Naive Bayes.
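For reference, a sketch of what the fleshed-out snippet looks like. This assumes a `SparkContext` named `sc`, the sample data file's "label, f1 f2 f3" line format, and the MLlib 1.0-era `NaiveBayes.train` API; it needs a Spark runtime to execute, so treat it as illustrative rather than definitive.

```scala
import org.apache.spark.mllib.classification.NaiveBayes
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

// Each line of the sample file is "label, f1 f2 f3"
val data = sc.textFile("mllib/data/sample_naive_bayes_data.txt")
val parsedData = data.map { line =>
  val parts = line.split(',')
  LabeledPoint(parts(0).toDouble,
    Vectors.dense(parts(1).trim.split(' ').map(_.toDouble)))
}

// lambda is the additive smoothing parameter
val model = NaiveBayes.train(parsedData, lambda = 1.0)
```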
Merged build finished. All automated tests passed.
val conf = new SparkConf().
  setMaster("local").
  setAppName("My application").
  set("spark.executor.memory", "1g")
Shall we ask users to use `:paste` instead of putting `.` at the end of the line?
Yeah the problem is that it didn't work as-is since the first line can be interpreted as a complete statement. I figured that it's best if the snippets work as given, without additional commands or config. I reviewed most of the other Scala snippets in the docs here, and there were only a few cases like this.
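To illustrate the REPL issue being discussed, a sketch (this uses spark-core's `SparkConf`, so it needs the Spark jars on the classpath; the leading-dot behavior described is that of the Scala 2.10-era REPL):

```scala
// Leading-dot style: the plain REPL stops after the first line, because
// `val conf = new SparkConf()` already parses as a complete statement...
val conf = new SparkConf()
  .setMaster("local")  // ...so this line is evaluated separately and fails

// Trailing-dot style: each line ends mid-expression, so the REPL
// keeps reading until the method chain is complete.
val conf = new SparkConf().
  setMaster("local").
  setAppName("My application").
  set("spark.executor.memory", "1g")
```

(An alternative, as raised above, is `:paste` mode, which compiles the whole pasted block at once and so accepts the leading-dot style too.)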
This is not for running in the REPL, where we don't really need to create a `SparkConf`. With the latest `spark-submit`, this line should change to `new SparkConf().setAppName("My application")`.
Ah right, this snippet won't be pasted into a REPL, true. (I think the other case I saw is for the REPL, so it should have this kind of change.)
But you're saying it can be simplified anyway to that one line? I can change that, but I wonder if the idea is just to show use of the setters; if so, I could revert the change... or just leave it for consistency with the other REPL-friendly snippet?
In v1.0, the recommended way of launching an app is through `spark-submit`, where you set Spark configurations through command-line arguments. It is easier to switch masters than to hard-code `setMaster("local")`. Also, it works for YARN.
Agree, although on re-reading I think the purpose of this snippet is to explain how one would invoke Spark programmatically via `SparkConf` (or else the whole thing should go away). It is something you might want to do in a Scala program, and might even want to pop into a Scala REPL (i.e. not `spark-shell`). I suggest leaving it; am I really off-base on that?
If we remove `setMaster` and `set("spark.executor.memory", ...)`, then it fits in a single line. Those properties should be set with `spark-submit`.
Hey @mengxr, we want to allow these properties to be set directly as well; some applications will use this.
I think it might be good here to show both: first say that they can be set directly, and then explain that they can be set through arguments to `spark-submit` if you create a `SparkContext` with an empty conf.

./bin/spark-submit --name "My application" --master local --executor-memory 1g
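A sketch of showing the two approaches side by side, as suggested above (property names and flags as in Spark 1.x; this needs a Spark runtime to execute):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// 1) Set properties directly in code:
val conf = new SparkConf().
  setMaster("local").
  setAppName("My application").
  set("spark.executor.memory", "1g")
val sc = new SparkContext(conf)

// 2) Leave the conf empty and supply the same settings on the
//    command line, e.g.:
//    ./bin/spark-submit --name "My application" --master local --executor-memory 1g
val sc2 = new SparkContext(new SparkConf())
```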
@pwendell Could you help merge this PR? Then I can fix minor issues mentioned in the discussion.
Okay, I can merge this and then we can take the other discussion in a new PR/JIRA.
…in (primarily) MLlib docs

While play-testing the Scala and Java code examples in the MLlib docs, I noticed a number of small compile errors and some typos. This led to finding and fixing a few similar items in other docs. Then, in the course of building the site docs to check the result, I found a few small suggestions for the build instructions. I also found a few more formatting and markdown issues uncovered when I accidentally used maruku instead of kramdown.

Author: Sean Owen <sowen@cloudera.com>

Closes #653 from srowen/SPARK-1727 and squashes the following commits:

6e7c38a [Sean Owen] Final doc updates - one more compile error, and use of mean instead of sum and count
8f5e847 [Sean Owen] Fix markdown syntax issues that maruku flags, even though we use kramdown (but only those that do not affect kramdown's output)
99966a9 [Sean Owen] Update issue tracker URL in docs
23c9ac3 [Sean Owen] Add Scala Naive Bayes example, to use existing example data file (whose format needed a tweak)
8c81982 [Sean Owen] Fix small compile errors and typos across MLlib docs

(cherry picked from commit 25ad8f9)
Signed-off-by: Patrick Wendell <pwendell@gmail.com>
…cationModel, and KMeans

`model.predict` returns an RDD of a Scala primitive type (Int/Double), which is recognized as Object in Java. Adding predict(JavaRDD) could make life easier for Java users. Added tests for KMeans, LinearRegression, and NaiveBayes. Will update examples after #653 gets merged. cc: @srowen

Author: Xiangrui Meng <meng@databricks.com>

Closes #670 from mengxr/predict-javardd and squashes the following commits:

b77ccd8 [Xiangrui Meng] Merge branch 'master' into predict-javardd
43caac9 [Xiangrui Meng] add predict(JavaRDD) to RegressionModel, ClassificationModel, and KMeans