SPARK-1727. Correct small compile errors, typos, and markdown issues in (primarily) MLlib docs #653
Conversation
Merged build triggered.
Merged build started.
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

val data = sc.textFile("mllib/data/sample_naive_bayes_data.txt")
PS: here I fleshed out the example, since the project contains a data file for Naive Bayes.
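For reference, a sketch of what the fleshed-out snippet looks like. This assumes a `SparkContext` named `sc`, the sample data file's "label, f1 f2 f3" line format, and the MLlib 1.0-era `NaiveBayes.train` API; it needs a Spark runtime to execute, so treat it as illustrative rather than definitive.

```scala
import org.apache.spark.mllib.classification.NaiveBayes
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

// Each line of the sample file is "label, f1 f2 f3"
val data = sc.textFile("mllib/data/sample_naive_bayes_data.txt")
val parsedData = data.map { line =>
  val parts = line.split(',')
  LabeledPoint(parts(0).toDouble,
    Vectors.dense(parts(1).trim.split(' ').map(_.toDouble)))
}

// lambda is the additive smoothing parameter
val model = NaiveBayes.train(parsedData, lambda = 1.0)
```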
Merged build finished. All automated tests passed.
val conf = new SparkConf().
  setMaster("local").
  setAppName("My application").
  set("spark.executor.memory", "1g")
Shall we ask users to use `:paste` instead of putting `.` at the end of the line?
Yeah the problem is that it didn't work as-is since the first line can be interpreted as a complete statement. I figured that it's best if the snippets work as given, without additional commands or config. I reviewed most of the other Scala snippets in the docs here, and there were only a few cases like this.
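To illustrate the REPL issue being discussed, a sketch (this uses spark-core's `SparkConf`, so it needs the Spark jars on the classpath; the leading-dot behavior described is that of the Scala 2.10-era REPL):

```scala
// Leading-dot style: the plain REPL stops after the first line, because
// `val conf = new SparkConf()` already parses as a complete statement...
val conf = new SparkConf()
  .setMaster("local")  // ...so this line is evaluated separately and fails

// Trailing-dot style: each line ends mid-expression, so the REPL
// keeps reading until the method chain is complete.
val conf = new SparkConf().
  setMaster("local").
  setAppName("My application").
  set("spark.executor.memory", "1g")
```

(An alternative, as raised above, is `:paste` mode, which compiles the whole pasted block at once and so accepts the leading-dot style too.)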
This is not for running in the REPL, where we don't really need to create a `SparkConf`. With the latest `spark-submit`, this line should change to `new SparkConf().setAppName("My application")`.
Ah right, this snippet won't be pasted into a REPL, true. (I think the other case I saw is for the REPL, so it should have this kind of change.)
But you're saying it can be simplified anyway to that one line? I can change that, but I wonder if the idea is just to show use of the setters; if so, I could revert the change... or just leave it for consistency with the other REPL-friendly snippet?
In v1.0, the recommended way of launching an app is through `spark-submit`, where you set Spark configurations through command-line arguments. It is easier to switch masters than to hard-code `setMaster("local")`. Also, it works for YARN.
Agree, although on re-reading I think the purpose of this snippet is to explain how one would invoke Spark programmatically via `SparkConf` (or else the whole thing should go away). It is something you might want to do in a Scala program, and might even want to pop into a Scala REPL (i.e. not `spark-shell`). I suggest leaving it; am I really off-base on that?
If we remove `setMaster` and `set("spark.executor.memory", ...)`, then it fits in a single line. Those properties should be set with `spark-submit`.
Hey @mengxr, we want to allow these properties to be set directly as well; some applications will use this.
I think it might be good here to show both: first say that they can be set directly, and then explain that they can be set through arguments to `spark-submit` if you create a `SparkContext` with an empty conf.

./bin/spark-submit --name "My application" --master local --executor-memory 1g
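A sketch of showing the two approaches side by side, as suggested above (property names and flags as in Spark 1.x; this needs a Spark runtime to execute):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// 1) Set properties directly in code:
val conf = new SparkConf().
  setMaster("local").
  setAppName("My application").
  set("spark.executor.memory", "1g")
val sc = new SparkContext(conf)

// 2) Leave the conf empty and supply the same settings on the
//    command line, e.g.:
//    ./bin/spark-submit --name "My application" --master local --executor-memory 1g
val sc2 = new SparkContext(new SparkConf())
```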
@pwendell Could you help merge this PR? Then I can fix minor issues mentioned in the discussion.
Okay, I can merge this and then we can take the other discussion in a new PR/JIRA.
…in (primarily) MLlib docs

While play-testing the Scala and Java code examples in the MLlib docs, I noticed a number of small compile errors and some typos. This led to finding and fixing a few similar items in other docs. Then, in the course of building the site docs to check the result, I found a few small suggestions for the build instructions. I also found a few more formatting and markdown issues uncovered when I accidentally used maruku instead of kramdown.

Author: Sean Owen <sowen@cloudera.com>

Closes #653 from srowen/SPARK-1727 and squashes the following commits:

6e7c38a [Sean Owen] Final doc updates - one more compile error, and use of mean instead of sum and count
8f5e847 [Sean Owen] Fix markdown syntax issues that maruku flags, even though we use kramdown (but only those that do not affect kramdown's output)
99966a9 [Sean Owen] Update issue tracker URL in docs
23c9ac3 [Sean Owen] Add Scala Naive Bayes example, to use existing example data file (whose format needed a tweak)
8c81982 [Sean Owen] Fix small compile errors and typos across MLlib docs

(cherry picked from commit 25ad8f9)
Signed-off-by: Patrick Wendell <pwendell@gmail.com>
…cationModel, and KMeans

`model.predict` returns an RDD of a Scala primitive type (Int/Double), which is recognized as Object in Java. Adding predict(JavaRDD) could make life easier for Java users. Added tests for KMeans, LinearRegression, and NaiveBayes. Will update examples after #653 gets merged. cc: @srowen

Author: Xiangrui Meng <meng@databricks.com>

Closes #670 from mengxr/predict-javardd and squashes the following commits:

b77ccd8 [Xiangrui Meng] Merge branch 'master' into predict-javardd
43caac9 [Xiangrui Meng] add predict(JavaRDD) to RegressionModel, ClassificationModel, and KMeans