[GraphX] Improve LiveJournalPageRank example #4917

Closed
wants to merge 5 commits into from
Changes from 2 commits
@@ -17,11 +17,6 @@

package org.apache.spark.examples.graphx

import org.apache.spark.SparkContext._
import org.apache.spark._
import org.apache.spark.graphx._


/**
Member

Are you sure you can remove this one? It imports implicits, right?
This change also loses the description of the parameter.
This is borderline non-trivial, but if it merely fixes the usage note, it seems OK.
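
For context on the implicits question: in Scala, a wildcard import from an object can bring implicit extensions into scope, so deleting the import can break code that never mentions the object by name. The sketch below only illustrates that mechanism; `Conversions`, `RichPairs`, and `keysOnly` are hypothetical names and not Spark APIs.

```scala
// Hypothetical stand-in for an object that ships implicits.
object Conversions {
  // Adds `keysOnly` to any sequence of pairs, but only where this
  // implicit class has been imported.
  implicit class RichPairs[K, V](val pairs: Seq[(K, V)]) {
    def keysOnly: Seq[K] = pairs.map(_._1)
  }
}

object ImplicitImportSketch {
  // Without this wildcard import, the call to `keysOnly` below
  // does not compile.
  import Conversions._

  def firstKeys(): Seq[String] = Seq("a" -> 1, "b" -> 2).keysOnly

  def main(args: Array[String]): Unit = println(firstKeys())
}
```

Whether a given import is still needed therefore depends on whether anything in the file relies on implicits it provides, which a clean compile after removal would confirm.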

Contributor Author

Yes, I think it is OK.
The description of the parameter is printed by Analytics.main. The output looks like this:

JackydeMacBook-Pro:spark jackylee$ bin/run-example graphx.LiveJournalPageRank ../data/Wiki-Vote.txt
Set the number of edge partitions using --numEPart.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
JackydeMacBook-Pro:spark jackylee$ 
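
The run above prints the message and exits because `--numEPart` was not supplied. As a rough sketch of how a required `--key=value` option could be enforced, assuming a simple parser (the object and method names here are hypothetical, and this is not presented as the actual `Analytics` code):

```scala
object OptionParsingSketch {
  // Parse "--key=value" arguments into a map; reject anything else.
  def parseOptions(args: Seq[String]): Map[String, String] =
    args.map { arg =>
      arg.dropWhile(_ == '-').split('=') match {
        case Array(key, value) => key -> value
        case _ => throw new IllegalArgumentException("Invalid argument: " + arg)
      }
    }.toMap

  // Return the partition count, or the message to print when it is missing.
  def numEPart(options: Map[String, String]): Either[String, Int] =
    options.get("numEPart").map(_.toInt).toRight(
      "Set the number of edge partitions using --numEPart.")

  def main(args: Array[String]): Unit =
    // The first argument is assumed to be the edge list file.
    numEPart(parseOptions(args.drop(1).toSeq)) match {
      case Right(n) => println(s"numEPart = $n")
      case Left(message) =>
        println(message)
        sys.exit(1)
    }
}
```

With this shape, an option that has no default is effectively mandatory, which is why the usage text needs to say so.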

Member

Yes, but why remove the description that already existed in this file's help message?

Contributor Author

Yes, you are right. I have restored it and only removed the description of the default value.

* Uses GraphX to run PageRank on a LiveJournal social network graph. Download the dataset from
* http://snap.stanford.edu/data/soc-LiveJournal1.html.
@@ -30,14 +25,14 @@ object LiveJournalPageRank {
def main(args: Array[String]) {
if (args.length < 1) {
System.err.println(
- "Usage: LiveJournalPageRank <edge_list_file>\n" +
+ "Usage: LiveJournalPageRank <edge_list_file> --numEPart=<num_edge_partitions>\n" +
Member

--numEPart is still duplicated. It's added to this line but already shown directly below.

Contributor Author

How about now?

" [--tol=<tolerance>]\n" +
" The tolerance allowed at convergence (smaller => more accurate). Default is " +
"0.001.\n" +
" [--output=<output_file>]\n" +
" If specified, the file to write the ranks to.\n" +
" [--numEPart=<num_edge_partitions>]\n" +
- " The number of partitions for the graph's edge RDD. Default is 4.\n" +
+ " The number of partitions for the graph's edge RDD.\n" +
Member

Eh, but now you have put the option in twice! Why not just move it up and remove the square brackets to mark it as non-optional? Keep everything else the same.

" [--partStrategy=RandomVertexCut | EdgePartition1D | EdgePartition2D | " +
"CanonicalRandomVertexCut]\n" +
" The way edges are assigned to edge partitions. Default is RandomVertexCut.")
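
One possible reading of the reviewer's last suggestion (list `--numEPart` once, first, and without square brackets) is sketched below. The wrapper object `UsageSketch` and the indentation inside the strings are assumptions made for illustration; the option texts are taken from the diff above.

```scala
object UsageSketch {
  // The required option appears once, first, and without brackets;
  // the optional ones keep their brackets and defaults.
  val usage: String =
    "Usage: LiveJournalPageRank <edge_list_file>\n" +
      "    --numEPart=<num_edge_partitions>\n" +
      "        The number of partitions for the graph's edge RDD.\n" +
      "    [--tol=<tolerance>]\n" +
      "        The tolerance allowed at convergence (smaller => more accurate). Default is " +
      "0.001.\n" +
      "    [--output=<output_file>]\n" +
      "        If specified, the file to write the ranks to.\n" +
      "    [--partStrategy=RandomVertexCut | EdgePartition1D | EdgePartition2D | " +
      "CanonicalRandomVertexCut]\n" +
      "        The way edges are assigned to edge partitions. Default is RandomVertexCut."

  def main(args: Array[String]): Unit = {
    // Print the usage text and exit when no edge list file is given.
    if (args.length < 1) {
      System.err.println(usage)
      System.exit(1)
    }
  }
}
```

Keeping the first usage line unchanged and moving only the option block avoids mentioning `--numEPart` in two places.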