-
Notifications
You must be signed in to change notification settings - Fork 385
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tweaks to the SparkLR example #872
base: master
Are you sure you want to change the base?
Conversation
…tions. Also add an example of Kryo usage, and an optional "points" parameter.
Thank you for your pull request. An admin will review this request soon. |
Jenkins, this is ok to test |
Thank you for submitting this pull request. All automated tests for this request have passed. Refer to this link for build results: http://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/764/ |
kryo.setRegistrationRequired(true) | ||
|
||
kryo.register(classOf[scala.collection.mutable.WrappedArray.ofRef[_]]) | ||
kryo.register(classOf[java.lang.Class[_]]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey Mike, why do you need to register WrappedArray and Class? Doesn't seem like they'll occur in our data here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I came up with that list by adding setRegistrationRequired
, then fixing each exception that was thrown. So Kryo said they were used; I haven't considered why.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Did Kryo actually make a performance difference for you at all? We are not caching data in serialized form here, so it would only be used to send back task results, but that's just one Vector per partition. I think the WrappedArrays are because we somehow send that back within an array.
Replace fold() with aggregate() in SparkLR, to avoid Vector instatiations.
Thank you for submitting this pull request. All automated tests for this request have passed. Refer to this link for build results: http://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/809/ |
(1 / (1 + exp(-p.y * (w dot p.x))) - 1) * p.y * p.x | ||
}.reduce(_ + _) | ||
((1 / (1 + exp(-p.y * (w dot p.x))) - 1) * p.y, p.x) | ||
}.aggregate(zero)((sum,v) => sum saxpy (v._1,v._2), _ += _) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change the second argument to sum.saxpy(v._1, v._2)
; we don't use infix notation for methods unless they're operators.
Sorry for the long delay, just made a couple more comments. I'm happy to merge this otherwise. You can either update it here or send a new PR against the apache/incubator-spark repo (the latter would be slightly easier). |
This sets the max line length to 100 as a PEP8 exception. Author: Reynold Xin <rxin@apache.org> Closes mesos#872 from rxin/pep8 and squashes the following commits: 2f26029 [Reynold Xin] Added PEP8 style configuration file.
I've been using
spark.examples.SparkLR
for performance testing. Just passing along some enhancements to the class.