Move ML lib data generator files to util/ #711

Merged: 3 commits merged into mesos:master on Jul 19, 2013

Conversation

shivaram
Contributor

The generator classes are out of place in the regression directory, so I've moved them to util to avoid creating another directory (say, data or something like that).

@mateiz, @etrain, @atalwalkar: any other ideas?

@AmplabJenkins

Thank you for your pull request. All automated tests for this request have passed.
Refer to this link for build results: http://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/222/

@etrain
Contributor

etrain commented Jul 17, 2013

I think this is a fine place to put this, but can we call it "LogisticRegressionDataGenerator" or something like that?

Also, can we factor the part that produces the RDD into its own function within the LogisticRegression[Data]Generator object? I think having data generation functions available externally that don't rely on writing things to disk will be extremely useful for e.g. unit tests.

Separately, I think it should be pretty easy to set up such a function with an arbitrary input distribution: maybe it could take something like this:
`def generateSamples(nsamples: Long, nfeatures: Int, probOne: Double = 0.5, distFun: Random => Double = _.nextGaussian): RDD[Array[Double]]`

The last part is probably overkill, though.
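
For illustration only, here is a minimal local sketch of what the suggested signature might look like in practice. It is not the code that was merged in this pull request. The object name `LogisticRegressionDataGeneratorSketch`, the `seed` parameter, the `Int` sample count, the `Seq` return type (used here instead of an RDD so the example runs without Spark), and the row layout (label first, then features) are all assumptions made for this sketch.

```scala
import scala.util.Random

// Hypothetical sketch; names and row layout are assumptions, not the merged code.
object LogisticRegressionDataGeneratorSketch {

  /**
   * Generates `nsamples` rows in memory. Each row holds a 0.0/1.0 label
   * followed by `nfeatures` values drawn from `distFun`.
   * Nothing is written to disk, which keeps it usable from unit tests.
   */
  def generateSamples(
      nsamples: Int,
      nfeatures: Int,
      probOne: Double = 0.5,
      distFun: Random => Double = _.nextGaussian(),
      seed: Long = 42L): Seq[Array[Double]] = {
    val rnd = new Random(seed)
    Seq.fill(nsamples) {
      // Label is 1.0 with probability probOne, otherwise 0.0.
      val label = if (rnd.nextDouble() < probOne) 1.0 else 0.0
      // Features come from the caller-supplied distribution.
      val features = Array.fill(nfeatures)(distFun(rnd))
      label +: features
    }
  }
}
```

A caller could swap in a uniform distribution with `LogisticRegressionDataGeneratorSketch.generateSamples(100, 3, distFun = _.nextDouble())`. In a Spark setting, the resulting `Seq` could presumably be handed to `sc.parallelize` to obtain an RDD, though a generator meant for large data would more likely build the rows inside the RDD itself.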

@shivaram
Contributor Author

Thanks for the comments. I renamed the files to DataGenerator and also refactored the classes so that they expose methods which can be reused.

@AmplabJenkins

Thank you for submitting this pull request. Unfortunately, the automated tests for this request have failed.
Refer to this link for build results: http://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/228/

@shivaram
Contributor Author

FYI - the Jenkins build failed because it couldn't reach github.com.

@etrain
Contributor

etrain commented Jul 19, 2013

The refactored code looks good to me, thanks Shivaram!

mateiz added a commit that referenced this pull request Jul 19, 2013
Move ML lib data generator files to util/
@mateiz mateiz merged commit c40f0f2 into mesos:master Jul 19, 2013
@mateiz
Member

mateiz commented Jul 19, 2013

Thanks guys, merged this in.

xiajunluan pushed a commit to xiajunluan/spark that referenced this pull request May 30, 2014
Add `limit` transformation to `SchemaRDD`.

Author: Takuya UESHIN <ueshin@happy-camper.st>

Closes mesos#711 from ueshin/issues/SPARK-1778 and squashes the following commits:

33169df [Takuya UESHIN] Add 'limit' transformation to SchemaRDD.