[SPARK-2841][MLlib] Documentation for feature transformations #2068

Closed

dbtsai (Member) commented Aug 20, 2014

Documentation for newly added feature transformations:

  1. TF-IDF
  2. StandardScaler
  3. Normalizer

SparkQA commented Aug 20, 2014

QA tests have started for PR 2068 at commit e339f64.

  • This patch merges cleanly.

SparkQA commented Aug 20, 2014

QA tests have finished for PR 2068 at commit e339f64.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • shift # Ignore main class (org.apache.spark.deploy.SparkSubmit) and use our own

mengxr (Contributor) commented Aug 20, 2014

copy @atalwalkar

statistics on the samples in the training set. For example, RBF kernel of Support Vector Machines
or the L1 and L2 regularized linear models typically assume that all features have unit variance
and/or zero mean.

Contributor

This is too strong a statement. Why not just say "Normalizing features to have unit variance and/or zero mean is a very common preprocessing step"?

Member Author

How about I say:
"For example, RBF kernel of Support Vector Machines
or the L1 and L2 regularized linear models typically work better when all features have unit variance
and/or zero mean."

I actually took this statement from the scikit-learn documentation:
http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html

Contributor

Your suggestion sounds good to me! Thanks.
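
For readers following this thread, here is a minimal sketch of what scaling features to "unit variance and/or zero mean" means in practice. It is plain Python for illustration only and is not the MLlib StandardScaler API; the `standardize` helper and its `with_mean` / `with_std` flags are hypothetical names, and the use of the sample standard deviation (n - 1 denominator) is an assumption of the sketch.

```python
import math


def standardize(rows, with_mean=True, with_std=True):
    """Scale each column of `rows` to zero mean and/or unit variance.

    Hypothetical helper for illustration; not the MLlib API.
    Assumes at least two rows and uses the sample standard deviation.
    """
    n = len(rows)
    dims = len(rows[0])
    # Column-wise statistics computed over the whole data set
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    stds = [
        math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
        for j in range(dims)
    ]
    out = []
    for r in rows:
        scaled = []
        for j in range(dims):
            v = r[j] - means[j] if with_mean else r[j]
            if with_std:
                # Constant columns (std == 0) map to 0.0 to avoid dividing by zero
                v = v / stds[j] if stds[j] != 0.0 else 0.0
            scaled.append(v)
        out.append(scaled)
    return out


# Two features on very different scales end up on the same scale.
data = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
scaled = standardize(data)
```

With both flags enabled, the two columns above become identical even though the raw values differ by a factor of ten, which is why scale-sensitive models such as RBF-kernel SVMs or L1/L2 regularized linear models tend to benefit from this step.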

SparkQA commented Aug 21, 2014

QA tests have started for PR 2068 at commit 0a8fd34.

  • This patch does not merge cleanly!

SparkQA commented Aug 21, 2014

QA tests have finished for PR 2068 at commit 0a8fd34.

  • This patch fails unit tests.
  • This patch does not merge cleanly!

dbtsai (Member Author) commented Aug 23, 2014

@atalwalkar and @mengxr I just addressed the merge conflict. I think it's ready to merge. Thanks.

SparkQA commented Aug 23, 2014

QA tests have started for PR 2068 at commit 109f324.

  • This patch merges cleanly.

SparkQA commented Aug 23, 2014

Tests timed out after a configured wait of 120m.

asfgit pushed a commit that referenced this pull request Aug 25, 2014
Documentation for newly added feature transformations:
1. TF-IDF
2. StandardScaler
3. Normalizer

Author: DB Tsai <dbtsai@alpinenow.com>

Closes #2068 from dbtsai/transformer-documentation and squashes the following commits:

109f324 [DB Tsai] address feedback

(cherry picked from commit 572952a)
Signed-off-by: Xiangrui Meng <meng@databricks.com>

mengxr (Contributor) commented Aug 25, 2014

LGTM. Merged into master and branch-1.1! Thanks for helping on the documentation!!

@asfgit asfgit closed this in 572952a Aug 25, 2014
xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
@dbtsai dbtsai deleted the transformer-documentation branch October 28, 2014 19:18