[SPARK-2841][MLlib] Documentation for feature transformations #2068
QA tests have started for PR 2068 at commit
QA tests have finished for PR 2068 at commit
copy @atalwalkar
statistics on the samples in the training set. For example, RBF kernel of Support Vector Machines
or the L1 and L2 regularized linear models typically assume that all features have unit variance
and/or zero mean.
This is too strong of a statement. Why not just say "Normalizing features to have unit variance and/or zero mean is a very common preprocessing step."
How about I say
"For example, the RBF kernel of Support Vector Machines
or the L1 and L2 regularized linear models typically work better when all features have unit variance
and/or zero mean."
I actually took this statement from the scikit-learn documentation:
http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html
Your suggestion sounds good to me! Thanks.
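The preprocessing step discussed in this thread, rescaling every feature to zero mean and/or unit variance using statistics computed on the training samples, can be sketched in plain Python. This is a minimal, hypothetical illustration, not MLlib's `StandardScaler` API: the `standardize` helper and its `with_mean`/`with_std` flags are assumptions chosen for the example, and the sketch uses the population standard deviation (dividing by n), which not every library does.

```python
from statistics import mean, pstdev
from typing import List


def standardize(rows: List[List[float]], with_mean: bool = True,
                with_std: bool = True) -> List[List[float]]:
    """Hypothetical helper: rescale each column (feature) of `rows`.

    Column statistics are computed over all rows (the training samples).
    A column with zero spread is left unscaled to avoid division by zero.
    """
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) for col in columns]  # population std (divides by n)
    result = []
    for row in rows:
        new_row = []
        for x, mu, sd in zip(row, means, stds):
            if with_mean:
                x = x - mu  # shift the feature to zero mean
            if with_std and sd > 0:
                x = x / sd  # scale the feature to unit variance
            new_row.append(x)
        result.append(new_row)
    return result


if __name__ == "__main__":
    data = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
    print(standardize(data))
```

After this transformation every non-constant column has mean 0 and standard deviation 1, so features measured on very different scales (here 1 to 3 versus 10 to 30) contribute comparably to a distance-based kernel such as RBF or to an L1/L2 penalty. Setting `with_mean=False` only rescales, which keeps sparse inputs sparse.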
QA tests have started for PR 2068 at commit
QA tests have finished for PR 2068 at commit
@atalwalkar and @mengxr, I just resolved the merge conflict. I think it's ready to merge. Thanks.
QA tests have started for PR 2068 at commit
Tests timed out after a configured wait of
Documentation for newly added feature transformations:
1. TF-IDF
2. StandardScaler
3. Normalizer

Author: DB Tsai <dbtsai@alpinenow.com>

Closes #2068 from dbtsai/transformer-documentation and squashes the following commits:

109f324 [DB Tsai] address feedback

(cherry picked from commit 572952a)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
LGTM. Merged into master and branch-1.1! Thanks for helping with the documentation!!