[SPARK-5253] [ML] LinearRegression with L1/L2 (ElasticNet) using OWLQN #4259
Conversation
Test build #26284 has finished for PR 4259 at commit
Test build #26283 has finished for PR 4259 at commit
Test build #26291 has finished for PR 4259 at commit
Test build #26292 has finished for PR 4259 at commit
Test build #26296 has started for PR 4259 at commit
I will kick this off again once I restart Jenkins.
Jenkins, test this please.
Test build #26298 has finished for PR 4259 at commit
@@ -23,6 +23,27 @@ private[ml] trait HasRegParam extends Params {
  def getRegParam: Double = get(regParam)
}

private[ml] trait HasElasticNetParam extends HasRegParam {
  /** param for elastic net regularization parameter */
  val alphaParam: DoubleParam = new DoubleParam(
I would move it to `LinearRegression` and rename it to `alpha`. This is not a shared param.
LOR will have it eventually as well. Do you mean we should use `with HasAlpha with HasRegParam` instead of `with HasElasticNetParam`?
I'd vote for not using HasRegParam at all and instead using elastic-net-specific terminology such as l1RegParam, l2RegParam, or something else that makes it clear which param applies to which part of the regularization.
@dbtsai Thanks for contributing elastic-net! It may be hard to merge this into 1.3 now. There are two pending 1.3 features that are relevant: the DataFrame API and the Prediction APIs, both of which will conflict with this PR. We may want to wait for them to be merged first and then update this PR, to avoid resolving conflicts repeatedly.
I just rebased and addressed the API changes. I was also thinking that when we standardize the features, we're rescaling the data, which is rescaling the covariant side of the equation. However, we can achieve the same effect by rescaling the gradientSum, which sits on the contravariant side. Thus, we don't need to apply the scaler to the data, which is much cheaper. This works for logistic regression as well. For the intercept in LOR, we can deal with it in the gradient function instead of using applyBias, so combining both techniques means we don't have to create a new dataset.
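To make the "rescale the gradient instead of the data" idea concrete, here is a minimal standalone sketch (plain Scala, no Spark types; all names like `featuresStd` and `gradOnRawData` are illustrative, not the ones in this PR). It shows that, for a least-squares loss with coefficients defined in the standardized feature space, accumulating the gradient on the raw rows and dividing by the per-feature standard deviations at the end gives the same result as first materializing standardized rows:

```scala
import scala.math.{pow, sqrt}

// Toy least-squares example with two features and three rows.
object GradientRescalingSketch extends App {
  val xs: Array[Array[Double]] = Array(Array(1.0, 200.0), Array(2.0, 100.0), Array(3.0, 300.0))
  val ys: Array[Double] = Array(3.0, 2.5, 4.0)
  val w: Array[Double] = Array(0.1, -0.2) // coefficients defined in the standardized space

  val n = xs.length.toDouble
  val mean = Array.tabulate(2)(j => xs.map(_(j)).sum / n)
  val featuresStd = Array.tabulate(2)(j => sqrt(xs.map(x => pow(x(j) - mean(j), 2)).sum / n))

  val gradOnScaledData = Array.fill(2)(0.0) // (a) rescale every row, then accumulate
  val gradOnRawData = Array.fill(2)(0.0)    // (b) accumulate on raw rows, rescale once at the end

  for ((x, y) <- xs.zip(ys)) {
    // The margin only needs a division per feature; no standardized copy of the row is stored.
    val err = (0 until 2).map(j => w(j) * x(j) / featuresStd(j)).sum - y
    for (j <- 0 until 2) {
      gradOnScaledData(j) += err * (x(j) / featuresStd(j)) // needs the standardized value
      gradOnRawData(j) += err * x(j)                       // only needs the raw value
    }
  }
  // Dividing the accumulated gradient by the per-feature std gives the same result as (a).
  val rescaledAtEnd = Array.tabulate(2)(j => gradOnRawData(j) / featuresStd(j))

  println(gradOnScaledData.mkString(", "))
  println(rescaledAtEnd.mkString(", ")) // identical up to floating point
}
```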
Test build #26812 has finished for PR 4259 at commit
@dbtsai I'd like to make a pass over this, but I realized that it has conflicts because of the Developer API PR committed last week: [https://github.com//pull/3637]. Could you please rebase? I don't think there are any more big PRs coming up that will make you rebase again. Thank you!
@dbtsai we are close to merging this PR, which brings OWLQN and PQN under the umbrella of proximal algorithms to support most of the interesting ML-related constraints: scalanlp/breeze#364. I also have another Breeze PR that merges OWLQN and PQN through a proximal quasi-Newton method, but that will take time to stabilize. I would like to add more tests for OWLQN, specifically comparisons with liblinear. Most likely, if I find some interesting test cases in your PR, I will move them to NonlinearMinimizerTest. I will open up an MLlib PR that uses NonlinearMinimizer and tests a family of consensus algorithms (Block Coordinate Descent and ADMM). Our main focus is ElasticNet as well, so it would be great if this PR is merged so that I can build on it.
This is linear regression... what happened to the logistic regression elastic net? We are more interested in that one...
@jkbradley I will rebase soon. @debasish83 I'll add MLOR with elastic-net when we stabilize the new ML API. Doing this in the old codebase would be a huge effort, and I would like to keep the implementations as self-contained as possible instead of sharing the same generalized linear algorithm base class. We already have enough if-else statements dealing with the differences in the base class.
Test build #29042 has finished for PR 4259 at commit
@jkbradley and @mengxr I just rebased it. I will do a couple of optimizations to avoid scaling the datasets, which can be done in the optimization process instead. You can start giving me feedback so we have ample time to address issues before 1.4. Thanks.
Test build #29043 has finished for PR 4259 at commit
Test build #29044 has finished for PR 4259 at commit
@@ -34,6 +34,43 @@ private[ml] trait HasRegParam extends Params {
  def getRegParam: Double = get(regParam)
}

private[ml] trait HasElasticNetParam extends HasRegParam {
Do we want to call this "alphaParam"? That name assumes people have read elastic net papers. What about something like:
- "elasticNetParam" (very explicit) (I assume this was your original name.)
- "regMixing" (also explicit)
- "regAlpha" (at least it will be grouped next to "regParam")
I voted for `regAlphaParam: Double` for the variable name, and keeping `HasElasticNetParam` as the trait name. What do you think?
I think the names should definitely match, so we should pick one. I'm OK with "elasticNetParam" or "regAlpha." (I think "regAlpha" doesn't need "Param" attached to it since "alpha" is the name of the parameter.) Since "elastic net" is more easily recognized than "alpha," I vote for "elasticNetParam."
Sounds great! I'm convinced. I'll do something like this:

private[ml] trait HasElasticNetParam extends HasRegParam {
  /**
   * param for elastic net regularization parameter
   * @group param
   */
  val elasticNetParam: DoubleParam =
    new DoubleParam(this, "elasticNetParam", "the ElasticNet mixing parameter")

  /** @group getParam */
  def getElasticNetParam: Double = get(elasticNetParam)
}
Nice, I like that it extends HasRegParam. When you update it, can you please make the doc specify more about the parameter (range, plus which end of the range corresponds to L1 vs. L2)?
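For that doc, the usual glmnet-style convention is that the mixing parameter α lies in [0, 1], with α = 1 giving pure L1 (lasso) and α = 0 giving pure L2 (ridge). A small sketch of that penalty is below, using the parameter names from this thread (regParam = λ, elasticNetParam = α); this is the standard convention rather than a quote of this PR's code, so it should be checked against what the optimizer actually implements:

```scala
// Sketch of the glmnet-style elastic-net penalty; names follow the discussion above,
// not necessarily the final code: regParam = lambda, elasticNetParam = alpha in [0, 1].
def elasticNetPenalty(weights: Array[Double], regParam: Double, elasticNetParam: Double): Double = {
  require(elasticNetParam >= 0.0 && elasticNetParam <= 1.0, "elasticNetParam must be in [0, 1]")
  val l1 = weights.map(math.abs).sum      // sum_j |w_j|
  val l2 = weights.map(wj => wj * wj).sum // sum_j w_j^2
  regParam * (elasticNetParam * l1 + (1.0 - elasticNetParam) / 2.0 * l2)
}
```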
Definitely. It seems that there is no easy way to specify a valid parameter range or requirement in this framework. Do you think it's a good idea to add a check like the following?

def getElasticNetParam: Double = {
  val value = get(elasticNetParam)
  require(value >= 0.0 && value <= 1.0, "elasticNetParam must be in [0, 1]")
  value
}
@dbtsai Thanks for the update! I'll make a more detailed pass soon. One bigger question: what is the plan for extensions? Based on this PR, it sounds like we expect LinearRegression to support only no regularization, L1, L2, or elastic net. I assume that, in the future, users who want other regularization can write their own extensions of GeneralizedLinearAlgorithm (once that is in spark.ml). But would we ever want to add more built-in regularizers? Or are we deciding now that we never will?
@jkbradley I think we should only support basic regularization in spark.ml first, which is what Python's scikit-learn does. If users need a different type of regularization, they can implement it based on the code we have. It would be hard to implement GeneralizedLinearAlgorithm with regularization without a lot of if-else statements to handle the special cases. I implemented logistic regression, linear regression, and Cox proportional-hazards regression with elastic-net regularization at Alpine, and our customers are asking for precise accuracy compared with R's glmnet package. As a result, I spent some time researching the original R glmnet code, and I found that there is no generic way to handle the different linear models; there are special cases here and there. For example, in logistic regression the intercept is computed by adding an extra dimension to the data with constant one, but in linear regression the intercept can be computed in closed form. As a result, I would like to implement them separately and make sure we have the same accuracy as R with proper tests first, and then we can abstract out the common parts. I have another PR trying to do this, #1518, and I will continue on that after this PR is merged.
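As a concrete illustration of the linear-regression special case mentioned above (a hypothetical sketch with made-up data and coefficient values, not code from this PR): once the slope coefficients are known, the intercept falls out in closed form as mean(y) − w · mean(x), whereas logistic regression has to learn the intercept as an extra coefficient on a constant-one column.

```scala
// Hypothetical sketch: recover the linear-regression intercept in closed form from the
// label mean and feature means, instead of learning it as an extra coefficient.
object ClosedFormInterceptSketch extends App {
  val xs = Array(Array(1.0, 4.0), Array(2.0, 5.0), Array(3.0, 7.0))
  val ys = Array(10.0, 12.0, 15.0)
  val w = Array(1.5, 0.8) // illustrative fitted coefficients

  val n = xs.length.toDouble
  val xMean = Array.tabulate(2)(j => xs.map(_(j)).sum / n)
  val yMean = ys.sum / n

  // intercept = mean(y) - w . mean(x): the fitted hyperplane passes through the means.
  val intercept = yMean - (0 until 2).map(j => w(j) * xMean(j)).sum
  println(s"intercept = $intercept")
}
```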
@dbtsai That sounds reasonable. I think we could do a general implementation based on abstractions for regularization and loss functions, but it might not be as efficient as specializations (such as closed-form computation of the intercept for linear regression). I'm OK with this for now, and I definitely support comparisons with R for correctness. Let me know when it's all ready so I can make a detailed pass!
Test build #29327 has finished for PR 4259 at commit
Test build #29990 has finished for PR 4259 at commit
Test build #30508 has finished for PR 4259 at commit
Test build #30510 has finished for PR 4259 at commit
 * Trait for shared param elasticNetParam.
 */
@DeveloperApi
trait HasElasticNetParam extends Params {
This shouldn't be a shared param.
Do you suggest moving it to LinearRegression.scala? We will use it in LOR as well.
okay
Test build #31090 has finished for PR 4259 at commit
}

axpy(-diffSum, correction, result)
scal(1.0 / totalCnt, result)
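For readers less familiar with the BLAS-style helpers in this diff: axpy(a, x, y) computes y += a * x and scal(a, x) rescales x in place, so these two lines subtract diffSum times the correction vector from the accumulated gradient and then divide the result by the record count. (This is a reading of the snippet above, not an addition to it.)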
Okay, I finally found why the `correction` effect is zero. It's because `diffSum` is zero in our test dataset. `diffSum` is the sum of `diff`, and for a synthetic dataset generated from a linear equation with noise, the average of `diff` will be zero. As a result, for a real non-linear dataset `diffSum` will not be zero, so we need some non-linear dataset to test correctness. I'll add the famous prostate cancer dataset used in the linear regression lasso paper to the unit test.
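To make that observation concrete, here is a tiny standalone sketch (the names are illustrative, not taken from this PR): for data generated as y = w · x + b plus zero-mean noise, the residuals of the generating model average out to roughly zero, so a correction term scaled by their sum contributes almost nothing and a bug in that term is invisible to the test.

```scala
import scala.util.Random

// Illustrative sketch: on a synthetic linear dataset with zero-mean noise, the sum of the
// residuals (diff) of the generating model is close to zero, so a correction term scaled
// by that sum is effectively invisible in tests.
object DiffSumSketch extends App {
  val rng = new Random(42)
  val trueW = Array(2.0, -1.0)
  val trueB = 0.5

  val diffs = (1 to 100000).map { _ =>
    val x = Array(rng.nextGaussian(), rng.nextGaussian())
    val y = trueW(0) * x(0) + trueW(1) * x(1) + trueB + 0.1 * rng.nextGaussian()
    val prediction = trueW(0) * x(0) + trueW(1) * x(1) + trueB
    prediction - y // diff is just the (negated) noise, which has zero mean
  }

  println(f"average diff = ${diffs.sum / diffs.size}%.6f") // ~ 0
}
```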
…ther synthetic dataset which can catch the bug fixed in this commit.
Test build #31130 has finished for PR 4259 at commit
Test build #31133 has finished for PR 4259 at commit
LGTM. Merged into master. Thanks!!
Author: DB Tsai <dbt@netflix.com>
Author: DB Tsai <dbtsai@alpinenow.com>

Closes apache#4259 from dbtsai/lir and squashes the following commits:

a81c201 [DB Tsai] add import org.apache.spark.util.Utils back
9fc48ed [DB Tsai] rebase
2178b63 [DB Tsai] add comments
9988ca8 [DB Tsai] addressed feedback and fixed a bug. TODO: documentation and build another synthetic dataset which can catch the bug fixed in this commit.
fcbaefe [DB Tsai] Refactoring
4eb078d [DB Tsai] first commit
No description provided.