SKIPME merging Apache branch-1.4 bug fixes and backporting SPARK-8103 #67

markhamstra · 2015-07-22T22:43:02Z

No description provided.

…imal instead of using toDouble JIRA: https://issues.apache.org/jira/browse/SPARK-8052 Author: Liang-Chi Hsieh <viirya@gmail.com> Closes apache#6645 from viirya/cast_string_integraltype and squashes the following commits: e19c6a3 [Liang-Chi Hsieh] For comment. c3e472a [Liang-Chi Hsieh] Add test. 7ced9b0 [Liang-Chi Hsieh] Use java.math.BigDecimal for casting String to Decimal instead of using toDouble. (cherry picked from commit ddec452) Signed-off-by: Reynold Xin <rxin@databricks.com>

JIRA: https://issues.apache.org/jira/browse/SPARK-9101 Author: Mateusz Buśkiewicz <mateusz.buskiewicz@getbase.com> Closes apache#7499 from sixers/spark-9101 and squashes the following commits: dd75aa6 [Mateusz Buśkiewicz] [SPARK-9101] [PySpark] Test for selecting null literal 97e3f2f [Mateusz Buśkiewicz] [SPARK-9101] [PySpark] Add missing NullType to _atomic_types in pyspark.sql.types (cherry picked from commit 02181fb) Signed-off-by: Reynold Xin <rxin@databricks.com>

…tests Several places in the PySpark SparseVector docs have one defined as: ``` SparseVector(4, [2, 4], [1.0, 2.0]) ``` The index 4 goes out of bounds (but this is not checked). CC: mengxr Author: Joseph K. Bradley <joseph@databricks.com> Closes apache#7541 from jkbradley/sparsevec-doc-typo-fix and squashes the following commits: c806a65 [Joseph K. Bradley] fixed doc test e2dcb23 [Joseph K. Bradley] Fixed typo in pyspark sparsevector doc tests (cherry picked from commit a5d0581) Signed-off-by: Xiangrui Meng <meng@databricks.com>

… and beta!=1 Fix BLAS.gemm to update matrix C when alpha==0 and beta!=1 Also include unit tests to verify the fix. mengxr brkyvz Author: Meihua Wu <meihuawu@umich.edu> Closes apache#7503 from rotationsymmetry/fix_BLAS_gemm and squashes the following commits: fce199c [Meihua Wu] Fix BLAS.gemm to update C when alpha==0 and beta!=1 (cherry picked from commit ff3c72d) Signed-off-by: Xiangrui Meng <meng@databricks.com>

Now, when some executors are killed by dynamic-allocation, it leads to some mis-assignment onto lost executors sometimes. Such kind of mis-assignment causes task failure(s) or even job failure if it repeats that errors for 4 times. The root cause is that ***killExecutors*** doesn't remove those executors under killing ASAP. It depends on the ***OnDisassociated*** event to refresh the active working list later. The delay time really depends on your cluster status (from several milliseconds to sub-minute). When new tasks to be scheduled during that period of time, it will be assigned to those "active" but "under killing" executors. Then the tasks will be failed due to "executor lost". The better way is to exclude those executors under killing in the makeOffers(). Then all those tasks won't be allocated onto those executors "to be lost" any more. Author: Grace <jie.huang@intel.com> Closes apache#7528 from GraceH/AssignToLostExecutor and squashes the following commits: ecc1da6 [Grace] scala style fix 6e2ed96 [Grace] Re-word makeOffers by more readable lines b5546ce [Grace] Add comments about the fix 30a9ad0 [Grace] Avoid assigning tasks to lost executors (cherry picked from commit 6592a60) Signed-off-by: Imran Rashid <irashid@cloudera.com> Conflicts: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala

…el.predictSoft variants for a single vector This PR adds GaussianMixtureModel.predict & GaussianMixtureModel.predictSoft variants for a single vector which are useful when applying the trained model in environments where spark context is not required (or not desired) and predictions are made for single data points (vectors). Test case included. Author: Dariusz Kobylarz <darek.kobylarz@gmail.com> Closes apache#6906 from dkobylarz/branch-1.4 and squashes the following commits: cef1f0a [Dariusz Kobylarz] [SPARK-8481] [MLlib] GaussianMixtureModel predict accepting single vector

… attempts for a stage https://issues.apache.org/jira/browse/SPARK-8103 cc kayousterhout (thanks for the extra test case) Author: Imran Rashid <irashid@cloudera.com> Author: Kay Ousterhout <kayousterhout@gmail.com> Author: Imran Rashid <squito@users.noreply.github.com> Closes apache#6750 from squito/SPARK-8103 and squashes the following commits: fb3acfc [Imran Rashid] fix log msg e01b7aa [Imran Rashid] fix some comments, style 584acd4 [Imran Rashid] simplify going from taskId to taskSetMgr e43ac25 [Imran Rashid] Merge branch 'master' into SPARK-8103 6bc23af [Imran Rashid] update log msg 4470fa1 [Imran Rashid] rename c04707e [Imran Rashid] style 88b61cc [Imran Rashid] add tests to make sure that TaskSchedulerImpl schedules correctly with zombie attempts d7f1ef2 [Imran Rashid] get rid of activeTaskSets a21c8b5 [Imran Rashid] Merge branch 'master' into SPARK-8103 906d626 [Imran Rashid] fix merge 109900e [Imran Rashid] Merge branch 'master' into SPARK-8103 c0d4d90 [Imran Rashid] Revert "Index active task sets by stage Id rather than by task set id" f025154 [Imran Rashid] Merge pull request #2 from kayousterhout/imran_SPARK-8103 baf46e1 [Kay Ousterhout] Index active task sets by stage Id rather than by task set id 19685bb [Imran Rashid] switch to using latestInfo.attemptId, and add comments a5f7c8c [Imran Rashid] remove comment for reviewers 227b40d [Imran Rashid] style 517b6e5 [Imran Rashid] get rid of SparkIllegalStateException b2faef5 [Imran Rashid] faster check for conflicting task sets 6542b42 [Imran Rashid] remove extra stageAttemptId ada7726 [Imran Rashid] reviewer feedback d8eb202 [Imran Rashid] Merge branch 'master' into SPARK-8103 46bc26a [Imran Rashid] more cleanup of debug garbage cb245da [Imran Rashid] finally found the issue ... clean up debug stuff 8c29707 [Imran Rashid] Merge branch 'master' into SPARK-8103 89a59b6 [Imran Rashid] more printlns ... 9601b47 [Imran Rashid] more debug printlns ecb4e7d [Imran Rashid] debugging printlns b6bc248 [Imran Rashid] style 55f4a94 [Imran Rashid] get rid of more random test case since kays tests are clearer 7021d28 [Imran Rashid] update test since listenerBus.waitUntilEmpty now throws an exception instead of returning a boolean 883fe49 [Kay Ousterhout] Unit tests for concurrent stages issue 6e14683 [Imran Rashid] unit test just to make sure we fail fast on concurrent attempts 06a0af6 [Imran Rashid] ignore for jenkins c443def [Imran Rashid] better fix and simpler test case 28d70aa [Imran Rashid] wip on getting a better test case ... a9bf31f [Imran Rashid] wip

SKIPME merging Apache branch-1.4 bug fixes and backporting SPARK-8103

Because `lazy val` uses `this` lock, if JobGenerator.stop and JobGenerator.doCheckpoint (JobGenerator.shouldCheckpoint has not yet been initialized) run at the same time, it may hang. Here are the stack traces for the deadlock: ```Java "pool-1-thread-1-ScalaTest-running-StreamingListenerSuite" alteryx#11 prio=5 os_prio=31 tid=0x00007fd35d094800 nid=0x5703 in Object.wait() [0x000000012ecaf000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) at java.lang.Thread.join(Thread.java:1245) - locked <0x00000007b5d8d7f8> (a org.apache.spark.util.EventLoop$$anon$1) at java.lang.Thread.join(Thread.java:1319) at org.apache.spark.util.EventLoop.stop(EventLoop.scala:81) at org.apache.spark.streaming.scheduler.JobGenerator.stop(JobGenerator.scala:155) - locked <0x00000007b5d8cea0> (a org.apache.spark.streaming.scheduler.JobGenerator) at org.apache.spark.streaming.scheduler.JobScheduler.stop(JobScheduler.scala:95) - locked <0x00000007b5d8ced8> (a org.apache.spark.streaming.scheduler.JobScheduler) at org.apache.spark.streaming.StreamingContext.stop(StreamingContext.scala:687) "JobGenerator" alteryx#67 daemon prio=5 os_prio=31 tid=0x00007fd35c3b9800 nid=0x9f03 waiting for monitor entry [0x0000000139e4a000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.spark.streaming.scheduler.JobGenerator.shouldCheckpoint$lzycompute(JobGenerator.scala:63) - waiting to lock <0x00000007b5d8cea0> (a org.apache.spark.streaming.scheduler.JobGenerator) at org.apache.spark.streaming.scheduler.JobGenerator.shouldCheckpoint(JobGenerator.scala:63) at org.apache.spark.streaming.scheduler.JobGenerator.doCheckpoint(JobGenerator.scala:290) at org.apache.spark.streaming.scheduler.JobGenerator.org$apache$spark$streaming$scheduler$JobGenerator$$processEvent(JobGenerator.scala:182) at org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:83) at org.apache.spark.streaming.scheduler.JobGenerator$$anon$1.onReceive(JobGenerator.scala:82) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) ``` I can use this patch to produce this deadlock: zsxwing@8a88f28 And a timeout build in Jenkins due to this deadlock: https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1654/ This PR initializes `checkpointWriter` before `eventLoop` uses it to avoid this deadlock. Author: zsxwing <zsxwing@gmail.com> Closes apache#8326 from zsxwing/SPARK-10125.

viirya and others added 8 commits July 19, 2015 20:59

Merge branch 'branch-1.4' of github.com:apache/spark into csd-1.4

fce746d

markhamstra assigned mbautin Jul 22, 2015

mbautin added a commit that referenced this pull request Jul 22, 2015

Merge pull request #67 from markhamstra/csd-1.4

7bf1d88

SKIPME merging Apache branch-1.4 bug fixes and backporting SPARK-8103

mbautin merged commit 7bf1d88 into alteryx:csd-1.4 Jul 22, 2015

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SKIPME merging Apache branch-1.4 bug fixes and backporting SPARK-8103 #67

SKIPME merging Apache branch-1.4 bug fixes and backporting SPARK-8103 #67

markhamstra commented Jul 22, 2015

SKIPME merging Apache branch-1.4 bug fixes and backporting SPARK-8103 #67

SKIPME merging Apache branch-1.4 bug fixes and backporting SPARK-8103 #67

Conversation

markhamstra commented Jul 22, 2015