
Merged Apache branch-1.6 #145

Merged: 13 commits into alteryx:csd-1.6 on Jan 17, 2016
Conversation

markhamstra

No description provided.

dilipbiswal and others added 13 commits January 12, 2016 21:45
…in GROUP BY clause

cloud-fan, can you please take a look?

In this case, we fail during check analysis while validating the aggregation expression. I have added a semanticEquals for HiveGenericUDF to fix this. Please let me know if this is the right way to address this issue.
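
For illustration, here is a minimal sketch of the idea behind a semanticEquals check, using hypothetical types rather than the actual Spark/Hive classes: compare UDF expressions by the function they wrap and by their children, not by object identity, so two plans that re-instantiate the same Hive UDF still match during check analysis.

```scala
// Hypothetical sketch, not the actual Spark patch: Expr, Column and
// HiveUDFExpr stand in for Catalyst's expression classes.
sealed trait Expr {
  def semanticEquals(other: Expr): Boolean = this == other
}
case class Column(name: String) extends Expr
case class HiveUDFExpr(funcClassName: String, children: Seq[Expr]) extends Expr {
  // Equal when the wrapped Hive function and all children match,
  // even if the two wrapper instances are distinct objects.
  override def semanticEquals(other: Expr): Boolean = other match {
    case HiveUDFExpr(otherClass, otherChildren) =>
      funcClassName == otherClass &&
        children.size == otherChildren.size &&
        children.zip(otherChildren).forall { case (a, b) => a.semanticEquals(b) }
    case _ => false
  }
}

object SemanticEqualsDemo extends App {
  val a = HiveUDFExpr("org.example.MyGenericUDF", Seq(Column("k")))
  val b = HiveUDFExpr("org.example.MyGenericUDF", Seq(Column("k")))
  assert(a.semanticEquals(b)) // distinct instances, same semantics
}
```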

Author: Dilip Biswal <dbiswal@us.ibm.com>

Closes apache#10520 from dilipbiswal/spark-12558.

(cherry picked from commit dc7b387)
Signed-off-by: Yin Huai <yhuai@databricks.com>

Conflicts:
	sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveShim.scala
The default run mode has changed, but the documentation did not fully reflect the change.

Author: Luc Bourlier <luc.bourlier@typesafe.com>

Closes apache#10740 from skyluc/issue/mesos-modes-doc.

(cherry picked from commit cc91e21)
Signed-off-by: Reynold Xin <rxin@databricks.com>
…verflow

jira: https://issues.apache.org/jira/browse/SPARK-12685

master PR: apache#10627

The word2vec log reports

```
trainWordsCount = -785727483
```

during computation over a large dataset: the accumulated word count has overflowed. Update the priority, as the overflowed count directly affects the computation:

```
alpha = learningRate * (1 - numPartitions * wordCount.toDouble / (trainWordsCount + 1))
```
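
As a rough sketch of the failure mode and the remedy (assumed here to be accumulating the count in a `Long`; variable names are illustrative):

```scala
// Summing per-partition word counts in an Int wraps past 2^31 - 1 on a
// large corpus, which corrupts the learning-rate adjustment below.
object WordCountOverflowDemo extends App {
  val countsPerPartition = Seq.fill(30)(100000000) // 3.0e9 words in total

  val asInt: Int   = countsPerPartition.sum               // wraps to a negative value
  val asLong: Long = countsPerPartition.map(_.toLong).sum // 3000000000, as expected

  println(s"Int accumulator:  $asInt")  // negative, like the log line above
  println(s"Long accumulator: $asLong")

  val learningRate  = 0.025
  val numPartitions = 30
  val wordCount     = 1000000L
  def alpha(trainWordsCount: Long): Double =
    learningRate * (1 - numPartitions * wordCount.toDouble / (trainWordsCount + 1))

  println(s"alpha with overflowed count: ${alpha(asInt.toLong)}")
  println(s"alpha with Long count:       ${alpha(asLong)}")
}
```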

Author: Yuhao Yang <hhbyyh@gmail.com>

Closes apache#10721 from hhbyyh/branch-1.4.

(cherry picked from commit 7bd2564)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
…thon3

This replaces the `execfile` used for running custom Python shell scripts
with explicit open, compile, and exec (as recommended by 2to3). The reason
for this change is to make the pythonstartup option compatible with Python 3.

Author: Erik Selin <erik.selin@gmail.com>

Closes apache#10255 from tyro89/pythonstartup-python3.

(cherry picked from commit e4e0b3f)
Signed-off-by: Josh Rosen <joshrosen@databricks.com>
I hit the exception below. The `UnsafeKVExternalSorter` does pass `null` as the consumer when creating an `UnsafeInMemorySorter`. Normally the NPE does not occur because `inMemSorter` is set to null later and the `free()` method is not called; it happens when another exception, such as an OOM, is thrown before `inMemSorter` is set to null. In any case, we can add a null check to avoid it.

```
ERROR spark.TaskContextImpl: Error in TaskCompletionListener
java.lang.NullPointerException
        at org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.free(UnsafeInMemorySorter.java:110)
        at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.cleanupResources(UnsafeExternalSorter.java:288)
        at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter$1.onTaskCompletion(UnsafeExternalSorter.java:141)
        at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:79)
        at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:77)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:77)
        at org.apache.spark.scheduler.Task.run(Task.scala:91)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:722)
```
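
A minimal sketch of the null check described above, with hypothetical Scala stand-ins for the actual Java classes:

```scala
// Simplified stand-ins for MemoryConsumer and UnsafeInMemorySorter.
class MemoryConsumer {
  def freeArray(a: Array[Long]): Unit = ()
}

class InMemSorter(consumer: MemoryConsumer, private var array: Array[Long]) {
  def free(): Unit = {
    // The added guard: UnsafeKVExternalSorter legitimately passes a null
    // consumer, so free() must tolerate it instead of throwing an NPE.
    if (consumer != null && array != null) {
      consumer.freeArray(array)
    }
    array = null
  }
}

object FreeDemo extends App {
  // A task-completion listener can now call free() safely even when the
  // sorter was created without a consumer:
  new InMemSorter(null, new Array[Long](8)).free() // no NPE
}
```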

Author: Carson Wang <carson.wang@intel.com>

Closes apache#10637 from carsonwang/FixNPE.

(cherry picked from commit eabc7b8)
Signed-off-by: Josh Rosen <joshrosen@databricks.com>
…number of features is large

jira: https://issues.apache.org/jira/browse/SPARK-12026

The issue is valid, as `features.toArray.view.zipWithIndex.slice(startCol, endCol)` becomes slower as startCol gets larger.

I tested locally; the change improves performance, and the running time was stable.
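
For illustration, here is a self-contained sketch of the issue with a plain array (the real code works on an MLlib vector; names here are illustrative): the view-based slice still pays for the prefix before `startCol`, while direct indexing touches only the requested columns.

```scala
object SliceDemo extends App {
  val features: Array[Double] = Array.tabulate(1000000)(_.toDouble)
  val (startCol, endCol) = (900000, 900010)

  // Original shape: lazy view, but slicing still iterates past the prefix.
  val viaView: Seq[(Double, Int)] =
    features.view.zipWithIndex.slice(startCol, endCol).toSeq

  // Index-based alternative: O(endCol - startCol) regardless of startCol.
  val viaIndex: Seq[(Double, Int)] =
    (startCol until endCol).map(i => (features(i), i))

  assert(viaView == viaIndex) // same result, different cost profile
}
```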

Author: Yuhao Yang <hhbyyh@gmail.com>

Closes apache#10146 from hhbyyh/chiSq.

(cherry picked from commit 021dafc)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
When an Executor process is destroyed, the FileAppender that is asynchronously reading the stderr stream of the process can throw an IOException during read because the stream is closed. Before the ExecutorRunner destroys the process, the FileAppender thread is flagged to stop. This PR wraps the FileAppender's inputStream.read call in a try/catch block so that if an IOException is thrown and the thread has been flagged to stop, it safely ignores the exception. Additionally, the FileAppender thread was changed to use Utils.tryWithSafeFinally to better log any exceptions that do occur. Added unit tests to verify that an IOException is thrown and logged if the FileAppender is not flagged to stop, and that no IOException is thrown when the flag is set.
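
A condensed sketch of that read-loop pattern (simplified; not the actual FileAppender code):

```scala
import java.io.{IOException, InputStream}

class AppenderSketch(in: InputStream) {
  @volatile private var markedForStop = false

  def stop(): Unit = markedForStop = true

  def readLoop(buf: Array[Byte]): Unit = {
    try {
      var n = in.read(buf)
      while (n != -1 && !markedForStop) {
        // ... append buf(0 until n) to the log file ...
        n = in.read(buf)
      }
    } catch {
      // Stream closed because the process was destroyed: expected when we
      // were flagged to stop, a real error worth logging otherwise.
      case _: IOException if markedForStop => () // safely ignore
      case e: IOException => System.err.println(s"Error reading stream: $e")
    }
  }
}
```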

Author: Bryan Cutler <cutlerb@gmail.com>

Closes apache#10714 from BryanCutler/file-appender-read-ioexception-SPARK-9844.

(cherry picked from commit 56cdbd6)
Signed-off-by: Sean Owen <sowen@cloudera.com>
… allocation

Add `listener.synchronized` to get `storageStatusList` and `execInfo` atomically.
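
A minimal sketch of why the single lock matters (hypothetical listener shape): two separate reads can observe an executor in one structure but not the other, whereas one `synchronized` block yields a consistent snapshot.

```scala
class ListenerSketch {
  private var storageStatusList: List[String] = Nil
  private var execInfo: Map[String, Int] = Map.empty

  def onExecutorAdded(id: String, cores: Int): Unit = synchronized {
    storageStatusList = id :: storageStatusList
    execInfo += (id -> cores)
  }

  // The fix: take both reads under the same lock so the pair is atomic.
  def snapshot(): (List[String], Map[String, Int]) = synchronized {
    (storageStatusList, execInfo)
  }
}
```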

Author: Shixiong Zhu <shixiong@databricks.com>

Closes apache#10728 from zsxwing/SPARK-12784.

(cherry picked from commit 501e99e)
Signed-off-by: Shixiong Zhu <shixiong@databricks.com>
If the sort column contains a slash (e.g. "Executor ID / Host") in YARN mode, sorting fails with the following message.

![spark-12708](https://cloud.githubusercontent.com/assets/6679275/12193320/80814f8c-b62a-11e5-9914-7bf3907029df.png)

It's similar to SPARK-4313.
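
One common fix for this class of problem, sketched here with an illustrative URL and parameter name, is to URL-encode the column name before embedding it in the link:

```scala
import java.net.{URLDecoder, URLEncoder}

object SortColumnDemo extends App {
  val column  = "Executor ID / Host"
  val encoded = URLEncoder.encode(column, "UTF-8") // Executor+ID+%2F+Host
  val href    = s"/executors/?sortColumn=$encoded" // slash survives the URL
  val decoded = URLDecoder.decode(encoded, "UTF-8")
  assert(decoded == column)
  println(href)
}
```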

Author: root <root@R520T1.(none)>
Author: Koyo Yoshida <koyo0615@gmail.com>

Closes apache#10663 from yoshidakuy/SPARK-12708.

(cherry picked from commit 32cca93)
Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
Author: Oscar D. Lara Yejas <odlaraye@oscars-mbp.usca.ibm.com>
Author: Oscar D. Lara Yejas <olarayej@mail.usf.edu>
Author: Oscar D. Lara Yejas <oscar.lara.yejas@us.ibm.com>
Author: Oscar D. Lara Yejas <odlaraye@oscars-mbp.attlocal.net>

Closes apache#9613 from olarayej/SPARK-11031.

(cherry picked from commit ba4a641)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
…read completion

Changed the logging `FileAppender` to use `join` in `awaitTermination` to ensure that the thread has properly finished before returning.
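
A minimal sketch of the change (simplified, not the actual FileAppender code):

```scala
class AppenderThreadSketch {
  private val writingThread = new Thread(new Runnable {
    def run(): Unit = { /* read the stream and append to the log file */ }
  })
  writingThread.start()

  // The fix: join the worker thread so awaitTermination cannot return
  // while the thread is still writing.
  def awaitTermination(): Unit = writingThread.join()
}
```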

Author: Bryan Cutler <cutlerb@gmail.com>

Closes apache#10654 from BryanCutler/fileAppender-join-thread-SPARK-12701.

(cherry picked from commit ea104b8)
Signed-off-by: Sean Owen <sowen@cloudera.com>
http://spark.apache.org/docs/latest/ml-guide.html#example-pipeline
```
val sameModel = Pipeline.load("/tmp/spark-logistic-regression-model")
```
should be
```
val sameModel = PipelineModel.load("/tmp/spark-logistic-regression-model")
```
cc: jkbradley

Author: Jeff Lam <sha0lin@alumni.carnegiemellon.edu>

Closes apache#10769 from Agent007/SPARK-12722.

(cherry picked from commit 86972fa)
Signed-off-by: Sean Owen <sowen@cloudera.com>
markhamstra added a commit that referenced this pull request Jan 17, 2016
@markhamstra markhamstra merged commit b4a0e10 into alteryx:csd-1.6 Jan 17, 2016
markhamstra pushed a commit to markhamstra/spark that referenced this pull request Nov 7, 2017