forked from apache/spark
-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged Apache branch-1.6 #145
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
…in GROUP BY clause cloud-fan Can you please take a look ? In this case, we are failing during check analysis while validating the aggregation expression. I have added a semanticEquals for HiveGenericUDF to fix this. Please let me know if this is the right way to address this issue. Author: Dilip Biswal <dbiswal@us.ibm.com> Closes apache#10520 from dilipbiswal/spark-12558. (cherry picked from commit dc7b387) Signed-off-by: Yin Huai <yhuai@databricks.com> Conflicts: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveShim.scala
The default run has changed, but the documentation didn't fully reflect the change. Author: Luc Bourlier <luc.bourlier@typesafe.com> Closes apache#10740 from skyluc/issue/mesos-modes-doc. (cherry picked from commit cc91e21) Signed-off-by: Reynold Xin <rxin@databricks.com>
…verflow jira: https://issues.apache.org/jira/browse/SPARK-12685 master PR: apache#10627 the log of word2vec reports trainWordsCount = -785727483 during computation over a large dataset. Update the priority as it will affect the computation process. alpha = learningRate * (1 - numPartitions * wordCount.toDouble / (trainWordsCount + 1)) Author: Yuhao Yang <hhbyyh@gmail.com> Closes apache#10721 from hhbyyh/branch-1.4. (cherry picked from commit 7bd2564) Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
…thon3 This replaces the `execfile` used for running custom python shell scripts with explicit open, compile and exec (as recommended by 2to3). The reason for this change is to make the pythonstartup option compatible with python3. Author: Erik Selin <erik.selin@gmail.com> Closes apache#10255 from tyro89/pythonstartup-python3. (cherry picked from commit e4e0b3f) Signed-off-by: Josh Rosen <joshrosen@databricks.com>
I hit the exception below. The `UnsafeKVExternalSorter` does pass `null` as the consumer when creating an `UnsafeInMemorySorter`. Normally the NPE doesn't occur because the `inMemSorter` is set to null later and the `free()` method is not called. It happens when there is another exception like OOM thrown before setting `inMemSorter` to null. Anyway, we can add the null check to avoid it. ``` ERROR spark.TaskContextImpl: Error in TaskCompletionListener java.lang.NullPointerException at org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.free(UnsafeInMemorySorter.java:110) at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.cleanupResources(UnsafeExternalSorter.java:288) at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter$1.onTaskCompletion(UnsafeExternalSorter.java:141) at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:79) at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:77) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:77) at org.apache.spark.scheduler.Task.run(Task.scala:91) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) ``` Author: Carson Wang <carson.wang@intel.com> Closes apache#10637 from carsonwang/FixNPE. (cherry picked from commit eabc7b8) Signed-off-by: Josh Rosen <joshrosen@databricks.com>
…number of features is large jira: https://issues.apache.org/jira/browse/SPARK-12026 The issue is valid as features.toArray.view.zipWithIndex.slice(startCol, endCol) becomes slower as startCol gets larger. I tested on local and the change can improve the performance and the running time was stable. Author: Yuhao Yang <hhbyyh@gmail.com> Closes apache#10146 from hhbyyh/chiSq. (cherry picked from commit 021dafc) Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
When an Executor process is destroyed, the FileAppender that is asynchronously reading the stderr stream of the process can throw an IOException during read because the stream is closed. Before the ExecutorRunner destroys the process, the FileAppender thread is flagged to stop. This PR wraps the inputStream.read call of the FileAppender in a try/catch block so that if an IOException is thrown and the thread has been flagged to stop, it will safely ignore the exception. Additionally, the FileAppender thread was changed to use Utils.tryWithSafeFinally to better log any exception that do occur. Added unit tests to verify a IOException is thrown and logged if FileAppender is not flagged to stop, and that no IOException when the flag is set. Author: Bryan Cutler <cutlerb@gmail.com> Closes apache#10714 from BryanCutler/file-appender-read-ioexception-SPARK-9844. (cherry picked from commit 56cdbd6) Signed-off-by: Sean Owen <sowen@cloudera.com>
… allocation Add `listener.synchronized` to get `storageStatusList` and `execInfo` atomically. Author: Shixiong Zhu <shixiong@databricks.com> Closes apache#10728 from zsxwing/SPARK-12784. (cherry picked from commit 501e99e) Signed-off-by: Shixiong Zhu <shixiong@databricks.com>
If sort column contains slash(e.g. "Executor ID / Host") when yarn mode,sort fail with following message.  It's similar to SPARK-4313 . Author: root <root@R520T1.(none)> Author: Koyo Yoshida <koyo0615@gmail.com> Closes apache#10663 from yoshidakuy/SPARK-12708. (cherry picked from commit 32cca93) Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
Author: Oscar D. Lara Yejas <odlaraye@oscars-mbp.usca.ibm.com> Author: Oscar D. Lara Yejas <olarayej@mail.usf.edu> Author: Oscar D. Lara Yejas <oscar.lara.yejas@us.ibm.com> Author: Oscar D. Lara Yejas <odlaraye@oscars-mbp.attlocal.net> Closes apache#9613 from olarayej/SPARK-11031. (cherry picked from commit ba4a641) Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
…read completion Changed Logging FileAppender to use join in `awaitTermination` to ensure that thread is properly finished before returning. Author: Bryan Cutler <cutlerb@gmail.com> Closes apache#10654 from BryanCutler/fileAppender-join-thread-SPARK-12701. (cherry picked from commit ea104b8) Signed-off-by: Sean Owen <sowen@cloudera.com>
http://spark.apache.org/docs/latest/ml-guide.html#example-pipeline ``` val sameModel = Pipeline.load("/tmp/spark-logistic-regression-model") ``` should be ``` val sameModel = PipelineModel.load("/tmp/spark-logistic-regression-model") ``` cc: jkbradley Author: Jeff Lam <sha0lin@alumni.carnegiemellon.edu> Closes apache#10769 from Agent007/SPARK-12722. (cherry picked from commit 86972fa) Signed-off-by: Sean Owen <sowen@cloudera.com>
markhamstra
pushed a commit
to markhamstra/spark
that referenced
this pull request
Nov 7, 2017
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
No description provided.