Csd branch 0.9 merge #9
Conversation
Fix a small problem in ALS where configure didn't work.
This reverts commit 942c80b.
This reverts commit 669ba4c.
Revert PR 381. That PR missed a bunch of test cases that require "spark.cleaner.ttl", and I think it is what is causing the test failures on Jenkins right now (though it's a bit hard to tell because the DNS for cs.berkeley.edu is down). I'm submitting this to see if it fixes Jenkins. I did try patching the various tests individually, but it was taking a really long time because there are a bunch of them, so for now I'm just seeing if a revert works.
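For context, this is the kind of configuration those tests depend on; a minimal PySpark sketch (the app name and TTL value here are illustrative, not taken from the reverted tests):

```python
from pyspark import SparkConf, SparkContext

# Tests that rely on periodic metadata cleanup must set
# spark.cleaner.ttl (in seconds) before creating the context;
# the revert restores that expectation.
conf = (SparkConf()
        .setMaster("local[2]")
        .setAppName("cleaner-ttl-example")
        .set("spark.cleaner.ttl", "3600"))  # clean metadata older than 1 hour
sc = SparkContext(conf=conf)
```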
Fix UI bug introduced in alteryx#244. The 'duration' field was incorrectly renamed to 'task time' in the table that lists stages.
Minor update for clone writables and more documentation.
- Added a Python wrapper for Naive Bayes
- Updated the Scala Naive Bayes to match the style of our other algorithms better, and in particular made it easier to call from Java (added a builder pattern, removed the default value in the train method)
- Updated Python MLlib functions to not require a SparkContext; we can get that from the RDD the user gives
- Added a toString method to LabeledPoint
- Made the Python MLlib tests run as part of run-tests as well (before, they could only be run individually through each file)

A short usage sketch of the new Python wrapper follows this list.
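A minimal sketch, assuming the wrapper follows the same pattern as the other pyspark.mllib algorithms (the toy data is illustrative):

```python
from pyspark import SparkContext
from pyspark.mllib.classification import NaiveBayes
from pyspark.mllib.regression import LabeledPoint

sc = SparkContext("local", "naive-bayes-example")

# Toy training data: a label followed by a feature vector.
data = sc.parallelize([
    LabeledPoint(0.0, [0.0, 1.0]),
    LabeledPoint(1.0, [1.0, 0.0]),
])

# No SparkContext argument needed: train() gets it from the RDD.
model = NaiveBayes.train(data, 1.0)  # 1.0 is the smoothing parameter
print(model.predict([1.0, 0.0]))
```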
We've used camel case in other Spark methods, so it felt reasonable to keep using it here and make the code match Scala/Java as much as possible. Note that parameter names matter in Python, because Python allows passing optional parameters by name.
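For illustration, a short sketch of why the naming matters, using the KMeans wrapper as an example (assuming its usual signature):

```python
from pyspark import SparkContext
from pyspark.mllib.clustering import KMeans

sc = SparkContext("local", "kwargs-example")
data = sc.parallelize([[0.0, 0.0], [1.0, 1.0], [9.0, 8.0], [8.0, 9.0]])

# camelCase parameter names become part of the public API in Python:
# optional parameters can be passed by name, skipping the ones between.
model = KMeans.train(data, k=2, maxIterations=10)
```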
Also fixes the main methods of a few other algorithms to print the final model.
…ected[streaming] to private[streaming] in StreamingContext and DStream. Added waitForStop to StreamingContext, and StreamingContextSuite.
This helps when an exception occurs while serializing a record to be sent to Java, which would otherwise leave the stream to Java in an inconsistent state where PythonRDD won't be able to read the error.
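A minimal sketch of the underlying idea, with illustrative names rather than Spark's actual worker code: serialize fully before touching the stream, so a serialization failure never leaves a half-written record.

```python
import struct

# Hypothetical helper: serialize the record completely *before*
# writing anything to the Java-bound stream. If dumps() raises, the
# stream has not been touched and a well-formed error message can
# still be written for PythonRDD to read.
def write_record(stream, serializer, record):
    payload = serializer.dumps(record)             # may raise; stream untouched
    stream.write(struct.pack("!i", len(payload)))  # length prefix
    stream.write(payload)
```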
- Only change simple return statements at the end of a method
- Ignore the complex if-else checks
- Ignore the ones inside synchronized blocks
…spark.streaming to org.apache.spark.streaming.dstream.
The current doc hints that Spark doesn't support accumulators of type `Long`, which is wrong. JIRA: https://spark-project.atlassian.net/browse/SPARK-1117 Author: Xiangrui Meng <meng@databricks.com> Closes apache#631 from mengxr/acc and squashes the following commits: 45ecd25 [Xiangrui Meng] update accumulator docs (cherry picked from commit aaec7d4) Signed-off-by: Patrick Wendell <pwendell@gmail.com>
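For reference, a minimal PySpark sketch of a numeric accumulator (the doc fix itself concerns the Scala docs, where `Long` accumulators likewise work):

```python
from pyspark import SparkContext

sc = SparkContext("local", "accumulator-example")
acc = sc.accumulator(0)  # numeric accumulator; Long works in Scala too

# Workers can only add to the accumulator; the driver reads its value.
sc.parallelize(range(100)).foreach(lambda x: acc.add(1))
print(acc.value)  # 100
```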
As reported in https://spark-project.atlassian.net/browse/SPARK-1055 "The used Spark version in the .../base/Dockerfile is stale on 0.8.1 and should be updated to 0.9.x to match the release." Author: CodingCat <zhunansjtu@gmail.com> Author: Nan Zhu <CodingCat@users.noreply.github.com> Closes apache#634 from CodingCat/SPARK-1055 and squashes the following commits: cb7330e [Nan Zhu] Update Dockerfile adf8259 [CodingCat] fix the SCALA_VERSION and SPARK_VERSION in docker file (cherry picked from commit 1aa4f8a) Signed-off-by: Aaron Davidson <aaron@databricks.com>
In the previous code, if you had a failing map stage and then tried to run reduce stages on it repeatedly, the first reduce stage would fail correctly, but the later ones would mistakenly believe that all map outputs are available and start failing infinitely with fetch failures from "null".
our stage instead of using our shuffleID.
A recent PR that added Java vs Scala tabs for streaming also inadvertently added some bad code to a document.ready handler, breaking our other handler that manages scrolling to anchors correctly with the floating top bar. As a result the section title ended up always being hidden below the top bar. This removes the unnecessary JavaScript code. Author: Matei Zaharia <matei@databricks.com> Closes alteryx#3 from mateiz/doc-links and squashes the following commits: e2a3488 [Matei Zaharia] SPARK-1135: fix broken anchors in docs
This surrounds the complete worker code in a try/except block so we catch any error that occurs. An example would be the depickling failing for some reason. @JoshRosen Author: Bouke van der Bijl <boukevanderbijl@gmail.com> Closes apache#644 from bouk/catch-depickling-errors and squashes the following commits: f0f67cc [Bouke van der Bijl] Lol indentation 0e4d504 [Bouke van der Bijl] Surround the complete python worker with the try block (cherry picked from commit 12738c1) Signed-off-by: Josh Rosen <joshrosen@apache.org>
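A minimal sketch of the pattern (the `process` and `write_error` helpers are hypothetical stand-ins, not the actual pyspark worker code):

```python
import traceback

def main(infile, outfile):
    try:
        # Everything the worker does -- reading the function, depickling
        # input records, evaluating, writing results -- happens inside
        # this one try block, so even depickling errors are caught.
        process(infile, outfile)
    except Exception:
        # Report the failure back to the JVM side instead of dying silently.
        write_error(outfile, traceback.format_exc())
```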
Conflicts: CHANGES.txt README.md assembly/pom.xml bagel/pom.xml bin/spark-class bin/spark-class2.cmd core/pom.xml core/src/main/java/org/apache/spark/network/netty/FileServerHandler.java core/src/main/java/org/apache/spark/network/netty/PathResolver.java core/src/main/scala/org/apache/spark/Aggregator.scala core/src/main/scala/org/apache/spark/CacheManager.scala core/src/main/scala/org/apache/spark/FutureAction.scala core/src/main/scala/org/apache/spark/InterruptibleIterator.scala core/src/main/scala/org/apache/spark/MapOutputTracker.scala core/src/main/scala/org/apache/spark/SparkContext.scala core/src/main/scala/org/apache/spark/TaskEndReason.scala core/src/main/scala/org/apache/spark/api/java/JavaPairRDD.scala core/src/main/scala/org/apache/spark/api/java/JavaRDDLike.scala core/src/main/scala/org/apache/spark/api/java/function/FlatMapFunction.scala core/src/main/scala/org/apache/spark/api/java/function/FlatMapFunction2.scala core/src/main/scala/org/apache/spark/api/java/function/Function.java core/src/main/scala/org/apache/spark/api/java/function/Function2.java core/src/main/scala/org/apache/spark/api/java/function/Function3.java core/src/main/scala/org/apache/spark/api/java/function/Function4.java core/src/main/scala/org/apache/spark/api/java/function/PairFlatMapFunction.java core/src/main/scala/org/apache/spark/api/java/function/PairFunction.java core/src/main/scala/org/apache/spark/api/java/function/WrappedFunction3.scala core/src/main/scala/org/apache/spark/broadcast/BitTorrentBroadcast.scala core/src/main/scala/org/apache/spark/broadcast/Broadcast.scala core/src/main/scala/org/apache/spark/broadcast/HttpBroadcast.scala core/src/main/scala/org/apache/spark/broadcast/MultiTracker.scala core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala core/src/main/scala/org/apache/spark/broadcast/TreeBroadcast.scala core/src/main/scala/org/apache/spark/deploy/DeployMessage.scala core/src/main/scala/org/apache/spark/deploy/FaultToleranceTest.scala core/src/main/scala/org/apache/spark/deploy/LocalSparkCluster.scala core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala core/src/main/scala/org/apache/spark/deploy/client/Client.scala core/src/main/scala/org/apache/spark/deploy/client/TestClient.scala core/src/main/scala/org/apache/spark/deploy/master/ApplicationInfo.scala core/src/main/scala/org/apache/spark/deploy/master/ApplicationState.scala core/src/main/scala/org/apache/spark/deploy/master/FileSystemPersistenceEngine.scala core/src/main/scala/org/apache/spark/deploy/master/Master.scala core/src/main/scala/org/apache/spark/deploy/master/PersistenceEngine.scala core/src/main/scala/org/apache/spark/deploy/master/RecoveryState.scala core/src/main/scala/org/apache/spark/deploy/master/SparkZooKeeperSession.scala core/src/main/scala/org/apache/spark/deploy/master/WorkerInfo.scala core/src/main/scala/org/apache/spark/deploy/master/WorkerState.scala core/src/main/scala/org/apache/spark/deploy/master/ZooKeeperLeaderElectionAgent.scala core/src/main/scala/org/apache/spark/deploy/master/ZooKeeperPersistenceEngine.scala core/src/main/scala/org/apache/spark/deploy/worker/ExecutorRunner.scala core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala core/src/main/scala/org/apache/spark/executor/Executor.scala core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala core/src/main/scala/org/apache/spark/network/netty/ShuffleCopier.scala 
core/src/main/scala/org/apache/spark/network/netty/ShuffleSender.scala core/src/main/scala/org/apache/spark/rdd/AsyncRDDActions.scala core/src/main/scala/org/apache/spark/rdd/BlockRDD.scala core/src/main/scala/org/apache/spark/rdd/CheckpointRDD.scala core/src/main/scala/org/apache/spark/rdd/CoGroupedRDD.scala core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala core/src/main/scala/org/apache/spark/rdd/MapPartitionsRDD.scala core/src/main/scala/org/apache/spark/rdd/MapPartitionsWithIndexRDD.scala core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala core/src/main/scala/org/apache/spark/rdd/RDD.scala core/src/main/scala/org/apache/spark/rdd/ShuffledRDD.scala core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala core/src/main/scala/org/apache/spark/scheduler/DAGSchedulerEvent.scala core/src/main/scala/org/apache/spark/scheduler/DAGSchedulerSource.scala core/src/main/scala/org/apache/spark/scheduler/JobLogger.scala core/src/main/scala/org/apache/spark/scheduler/ResultTask.scala core/src/main/scala/org/apache/spark/scheduler/SchedulableBuilder.scala core/src/main/scala/org/apache/spark/scheduler/SchedulerBackend.scala core/src/main/scala/org/apache/spark/scheduler/ShuffleMapTask.scala core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala core/src/main/scala/org/apache/spark/scheduler/SparkListenerBus.scala core/src/main/scala/org/apache/spark/scheduler/StageInfo.scala core/src/main/scala/org/apache/spark/scheduler/TaskInfo.scala core/src/main/scala/org/apache/spark/scheduler/TaskResult.scala core/src/main/scala/org/apache/spark/scheduler/TaskScheduler.scala core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala core/src/main/scala/org/apache/spark/scheduler/cluster/SimrSchedulerBackend.scala core/src/main/scala/org/apache/spark/scheduler/cluster/SparkDeploySchedulerBackend.scala core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/CoarseMesosSchedulerBackend.scala core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackend.scala core/src/main/scala/org/apache/spark/scheduler/local/LocalScheduler.scala core/src/main/scala/org/apache/spark/scheduler/local/LocalTaskSetManager.scala core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala core/src/main/scala/org/apache/spark/storage/BlockId.scala core/src/main/scala/org/apache/spark/storage/BlockManager.scala core/src/main/scala/org/apache/spark/storage/BlockManagerMasterActor.scala core/src/main/scala/org/apache/spark/storage/BlockObjectWriter.scala core/src/main/scala/org/apache/spark/storage/DiskBlockManager.scala core/src/main/scala/org/apache/spark/storage/MemoryStore.scala core/src/main/scala/org/apache/spark/storage/ShuffleBlockManager.scala core/src/main/scala/org/apache/spark/storage/StoragePerfTester.scala core/src/main/scala/org/apache/spark/ui/UIWorkloadGenerator.scala core/src/main/scala/org/apache/spark/ui/jobs/JobProgressListener.scala core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala core/src/main/scala/org/apache/spark/util/MetadataCleaner.scala core/src/main/scala/org/apache/spark/util/Utils.scala core/src/main/scala/org/apache/spark/util/collection/BitSet.scala core/src/main/scala/org/apache/spark/util/collection/OpenHashMap.scala core/src/main/scala/org/apache/spark/util/collection/OpenHashSet.scala 
core/src/main/scala/org/apache/spark/util/collection/PrimitiveKeyOpenHashMap.scala core/src/main/scala/org/apache/spark/util/collection/PrimitiveVector.scala core/src/test/scala/org/apache/spark/CheckpointSuite.scala core/src/test/scala/org/apache/spark/DistributedSuite.scala core/src/test/scala/org/apache/spark/FileServerSuite.scala core/src/test/scala/org/apache/spark/JavaAPISuite.java core/src/test/scala/org/apache/spark/deploy/JsonProtocolSuite.scala core/src/test/scala/org/apache/spark/deploy/worker/ExecutorRunnerTest.scala core/src/test/scala/org/apache/spark/rdd/PairRDDFunctionsSuite.scala core/src/test/scala/org/apache/spark/rdd/RDDSuite.scala core/src/test/scala/org/apache/spark/scheduler/ClusterSchedulerSuite.scala core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala core/src/test/scala/org/apache/spark/scheduler/FakeTask.scala core/src/test/scala/org/apache/spark/scheduler/JobLoggerSuite.scala core/src/test/scala/org/apache/spark/scheduler/SparkListenerSuite.scala core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala core/src/test/scala/org/apache/spark/scheduler/local/LocalSchedulerSuite.scala core/src/test/scala/org/apache/spark/serializer/KryoSerializerSuite.scala core/src/test/scala/org/apache/spark/storage/BlockIdSuite.scala core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala core/src/test/scala/org/apache/spark/storage/DiskBlockManagerSuite.scala core/src/test/scala/org/apache/spark/util/collection/OpenHashMapSuite.scala core/src/test/scala/org/apache/spark/util/collection/OpenHashSetSuite.scala docker/spark-test/base/Dockerfile docs/_config.yml docs/_layouts/global.html docs/building-with-maven.md docs/configuration.md docs/mllib-guide.md docs/python-programming-guide.md docs/running-on-yarn.md docs/streaming-programming-guide.md docs/tuning.md ec2/spark_ec2.py examples/pom.xml examples/src/main/scala/org/apache/spark/examples/BroadcastTest.scala examples/src/main/scala/org/apache/spark/streaming/examples/clickstream/PageViewGenerator.scala graphx/src/test/scala/org/apache/spark/graphx/lib/SVDPlusPlusSuite.scala mllib/pom.xml mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala mllib/src/test/java/org/apache/spark/mllib/recommendation/JavaALSSuite.java mllib/src/test/scala/org/apache/spark/mllib/recommendation/ALSSuite.scala pom.xml project/SparkBuild.scala python/pyspark/context.py python/pyspark/rdd.py python/pyspark/shell.py python/pyspark/tests.py repl/pom.xml repl/src/main/scala/org/apache/spark/repl/SparkILoop.scala repl/src/main/scala/org/apache/spark/repl/SparkIMain.scala repl/src/test/scala/org/apache/spark/repl/ReplSuite.scala sbin/stop-slaves.sh streaming/pom.xml streaming/src/main/scala/org/apache/spark/streaming/Checkpoint.scala streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaDStreamLike.scala streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaPairDStream.scala streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingContext.scala streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala streaming/src/main/scala/org/apache/spark/streaming/dstream/NetworkInputDStream.scala streaming/src/main/scala/org/apache/spark/streaming/dstream/PairDStreamFunctions.scala streaming/src/main/scala/org/apache/spark/streaming/dstream/TransformedDStream.scala streaming/src/main/scala/org/apache/spark/streaming/receivers/ActorReceiver.scala 
streaming/src/main/scala/org/apache/spark/streaming/scheduler/NetworkInputTracker.scala streaming/src/test/java/org/apache/spark/streaming/JavaAPISuite.java streaming/src/test/java/org/apache/spark/streaming/JavaTestUtils.scala streaming/src/test/java/org/apache/spark/streaming/LocalJavaStreamingContext.java streaming/src/test/scala/org/apache/spark/streaming/BasicOperationsSuite.scala streaming/src/test/scala/org/apache/spark/streaming/InputStreamsSuite.scala streaming/src/test/scala/org/apache/spark/streaming/TestSuiteBase.scala tools/pom.xml yarn/alpha/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala yarn/alpha/src/main/scala/org/apache/spark/deploy/yarn/WorkerRunnable.scala yarn/alpha/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocationHandler.scala yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientArguments.scala yarn/pom.xml yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
Do we want to go straight into master-csd, or should there be a separate branch instead?
closing to resolve some conflicts.
yeah, I think we could go into a separate branch and merge into master-csd later (to hedge our bets)
I just went through the diff between jhartlaub:csd-branch-0.9-merge and clearstorydata:csd-2.10. I went through quickly and may have missed something, but what I find is that many of the differences are either completely inconsequential or are inconsequential to us; but for those that are consequential, I believe that we should be using the csd-2.10 version. Several of those are post-0.9.1 bug fixes, including my recent SPY-302 work; and the diff between csd-2.10 and Apache branch-0.9 is smaller. I think that you can easily substitute csd-2.10 for csd-branch-0.9-merge, and that we should use csd-2.10.
OK, I can do that, no problem.
StandaloneSchedulerBackend instead of the smaller IDs used within Spark (that lack the application name). This was reported by ClearStory in alteryx/spark#9. Also fixed some messages that said slave instead of executor.
…tion improvements Author: Michael Armbrust <michael@databricks.com> Author: Gregory Owen <greowen@gmail.com> Closes apache#1935 from marmbrus/countDistinctPartial and squashes the following commits: 5c7848d [Michael Armbrust] turn off caching in the constructor 8074a80 [Michael Armbrust] fix tests 32d216f [Michael Armbrust] reynolds comments c122cca [Michael Armbrust] Address comments, add tests b2e8ef3 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into countDistinctPartial fae38f4 [Michael Armbrust] Fix style fdca896 [Michael Armbrust] cleanup 93d0f64 [Michael Armbrust] metastore concurrency fix. db44a30 [Michael Armbrust] JIT hax. 3868f6c [Michael Armbrust] Merge pull request alteryx#9 from GregOwen/countDistinctPartial c9e67de [Gregory Owen] Made SpecificRow and types serializable by Kryo 2b46c4b [Michael Armbrust] Merge remote-tracking branch 'origin/master' into countDistinctPartial 8ff6402 [Michael Armbrust] Add specific row. 58d15f1 [Michael Armbrust] disable codegen logging 87d101d [Michael Armbrust] Fix isNullAt bug abee26d [Michael Armbrust] WIP 27984d0 [Michael Armbrust] Merge remote-tracking branch 'origin/master' into countDistinctPartial 57ae3b1 [Michael Armbrust] Fix order dependent test b3d0f64 [Michael Armbrust] Add golden files. c1f7114 [Michael Armbrust] Improve tests / fix serialization. f31b8ad [Michael Armbrust] more fixes 38c7449 [Michael Armbrust] comments and style 9153652 [Michael Armbrust] better toString d494598 [Michael Armbrust] Fix tests now that the planner is better 41fbd1d [Michael Armbrust] Never try and create an empty hash set. 050bb97 [Michael Armbrust] Skip no-arg constructors for kryo, bd08239 [Michael Armbrust] WIP 213ada8 [Michael Armbrust] First draft of partially aggregated and code generated count distinct / max (cherry picked from commit 7e191fe) Signed-off-by: Michael Armbrust <michael@databricks.com>
…if sql has null

```scala
val jsc = new org.apache.spark.api.java.JavaSparkContext(sc)
val jhc = new org.apache.spark.sql.hive.api.java.JavaHiveContext(jsc)
val nrdd = jhc.hql("select null from spark_test.for_test")
println(nrdd.schema)
```

Then the error is thrown as follows:

```
scala.MatchError: NullType (of class org.apache.spark.sql.catalyst.types.NullType$)
    at org.apache.spark.sql.types.util.DataTypeConversions$.asJavaDataType(DataTypeConversions.scala:43)
```

Author: YanTangZhai <hakeemzhai@tencent.com> Author: yantangzhai <tyz0303@163.com> Author: Michael Armbrust <michael@databricks.com> Closes apache#3538 from YanTangZhai/MatchNullType and squashes the following commits: e052dff [yantangzhai] [SPARK-4676] [SQL] JavaSchemaRDD.schema may throw NullType MatchError if sql has null 4b4bb34 [yantangzhai] [SPARK-4676] [SQL] JavaSchemaRDD.schema may throw NullType MatchError if sql has null 896c7b7 [yantangzhai] fix NullType MatchError in JavaSchemaRDD when sql has null 6e643f8 [YanTangZhai] Merge pull request alteryx#11 from apache/master e249846 [YanTangZhai] Merge pull request alteryx#10 from apache/master d26d982 [YanTangZhai] Merge pull request alteryx#9 from apache/master 76d4027 [YanTangZhai] Merge pull request alteryx#8 from apache/master 03b62b0 [YanTangZhai] Merge pull request alteryx#7 from apache/master 8a00106 [YanTangZhai] Merge pull request alteryx#6 from apache/master cbcba66 [YanTangZhai] Merge pull request #3 from apache/master cdef539 [YanTangZhai] Merge pull request #1 from apache/master (cherry picked from commit 1066427) Signed-off-by: Michael Armbrust <michael@databricks.com>
…ins an empty AttributeSet() references

The sql "select * from spark_test::for_test where abs(20141202) is not null" has predicates=List(IS NOT NULL HiveSimpleUdf#org.apache.hadoop.hive.ql.udf.UDFAbs(20141202)) and partitionKeyIds=AttributeSet(). PruningPredicates is List(IS NOT NULL HiveSimpleUdf#org.apache.hadoop.hive.ql.udf.UDFAbs(20141202)). Then the exception "java.lang.IllegalArgumentException: requirement failed: Partition pruning predicates only supported for partitioned tables." is thrown.

The sql "select * from spark_test::for_test_partitioned_table where abs(20141202) is not null and type_id=11 and platform = 3" with partitioned key insert_date has predicates=List(IS NOT NULL HiveSimpleUdf#org.apache.hadoop.hive.ql.udf.UDFAbs(20141202), (type_id#12 = 11), (platform#8 = 3)) and partitionKeyIds=AttributeSet(insert_date#24). PruningPredicates is List(IS NOT NULL HiveSimpleUdf#org.apache.hadoop.hive.ql.udf.UDFAbs(20141202)).

Author: YanTangZhai <hakeemzhai@tencent.com> Author: yantangzhai <tyz0303@163.com> Closes apache#3556 from YanTangZhai/SPARK-4693 and squashes the following commits: 620ebe3 [yantangzhai] [SPARK-4693] [SQL] PruningPredicates may be wrong if predicates contains an empty AttributeSet() references 37cfdf5 [yantangzhai] [SPARK-4693] [SQL] PruningPredicates may be wrong if predicates contains an empty AttributeSet() references 70a3544 [yantangzhai] [SPARK-4693] [SQL] PruningPredicates may be wrong if predicates contains an empty AttributeSet() references efa9b03 [YanTangZhai] Update HiveQuerySuite.scala 72accf1 [YanTangZhai] Update HiveQuerySuite.scala e572b9a [YanTangZhai] Update HiveStrategies.scala 6e643f8 [YanTangZhai] Merge pull request alteryx#11 from apache/master e249846 [YanTangZhai] Merge pull request alteryx#10 from apache/master d26d982 [YanTangZhai] Merge pull request alteryx#9 from apache/master 76d4027 [YanTangZhai] Merge pull request alteryx#8 from apache/master 03b62b0 [YanTangZhai] Merge pull request alteryx#7 from apache/master 8a00106 [YanTangZhai] Merge pull request alteryx#6 from apache/master cbcba66 [YanTangZhai] Merge pull request #3 from apache/master cdef539 [YanTangZhai] Merge pull request #1 from apache/master
…askTracker to reduce the chance of communication problems. Using AkkaUtils.askWithReply in MapOutputTracker.askTracker reduces the chance of communication problems. Author: YanTangZhai <hakeemzhai@tencent.com> Author: yantangzhai <tyz0303@163.com> Closes apache#3785 from YanTangZhai/SPARK-4946 and squashes the following commits: 9ca6541 [yantangzhai] [SPARK-4946] [CORE] Using AkkaUtils.askWithReply in MapOutputTracker.askTracker to reduce the chance of the communicating problem e4c2c0a [YanTangZhai] Merge pull request alteryx#15 from apache/master 718afeb [YanTangZhai] Merge pull request alteryx#12 from apache/master 6e643f8 [YanTangZhai] Merge pull request alteryx#11 from apache/master e249846 [YanTangZhai] Merge pull request alteryx#10 from apache/master d26d982 [YanTangZhai] Merge pull request alteryx#9 from apache/master 76d4027 [YanTangZhai] Merge pull request alteryx#8 from apache/master 03b62b0 [YanTangZhai] Merge pull request alteryx#7 from apache/master 8a00106 [YanTangZhai] Merge pull request alteryx#6 from apache/master cbcba66 [YanTangZhai] Merge pull request #3 from apache/master cdef539 [YanTangZhai] Merge pull request #1 from apache/master
When debugging Spark Streaming applications, it is necessary to monitor certain metrics that are not shown in the Spark application UI. For example: What is the average processing time of batches? What is the scheduling delay? Is the system able to process data as fast as it is received? How many records am I receiving through my receivers? While the StreamingListener interface introduced in 0.9 provided some of this information, it could only be accessed programmatically. A UI that shows information specific to streaming applications is needed for easier debugging. This PR introduces such a UI, showing various statistics related to the streaming application. Here is a screenshot of the UI running on my local machine: http://i.imgur.com/1ooDGhm.png This UI is integrated into the Spark UI running at port 4040. Author: Tathagata Das <tathagata.das1565@gmail.com> Author: Andrew Or <andrewor14@gmail.com> Closes apache#290 from tdas/streaming-web-ui and squashes the following commits: fc73ca5 [Tathagata Das] Merge pull request alteryx#9 from andrewor14/ui-refactor 642dd88 [Andrew Or] Merge SparkUISuite.scala into UISuite.scala eb30517 [Andrew Or] Merge github.com:apache/spark into ui-refactor f4f4cbe [Tathagata Das] More minor fixes. 34bb364 [Tathagata Das] Merge branch 'streaming-web-ui' of github.com:tdas/spark into streaming-web-ui 252c566 [Tathagata Das] Merge pull request alteryx#8 from andrewor14/ui-refactor e038b4b [Tathagata Das] Addressed Patrick's comments. 125a054 [Andrew Or] Disable serving static resources with gzip 90feb8d [Andrew Or] Address Patrick's comments 89dae36 [Tathagata Das] Merge branch 'streaming-web-ui' of github.com:tdas/spark into streaming-web-ui 72fe256 [Tathagata Das] Merge pull request alteryx#6 from andrewor14/ui-refactor 2fc09c8 [Tathagata Das] Added binary check exclusions aa396d4 [Andrew Or] Rename tabs and pages (No more IndexPage.scala) f8e1053 [Tathagata Das] Added Spark and Streaming UI unit tests. caa5e05 [Tathagata Das] Merge branch 'streaming-web-ui' of github.com:tdas/spark into streaming-web-ui 585cd65 [Tathagata Das] Merge pull request alteryx#5 from andrewor14/ui-refactor 914b8ff [Tathagata Das] Moved utils functions to UIUtils. 548c98c [Andrew Or] Wide refactoring of WebUI, UITab, and UIPage (see commit message) 6de06b0 [Tathagata Das] Merge remote-tracking branch 'apache/master' into streaming-web-ui ee6543f [Tathagata Das] Minor changes based on Andrew's comments. fa760fe [Tathagata Das] Fixed long line. 1c0bcef [Tathagata Das] Refactored streaming UI into two files. 1af239b [Tathagata Das] Changed streaming UI to attach itself as a tab with the Spark UI. 827e81a [Tathagata Das] Merge branch 'streaming-web-ui' of github.com:tdas/spark into streaming-web-ui 168fe86 [Tathagata Das] Merge pull request #2 from andrewor14/ui-refactor 3e986f8 [Tathagata Das] Merge remote-tracking branch 'apache/master' into streaming-web-ui c78c92d [Andrew Or] Remove outdated comment 8f7323b [Andrew Or] End of file new lines, indentation, and imports (minor) 0d61ee8 [Andrew Or] Merge branch 'streaming-web-ui' of github.com:tdas/spark into ui-refactor 9a48fa1 [Andrew Or] Allow adding tabs to SparkUI dynamically + add example 61358e3 [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into streaming-web-ui 53be2c5 [Tathagata Das] Minor style updates.
ed25dfc [Andrew Or] Generalize SparkUI header to display tabs dynamically a37ad4f [Andrew Or] Comments, imports and formatting (minor) cd000b0 [Andrew Or] Merge github.com:apache/spark into ui-refactor 7d57444 [Andrew Or] Refactoring the UI interface to add flexibility aef4dd5 [Tathagata Das] Added Apache licenses. db27bad [Tathagata Das] Added last batch processing time to StreamingUI. 4d86e98 [Tathagata Das] Added basic stats to the StreamingUI and refactored the UI to a Page to make it easier to transition to using SparkUI later. 93f1c69 [Tathagata Das] Added network receiver information to the Streaming UI. 56cc7fb [Tathagata Das] First cut implementation of Streaming UI.
Support the `!` boolean logic operator (like NOT) in SQL, as follows:

```sql
select * from for_test where !(col1 > col2)
```

Author: YanTangZhai <hakeemzhai@tencent.com> Author: Michael Armbrust <michael@databricks.com> Closes apache#3555 from YanTangZhai/SPARK-4692 and squashes the following commits: 1a9f605 [YanTangZhai] Update HiveQuerySuite.scala 7c03c68 [YanTangZhai] Merge pull request alteryx#23 from apache/master 992046e [YanTangZhai] Update HiveQuerySuite.scala ea618f4 [YanTangZhai] Update HiveQuerySuite.scala 192411d [YanTangZhai] Merge pull request alteryx#17 from YanTangZhai/master e4c2c0a [YanTangZhai] Merge pull request alteryx#15 from apache/master 1e1ebb4 [YanTangZhai] Update HiveQuerySuite.scala efc4210 [YanTangZhai] Update HiveQuerySuite.scala bd2c444 [YanTangZhai] Update HiveQuerySuite.scala 1893956 [YanTangZhai] Merge pull request alteryx#14 from marmbrus/pr/3555 59e4de9 [Michael Armbrust] make hive test 718afeb [YanTangZhai] Merge pull request alteryx#12 from apache/master 950b21e [YanTangZhai] Update HiveQuerySuite.scala 74175b4 [YanTangZhai] Update HiveQuerySuite.scala 92242c7 [YanTangZhai] Update HiveQl.scala 6e643f8 [YanTangZhai] Merge pull request alteryx#11 from apache/master e249846 [YanTangZhai] Merge pull request alteryx#10 from apache/master d26d982 [YanTangZhai] Merge pull request alteryx#9 from apache/master 76d4027 [YanTangZhai] Merge pull request alteryx#8 from apache/master 03b62b0 [YanTangZhai] Merge pull request alteryx#7 from apache/master 8a00106 [YanTangZhai] Merge pull request alteryx#6 from apache/master cbcba66 [YanTangZhai] Merge pull request #3 from apache/master cdef539 [YanTangZhai] Merge pull request #1 from apache/master
SQL:

```sql
select key from (select key from src limit 100) t2 limit 10
```

Optimized Logical Plan before modifying:

```
== Optimized Logical Plan ==
Limit 10
 Limit 100
  Project key#3
   MetastoreRelation default, src, None
```

Optimized Logical Plan after modifying:

```
== Optimized Logical Plan ==
Limit 10
 Project [key#1]
  MetastoreRelation default, src, None
```

Author: Zhongshuai Pei <799203320@qq.com> Author: DoingDone9 <799203320@qq.com> Closes apache#5770 from DoingDone9/limitOptimizer and squashes the following commits: c68eaa7 [Zhongshuai Pei] Update CombiningLimitsSuite.scala 97e18cf [Zhongshuai Pei] Update Optimizer.scala 19ab875 [Zhongshuai Pei] Update CombiningLimitsSuite.scala 7db4566 [Zhongshuai Pei] Update CombiningLimitsSuite.scala e2a491d [Zhongshuai Pei] Update Optimizer.scala f03fe7f [Zhongshuai Pei] Merge pull request alteryx#12 from apache/master f12fa50 [Zhongshuai Pei] Merge pull request alteryx#10 from apache/master f61210c [Zhongshuai Pei] Merge pull request alteryx#9 from apache/master 34b1a9a [Zhongshuai Pei] Merge pull request alteryx#8 from apache/master 802261c [DoingDone9] Merge pull request alteryx#7 from apache/master d00303b [DoingDone9] Merge pull request alteryx#6 from apache/master 98b134f [DoingDone9] Merge pull request alteryx#5 from apache/master 161cae3 [DoingDone9] Merge pull request alteryx#4 from apache/master c87e8b6 [DoingDone9] Merge pull request #3 from apache/master cb1852d [DoingDone9] Merge pull request #2 from apache/master c3f046f [DoingDone9] Merge pull request #1 from apache/master
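One way to observe the effect is via `explain`; a hedged PySpark sketch, assuming a HiveContext with the usual `src` test table available:

```python
from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext("local", "combine-limits-example")
sqlContext = HiveContext(sc)

# explain(True) prints the parsed, analyzed, and optimized logical
# plans plus the physical plan; with this change the nested limits
# collapse into a single Limit.
df = sqlContext.sql("select key from (select key from src limit 100) t2 limit 10")
df.explain(True)
```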
SQL:

```sql
select key from (select key,value from t1 limit 100) t2 limit 10
```

Optimized Logical Plan before modifying:

```
== Optimized Logical Plan ==
Limit 10
 Project key#228
  Limit 100
   MetastoreRelation default, t1, None
```

Optimized Logical Plan after modifying:

```
== Optimized Logical Plan ==
Limit 10
 Limit 100
  Project key#228
   MetastoreRelation default, t1, None
```

After this, we can combine limits. Author: Zhongshuai Pei <799203320@qq.com> Author: DoingDone9 <799203320@qq.com> Closes apache#5797 from DoingDone9/ProjectLimit and squashes the following commits: 70d0fca [Zhongshuai Pei] Update FilterPushdownSuite.scala dc83ae9 [Zhongshuai Pei] Update FilterPushdownSuite.scala 485c61c [Zhongshuai Pei] Update Optimizer.scala f03fe7f [Zhongshuai Pei] Merge pull request alteryx#12 from apache/master f12fa50 [Zhongshuai Pei] Merge pull request alteryx#10 from apache/master f61210c [Zhongshuai Pei] Merge pull request alteryx#9 from apache/master 34b1a9a [Zhongshuai Pei] Merge pull request alteryx#8 from apache/master 802261c [DoingDone9] Merge pull request alteryx#7 from apache/master d00303b [DoingDone9] Merge pull request alteryx#6 from apache/master 98b134f [DoingDone9] Merge pull request alteryx#5 from apache/master 161cae3 [DoingDone9] Merge pull request alteryx#4 from apache/master c87e8b6 [DoingDone9] Merge pull request #3 from apache/master cb1852d [DoingDone9] Merge pull request #2 from apache/master c3f046f [DoingDone9] Merge pull request #1 from apache/master
…" into true or false directly SQL ``` select key from src where 3 in (4, 5); ``` Before ``` == Optimized Logical Plan == Project [key#12] Filter 3 INSET (5,4) MetastoreRelation default, src, None ``` After ``` == Optimized Logical Plan == LocalRelation [key#228], [] ``` Author: Zhongshuai Pei <799203320@qq.com> Author: DoingDone9 <799203320@qq.com> Closes apache#5972 from DoingDone9/InToFalse and squashes the following commits: 4c722a2 [Zhongshuai Pei] Update predicates.scala abe2bbb [Zhongshuai Pei] Update Optimizer.scala fa461a5 [Zhongshuai Pei] Update Optimizer.scala e34c28a [Zhongshuai Pei] Update predicates.scala 24739bd [Zhongshuai Pei] Update ConstantFoldingSuite.scala f4dbf50 [Zhongshuai Pei] Update ConstantFoldingSuite.scala 35ceb7a [Zhongshuai Pei] Update Optimizer.scala 36c194e [Zhongshuai Pei] Update Optimizer.scala 2e8f6ca [Zhongshuai Pei] Update Optimizer.scala 14952e2 [Zhongshuai Pei] Merge pull request alteryx#13 from apache/master f03fe7f [Zhongshuai Pei] Merge pull request alteryx#12 from apache/master f12fa50 [Zhongshuai Pei] Merge pull request alteryx#10 from apache/master f61210c [Zhongshuai Pei] Merge pull request alteryx#9 from apache/master 34b1a9a [Zhongshuai Pei] Merge pull request alteryx#8 from apache/master 802261c [DoingDone9] Merge pull request alteryx#7 from apache/master d00303b [DoingDone9] Merge pull request alteryx#6 from apache/master 98b134f [DoingDone9] Merge pull request alteryx#5 from apache/master 161cae3 [DoingDone9] Merge pull request alteryx#4 from apache/master c87e8b6 [DoingDone9] Merge pull request #3 from apache/master cb1852d [DoingDone9] Merge pull request #2 from apache/master c3f046f [DoingDone9] Merge pull request #1 from apache/master
…" into true or false directly SQL ``` select key from src where 3 in (4, 5); ``` Before ``` == Optimized Logical Plan == Project [key#12] Filter 3 INSET (5,4) MetastoreRelation default, src, None ``` After ``` == Optimized Logical Plan == LocalRelation [key#228], [] ``` Author: Zhongshuai Pei <799203320@qq.com> Author: DoingDone9 <799203320@qq.com> Closes apache#5972 from DoingDone9/InToFalse and squashes the following commits: 4c722a2 [Zhongshuai Pei] Update predicates.scala abe2bbb [Zhongshuai Pei] Update Optimizer.scala fa461a5 [Zhongshuai Pei] Update Optimizer.scala e34c28a [Zhongshuai Pei] Update predicates.scala 24739bd [Zhongshuai Pei] Update ConstantFoldingSuite.scala f4dbf50 [Zhongshuai Pei] Update ConstantFoldingSuite.scala 35ceb7a [Zhongshuai Pei] Update Optimizer.scala 36c194e [Zhongshuai Pei] Update Optimizer.scala 2e8f6ca [Zhongshuai Pei] Update Optimizer.scala 14952e2 [Zhongshuai Pei] Merge pull request alteryx#13 from apache/master f03fe7f [Zhongshuai Pei] Merge pull request alteryx#12 from apache/master f12fa50 [Zhongshuai Pei] Merge pull request alteryx#10 from apache/master f61210c [Zhongshuai Pei] Merge pull request alteryx#9 from apache/master 34b1a9a [Zhongshuai Pei] Merge pull request alteryx#8 from apache/master 802261c [DoingDone9] Merge pull request alteryx#7 from apache/master d00303b [DoingDone9] Merge pull request alteryx#6 from apache/master 98b134f [DoingDone9] Merge pull request alteryx#5 from apache/master 161cae3 [DoingDone9] Merge pull request alteryx#4 from apache/master c87e8b6 [DoingDone9] Merge pull request #3 from apache/master cb1852d [DoingDone9] Merge pull request #2 from apache/master c3f046f [DoingDone9] Merge pull request #1 from apache/master (cherry picked from commit 4b5e1fe) Signed-off-by: Michael Armbrust <michael@databricks.com>
…into a single batch.

SQL:

```sql
select * from tableA join tableB on (a > 3 and b = d) or (a > 3 and b = e)
```

Plan before modifying:

```
== Optimized Logical Plan ==
Project [a#293,b#294,c#295,d#296,e#297]
 Join Inner, Some(((a#293 > 3) && ((b#294 = d#296) || (b#294 = e#297))))
  MetastoreRelation default, tablea, None
  MetastoreRelation default, tableb, None
```

Plan after modifying:

```
== Optimized Logical Plan ==
Project [a#293,b#294,c#295,d#296,e#297]
 Join Inner, Some(((b#294 = d#296) || (b#294 = e#297)))
  Filter (a#293 > 3)
   MetastoreRelation default, tablea, None
  MetastoreRelation default, tableb, None
```

CombineLimits ==> Limit(If(LessThan(ne, le), ne, le), grandChild), and LessThan is in BooleanSimplification, so CombineLimits must run before BooleanSimplification, and BooleanSimplification must run before PushPredicateThroughJoin. Author: Zhongshuai Pei <799203320@qq.com> Author: DoingDone9 <799203320@qq.com> Closes apache#6351 from DoingDone9/master and squashes the following commits: 20de7be [Zhongshuai Pei] Update Optimizer.scala 7bc7d28 [Zhongshuai Pei] Merge pull request alteryx#17 from apache/master 0ba5f42 [Zhongshuai Pei] Update Optimizer.scala f8b9314 [Zhongshuai Pei] Update FilterPushdownSuite.scala c529d9f [Zhongshuai Pei] Update FilterPushdownSuite.scala ae3af6d [Zhongshuai Pei] Update FilterPushdownSuite.scala a04ffae [Zhongshuai Pei] Update Optimizer.scala 11beb61 [Zhongshuai Pei] Update FilterPushdownSuite.scala f2ee5fe [Zhongshuai Pei] Update Optimizer.scala be6b1d5 [Zhongshuai Pei] Update Optimizer.scala b01e622 [Zhongshuai Pei] Merge pull request alteryx#15 from apache/master 8df716a [Zhongshuai Pei] Update FilterPushdownSuite.scala d98bc35 [Zhongshuai Pei] Update FilterPushdownSuite.scala fa65718 [Zhongshuai Pei] Update Optimizer.scala ab8e9a6 [Zhongshuai Pei] Merge pull request alteryx#14 from apache/master 14952e2 [Zhongshuai Pei] Merge pull request alteryx#13 from apache/master f03fe7f [Zhongshuai Pei] Merge pull request alteryx#12 from apache/master f12fa50 [Zhongshuai Pei] Merge pull request alteryx#10 from apache/master f61210c [Zhongshuai Pei] Merge pull request alteryx#9 from apache/master 34b1a9a [Zhongshuai Pei] Merge pull request alteryx#8 from apache/master 802261c [DoingDone9] Merge pull request alteryx#7 from apache/master d00303b [DoingDone9] Merge pull request alteryx#6 from apache/master 98b134f [DoingDone9] Merge pull request alteryx#5 from apache/master 161cae3 [DoingDone9] Merge pull request alteryx#4 from apache/master c87e8b6 [DoingDone9] Merge pull request #3 from apache/master cb1852d [DoingDone9] Merge pull request #2 from apache/master c3f046f [DoingDone9] Merge pull request #1 from apache/master
## What changes were proposed in this pull request?

This PR brings support for chained Python UDFs, for example:

```sql
select udf1(udf2(a))
select udf1(udf2(a) + 3)
select udf1(udf2(a) + udf3(b))
```

Also, directly chained unary Python UDFs are put in a single batch of Python UDFs; others may require multiple batches. For example:

```python
>>> sqlContext.sql("select double(double(1))").explain()
== Physical Plan ==
WholeStageCodegen
:  +- Project [pythonUDF#10 AS double(double(1))alteryx#9]
:     +- INPUT
+- !BatchPythonEvaluation double(double(1)), [pythonUDF#10]
   +- Scan OneRowRelation[]
>>> sqlContext.sql("select double(double(1) + double(2))").explain()
== Physical Plan ==
WholeStageCodegen
:  +- Project [pythonUDF#19 AS double((double(1) + double(2)))alteryx#16]
:     +- INPUT
+- !BatchPythonEvaluation double((pythonUDF#17 + pythonUDF#18)), [pythonUDF#17,pythonUDF#18,pythonUDF#19]
   +- !BatchPythonEvaluation double(2), [pythonUDF#17,pythonUDF#18]
      +- !BatchPythonEvaluation double(1), [pythonUDF#17]
         +- Scan OneRowRelation[]
```

TODO: will support multiple unrelated Python UDFs in one batch (another PR).

## How was this patch tested?

Added new unit tests for chained UDFs.

Author: Davies Liu <davies@databricks.com> Closes apache#12014 from davies/py_udfs.
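For context, a hedged sketch of how a UDF like `double` in the examples above could be registered from Python (using the pre-2.0 `registerFunction` spelling; the exact API depends on the Spark version):

```python
from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.sql.types import IntegerType

sc = SparkContext("local", "chained-udf-example")
sqlContext = SQLContext(sc)

# Register a Python UDF named "double"; directly chained calls such as
# double(double(1)) are then evaluated in a single Python batch.
sqlContext.registerFunction("double", lambda x: x * 2, IntegerType())
sqlContext.sql("select double(double(1))").explain()
```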
## What changes were proposed in this pull request?

This PR aims to optimize GroupExpressions by removing repeating expressions. `RemoveRepetitionFromGroupExpressions` is added.

**Before**

```scala
scala> sql("select a+1 from values 1,2 T(a) group by a+1, 1+a, A+1, 1+A").explain()
== Physical Plan ==
WholeStageCodegen
:  +- TungstenAggregate(key=[(a#0 + 1)alteryx#6,(1 + a#0)alteryx#7,(A#0 + 1)alteryx#8,(1 + A#0)alteryx#9], functions=[], output=[(a + 1)alteryx#5])
:     +- INPUT
+- Exchange hashpartitioning((a#0 + 1)alteryx#6, (1 + a#0)alteryx#7, (A#0 + 1)alteryx#8, (1 + A#0)alteryx#9, 200), None
   +- WholeStageCodegen
      :  +- TungstenAggregate(key=[(a#0 + 1) AS (a#0 + 1)alteryx#6,(1 + a#0) AS (1 + a#0)alteryx#7,(A#0 + 1) AS (A#0 + 1)alteryx#8,(1 + A#0) AS (1 + A#0)alteryx#9], functions=[], output=[(a#0 + 1)alteryx#6,(1 + a#0)alteryx#7,(A#0 + 1)alteryx#8,(1 + A#0)alteryx#9])
      :     +- INPUT
      +- LocalTableScan [a#0], [[1],[2]]
```

**After**

```scala
scala> sql("select a+1 from values 1,2 T(a) group by a+1, 1+a, A+1, 1+A").explain()
== Physical Plan ==
WholeStageCodegen
:  +- TungstenAggregate(key=[(a#0 + 1)alteryx#6], functions=[], output=[(a + 1)alteryx#5])
:     +- INPUT
+- Exchange hashpartitioning((a#0 + 1)alteryx#6, 200), None
   +- WholeStageCodegen
      :  +- TungstenAggregate(key=[(a#0 + 1) AS (a#0 + 1)alteryx#6], functions=[], output=[(a#0 + 1)alteryx#6])
      :     +- INPUT
      +- LocalTableScan [a#0], [[1],[2]]
```

## How was this patch tested?

Passes the Jenkins tests (with a new test case).

Author: Dongjoon Hyun <dongjoon@apache.org> Closes apache#12590 from dongjoon-hyun/SPARK-14830.
@markhamstra - initial merge of Spark 0.9 for review.