
Cannot successfully run q64.sql when the value of "spark.oap.sql.columnar.preferColumnar" changes from "true" to "false". #460

Open
haojinIntel opened this issue Aug 11, 2021 · 2 comments
Labels: bug (Something isn't working)

Comments

@haojinIntel
Collaborator

We tried to run TPC-DS with the gazelle plugin and found that q64.sql fails when the value of "spark.oap.sql.columnar.preferColumnar" is changed from "true" to "false". We used this tpcds-kit (https://github.com/davies/tpcds-kit.git) to generate the data; the data scale is 500GB and the tables are non-partitioned. The error messages are shown below (a sketch of how we launch the run follows the log):

Error: org.apache.hive.service.cli.HiveSQLException: Error running query: org.apache.spark.SparkException: Job aborted due to stage failure: Task 54 in stage 1291.0 failed 4 times, most recent failure: Lost task 54.3 in stage 1291.0 (TID 108444) (bdpe-sky2 executor 30): java.lang.IllegalStateException: Value at index is null
	at org.apache.arrow.vector.BitVector.get(BitVector.java:260)
	at com.intel.oap.vectorized.ArrowWritableColumnVector$BooleanAccessor.getBoolean(ArrowWritableColumnVector.java:890)
	at com.intel.oap.vectorized.ArrowWritableColumnVector.getBoolean(ArrowWritableColumnVector.java:480)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.agg_doAggregateWithKeys_0$(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage2.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:755)
	at org.apache.spark.sql.execution.joins.HashedRelation$.apply(HashedRelation.scala:148)
	at org.apache.spark.sql.execution.joins.ShuffledHashJoinExec.buildHashedRelation(ShuffledHashJoinExec.scala:74)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.wholestagecodegen_init_0_0$(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.init(Unknown Source)
	at org.apache.spark.sql.execution.WholeStageCodegenExec.$anonfun$doExecute$6(WholeStageCodegenExec.scala:771)
	at org.apache.spark.sql.execution.WholeStageCodegenExec.$anonfun$doExecute$6$adapted(WholeStageCodegenExec.scala:767)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:915)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2$adapted(RDD.scala:915)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
	at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
	at org.apache.spark.scheduler.Task.run(Task.scala:131)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:361)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.$anonfun$run$2(SparkExecuteStatementOperation.scala:263)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties(SparkOperation.scala:78)
	at org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties$(SparkOperation.scala:62)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.withLocalProperties(SparkExecuteStatementOperation.scala:43)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:263)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:258)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746)
	at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2.run(SparkExecuteStatementOperation.scala:272)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 54 in stage 1291.0 failed 4 times, most recent failure: Lost task 54.3 in stage 1291.0 (TID 108444) (bdpe-sky2 executor 30): java.lang.IllegalStateException: Value at index is null
	[... same executor stack trace as above, frames elided ...]

Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2253)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2202)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2201)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2201)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1078)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1078)
	at scala.Option.foreach(Option.scala:407)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1078)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2440)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2382)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2371)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
Caused by: java.lang.IllegalStateException: Value at index is null
	[... same executor stack trace as above, frames elided ...]
	at java.lang.Thread.run(Thread.java:745) (state=,code=0)
Closing: 0: jdbc:hive2://bdpe-sky2:10000
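
For reference, a minimal sketch of how the failing run toggles the setting, assuming Spark's bundled Thrift Server script and the beeline client (jar paths and the rest of the gazelle plugin configuration are elided here as illustrative):

    ${SPARK_HOME}/sbin/start-thriftserver.sh \
      --conf spark.oap.sql.columnar.preferColumnar=false \
      ...remaining gazelle plugin confs...

    # then submit the failing query over JDBC, as in the log above
    beeline -u jdbc:hive2://bdpe-sky2:10000 -f q64.sql

With the default value of "true", q64.sql runs successfully.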
haojinIntel added the bug label on Aug 11, 2021
@haojinIntel
Collaborator Author

@zhouyuan @zhixingheyi-tian Please help track this issue. Thanks!

@zhouyuan
Collaborator

@haojinIntel

The spark.oap.sql.columnar.preferColumnar=false setting and spark.oap.sql.columnar.joinOptimizationLevel are designed for one special case: a single stage containing many joins (e.g., q72). In that case the native SQL engine is still slower than vanilla Spark, probably due to too many small memory allocations, so we fall back to vanilla Spark for better performance. The code path is complicated because a user may change the BHJ threshold, producing a different Spark plan for each configuration; a short illustration of that threshold follows.
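
(For context, the "BHJ threshold" above is Spark's spark.sql.autoBroadcastJoinThreshold; a minimal sketch of the interplay, with an illustrative value:)

    # -1 disables broadcast hash joins entirely, steering the planner toward
    # shuffled joins, so each threshold choice can yield a different plan shape
    --conf spark.sql.autoBroadcastJoinThreshold=-1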

The default value of spark.oap.sql.columnar.preferColumnar is true, and with that setting q64 should run successfully. The fallback code path is not exercised much and may be buggy; we plan to remove it once we optimize the memory allocations in the columnar operators. The Arrow null-checking behavior behind the exception is sketched below.
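
For context, the IllegalStateException at the top of the trace comes from Arrow's null checking: org.apache.arrow.vector.BitVector.get throws when a slot's validity bit is unset, so a caller must check isNull first. A minimal standalone sketch (a hypothetical demo class using plain Arrow Java, not gazelle code) that reproduces the same message:

    import org.apache.arrow.memory.RootAllocator;
    import org.apache.arrow.vector.BitVector;

    public class NullBitRead {
      public static void main(String[] args) {
        try (RootAllocator allocator = new RootAllocator();
             BitVector bits = new BitVector("flag", allocator)) {
          bits.allocateNew(2);
          bits.set(0, 1);        // slot 0 holds true
          bits.setValueCount(2); // slot 1 was never set, so it stays null

          // Safe read: consult the validity bitmap before dereferencing.
          for (int i = 0; i < 2; i++) {
            System.out.println(i + " -> " + (bits.isNull(i) ? "null" : bits.get(i)));
          }

          // Unsafe read: throws java.lang.IllegalStateException
          // ("Value at index is null"), the same error as in the trace above.
          bits.get(1);
        }
      }
    }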

-yuan
