
Hive external table on OCI Object Storage: running select count(*) returns error: No FileSystem for scheme "oci" #50

Closed
zhengwanbo opened this issue Jul 1, 2021 · 4 comments

Comments

@zhengwanbo

Environment:
1. HDP 3.1.4.0-315
2. Hive 3.1.0
3. HDFS connector: oci-hdfs-full-3.3.0.7.0.1

logs:
0: jdbc:hive2://bigdata-hadoop-2.sub070606371> show create table ssb_customer_txt_obj;
DEBUG : Acquired the compile lock.
INFO : Compiling command(queryId=hive_20210701170404_946bf166-b352-47aa-98a5-4fd45a04e09d): show create table ssb_customer_txt_obj
DEBUG : Encoding valid txns info 2040:9223372036854775807:: txnid:2040
INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:createtab_stmt, type:string, comment:from deserializer)], properties:null)
INFO : Completed compiling command(queryId=hive_20210701170404_946bf166-b352-47aa-98a5-4fd45a04e09d); Time taken: 3.461 seconds
INFO : Executing command(queryId=hive_20210701170404_946bf166-b352-47aa-98a5-4fd45a04e09d): show create table ssb_customer_txt_obj
INFO : Starting task [Stage-0:DDL] in serial mode
DEBUG : Task getting executed using mapred tag : hive_20210701170404_946bf166-b352-47aa-98a5-4fd45a04e09d,userid=root
INFO : Completed executing command(queryId=hive_20210701170404_946bf166-b352-47aa-98a5-4fd45a04e09d); Time taken: 1.54 seconds
INFO : OK
DEBUG : Shutting down query show create table ssb_customer_txt_obj
+----------------------------------------------------+
| createtab_stmt |
+----------------------------------------------------+
| CREATE EXTERNAL TABLE ssb_customer_txt_obj( |
| c_custkey int, |
| c_name string, |
| c_address string, |
| c_city string, |
| c_nation string, |
| c_region string, |
| c_phone string, |
| c_mktsegment string) |
| ROW FORMAT SERDE |
| 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' |
| WITH SERDEPROPERTIES ( |
| 'field.delim'='|', |
| 'serialization.format'='|') |
| STORED AS INPUTFORMAT |
| 'org.apache.hadoop.mapred.TextInputFormat' |
| OUTPUTFORMAT |
| 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' |
| LOCATION |
| 'oci://bigdata@ocichina001/ssb100_data/customer' |
| TBLPROPERTIES ( |
| 'bucketing_version'='2', |
| 'discover.partitions'='true', |
| 'transient_lastDdlTime'='1625018554') |
+----------------------------------------------------+
24 rows selected (5.365 seconds)

0: jdbc:hive2://bigdata-hadoop-2.sub070606371>
0: jdbc:hive2://bigdata-hadoop-2.sub070606371> select * from ssb_customer_txt_obj limit 2;
DEBUG : Acquired the compile lock.
INFO : Compiling command(queryId=hive_20210701171017_9a358f1f-1c0c-43d0-ad3c-1a55777e821f): select * from ssb_customer_txt_obj limit 2
DEBUG : Encoding valid txns info 2042:9223372036854775807:: txnid:2042
INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:ssb_customer_txt_obj.c_custkey, type:int, comment:null), FieldSchema(name:ssb_customer_txt_obj.c_name, type:string, comment:null), FieldSchema(name:ssb_customer_txt_obj.c_address, type:string, comment:null), FieldSchema(name:ssb_customer_txt_obj.c_city, type:string, comment:null), FieldSchema(name:ssb_customer_txt_obj.c_nation, type:string, comment:null), FieldSchema(name:ssb_customer_txt_obj.c_region, type:string, comment:null), FieldSchema(name:ssb_customer_txt_obj.c_phone, type:string, comment:null), FieldSchema(name:ssb_customer_txt_obj.c_mktsegment, type:string, comment:null)], properties:null)
INFO : Completed compiling command(queryId=hive_20210701171017_9a358f1f-1c0c-43d0-ad3c-1a55777e821f); Time taken: 6.69 seconds
INFO : Executing command(queryId=hive_20210701171017_9a358f1f-1c0c-43d0-ad3c-1a55777e821f): select * from ssb_customer_txt_obj limit 2
INFO : Completed executing command(queryId=hive_20210701171017_9a358f1f-1c0c-43d0-ad3c-1a55777e821f); Time taken: 0.342 seconds
INFO : OK
DEBUG : Shutting down query select * from ssb_customer_txt_obj limit 2
+---------------------------------+------------------------------+---------------------------------+------------------------------+--------------------------------+--------------------------------+-------------------------------+------------------------------------+
| ssb_customer_txt_obj.c_custkey | ssb_customer_txt_obj.c_name | ssb_customer_txt_obj.c_address | ssb_customer_txt_obj.c_city | ssb_customer_txt_obj.c_nation | ssb_customer_txt_obj.c_region | ssb_customer_txt_obj.c_phone | ssb_customer_txt_obj.c_mktsegment |
+---------------------------------+------------------------------+---------------------------------+------------------------------+--------------------------------+--------------------------------+-------------------------------+------------------------------------+
| 1 | Customer#000000001 | j5JsirBM9P | MOROCCO 0 | MOROCCO | AFRICA | 25-989-741-2988 | BUILDING |
| 2 | Customer#000000002 | 487LW1dovn6Q4dMVym | JORDAN 1 | JORDAN | MIDDLE EAST | 23-768-687-3665 | AUTOMOBILE |
+---------------------------------+------------------------------+---------------------------------+------------------------------+--------------------------------+--------------------------------+-------------------------------+------------------------------------+
2 rows selected (14.117 seconds)

0: jdbc:hive2://bigdata-hadoop-2.sub070606371> select count(*) from ssb_customer_txt_obj;
DEBUG : Acquired the compile lock.
INFO : Compiling command(queryId=hive_20210701170416_14defefe-bc86-465d-8a47-b5f16557c5a7): select count(*) from ssb_customer_txt_obj
DEBUG : Encoding valid txns info 2041:9223372036854775807:: txnid:2041
INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, type:bigint, comment:null)], properties:null)
INFO : Completed compiling command(queryId=hive_20210701170416_14defefe-bc86-465d-8a47-b5f16557c5a7); Time taken: 16.057 seconds
INFO : Executing command(queryId=hive_20210701170416_14defefe-bc86-465d-8a47-b5f16557c5a7): select count(*) from ssb_customer_txt_obj
INFO : Query ID = hive_20210701170416_14defefe-bc86-465d-8a47-b5f16557c5a7
INFO : Total jobs = 1
INFO : Launching Job 1 out of 1
INFO : Starting task [Stage-1:MAPRED] in serial mode
DEBUG : Task getting executed using mapred tag : hive_20210701170416_14defefe-bc86-465d-8a47-b5f16557c5a7,userid=root
INFO : Subscribed to counters: [] for queryId: hive_20210701170416_14defefe-bc86-465d-8a47-b5f16557c5a7
INFO : Tez session hasn't been created yet. Opening session
DEBUG : Adding local resource: scheme: "hdfs" host: "bigdata-hadoop-1.sub07060637140.bddsvcn.oraclevcn.com" port: 8020 file: "/data/tmp/hive/root/_tez_session_dir/d7333501-dddc-4398-bb21-9cd0f3f7d777-resources/orai18n.jar"
DEBUG : Adding local resource: scheme: "hdfs" host: "bigdata-hadoop-1.sub07060637140.bddsvcn.oraclevcn.com" port: 8020 file: "/data/tmp/hive/root/_tez_session_dir/d7333501-dddc-4398-bb21-9cd0f3f7d777-resources/osdt_cert.jar"
DEBUG : Adding local resource: scheme: "hdfs" host: "bigdata-hadoop-1.sub07060637140.bddsvcn.oraclevcn.com" port: 8020 file: "/data/tmp/hive/root/_tez_session_dir/d7333501-dddc-4398-bb21-9cd0f3f7d777-resources/oraclepki.jar"
DEBUG : Adding local resource: scheme: "hdfs" host: "bigdata-hadoop-1.sub07060637140.bddsvcn.oraclevcn.com" port: 8020 file: "/data/tmp/hive/root/_tez_session_dir/d7333501-dddc-4398-bb21-9cd0f3f7d777-resources/xdb.jar"
DEBUG : Adding local resource: scheme: "hdfs" host: "bigdata-hadoop-1.sub07060637140.bddsvcn.oraclevcn.com" port: 8020 file: "/data/tmp/hive/root/_tez_session_dir/d7333501-dddc-4398-bb21-9cd0f3f7d777-resources/hive-hcatalog-core.jar"
DEBUG : Adding local resource: scheme: "hdfs" host: "bigdata-hadoop-1.sub07060637140.bddsvcn.oraclevcn.com" port: 8020 file: "/data/tmp/hive/root/_tez_session_dir/d7333501-dddc-4398-bb21-9cd0f3f7d777-resources/osdt_jce.jar"
DEBUG : Adding local resource: scheme: "hdfs" host: "bigdata-hadoop-1.sub07060637140.bddsvcn.oraclevcn.com" port: 8020 file: "/data/tmp/hive/root/_tez_session_dir/d7333501-dddc-4398-bb21-9cd0f3f7d777-resources/ucp.jar"
DEBUG : Adding local resource: scheme: "hdfs" host: "bigdata-hadoop-1.sub07060637140.bddsvcn.oraclevcn.com" port: 8020 file: "/data/tmp/hive/root/_tez_session_dir/d7333501-dddc-4398-bb21-9cd0f3f7d777-resources/ojdbc8.jar"
DEBUG : Adding local resource: scheme: "hdfs" host: "bigdata-hadoop-1.sub07060637140.bddsvcn.oraclevcn.com" port: 8020 file: "/data/tmp/hive/root/_tez_session_dir/d7333501-dddc-4398-bb21-9cd0f3f7d777-resources/osdt_core.jar"
DEBUG : Adding local resource: scheme: "hdfs" host: "bigdata-hadoop-1.sub07060637140.bddsvcn.oraclevcn.com" port: 8020 file: "/data/tmp/hive/root/_tez_session_dir/d7333501-dddc-4398-bb21-9cd0f3f7d777-resources/osh.jar"
INFO : Dag name: select count(*) from ssb_customer_txt_obj (Stage-1)
DEBUG : DagInfo: {"context":"Hive","description":"select count(*) from ssb_customer_txt_obj"}
DEBUG : Setting Tez DAG access for queryId=hive_20210701170416_14defefe-bc86-465d-8a47-b5f16557c5a7 with viewAclString=, modifyStr=root,hive
ERROR : Status: Failed
ERROR : Vertex failed, vertexName=Map 1, vertexId=vertex_1625031757474_0001_1_00, diagnostics=[Vertex vertex_1625031757474_0001_1_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: ssb_customer_txt_obj initializer failed, vertex=vertex_1625031757474_0001_1_00 [Map 1], org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "oci"
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3281)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3301)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3320)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:361)
at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:268)
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:239)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:325)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:524)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:781)
at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:243)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:278)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:269)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:269)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
]
ERROR : Vertex killed, vertexName=Reducer 2, vertexId=vertex_1625031757474_0001_1_01, diagnostics=[Vertex received Kill in INITED state., Vertex vertex_1625031757474_0001_1_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]
ERROR : DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1
INFO : org.apache.tez.common.counters.DAGCounter:
INFO : AM_CPU_MILLISECONDS: 2640
INFO : AM_GC_TIME_MILLIS: 124
ERROR : FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1625031757474_0001_1_00, diagnostics=[Vertex vertex_1625031757474_0001_1_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: ssb_customer_txt_obj initializer failed, vertex=vertex_1625031757474_0001_1_00 [Map 1], org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "oci"
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3281)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3301)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3320)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:361)
at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:268)
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:239)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:325)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:524)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:781)
at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:243)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:278)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:269)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:269)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

]Vertex killed, vertexName=Reducer 2, vertexId=vertex_1625031757474_0001_1_01, diagnostics=[Vertex received Kill in INITED state., Vertex vertex_1625031757474_0001_1_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1
DEBUG : Shutting down query select count(*) from ssb_customer_txt_obj

    VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED

Map 1         container  INITIALIZING     -1          0        0       -1       0       0
Reducer 2     container  INITED            1          0        0        1       0       0

VERTICES: 00/02 [>>--------------------------] 0% ELAPSED TIME: 6.86 s

Error: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1625031757474_0001_1_00, diagnostics=[Vertex vertex_1625031757474_0001_1_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: ssb_customer_txt_obj initializer failed, vertex=vertex_1625031757474_0001_1_00 [Map 1], org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "oci"
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3281)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3301)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3320)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:361)
at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:268)
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:239)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:325)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:524)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:781)
at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:243)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:278)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:269)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:269)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
]Vertex killed, vertexName=Reducer 2, vertexId=vertex_1625031757474_0001_1_01, diagnostics=[Vertex received Kill in INITED state., Vertex vertex_1625031757474_0001_1_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1 (state=08S01,code=2)

@jodoglevy (Member)

@zhengwanbo thanks for filing this issue - we'll take a look and get back to you

@omkar07 (Member) commented Jul 16, 2021

Hi @zhengwanbo, it seems you are not able to run the count query, but this has nothing to do with the HDFS connector. You will need to do the following:

  1. Reference the JAR files before starting the Spark shell, i.e. place the HDFS connector lib and third-party JARs in spark-3.1.2-bin-hadoop3.2/jars.
  2. Create a .oci folder and copy your API keys into it. Also, create core-site.xml and place it in the spark-3.1.2-bin-hadoop3.2/conf folder along with the spark-defaults.conf file (see the core-site.xml sketch after this list). Follow: https://docs.oracle.com/en-us/iaas/Content/API/SDKDocs/hdfsconnectorspark.htm
  3. Run set hive.compute.query.using.stats=false; on Beeline-Hive. This way, Hive performs the count against the data actually present in storage through a MapReduce job, rather than answering from table statistics.
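For reference, here is a minimal core-site.xml sketch in the spirit of step 2, using the OCI HDFS connector properties described in the Oracle doc linked above; every value below is a placeholder, not taken from this issue:

  <configuration>
    <!-- Map the oci:// scheme to the connector's FileSystem implementation -->
    <property>
      <name>fs.oci.impl</name>
      <value>com.oracle.bmc.hdfs.BmcFilesystem</value>
    </property>
    <!-- Object Storage endpoint for your region (placeholder) -->
    <property>
      <name>fs.oci.client.hostname</name>
      <value>https://objectstorage.us-phoenix-1.oraclecloud.com</value>
    </property>
    <!-- API key credentials (placeholder OCIDs, fingerprint, and path) -->
    <property>
      <name>fs.oci.client.auth.tenantId</name>
      <value>ocid1.tenancy.oc1..exampletenancy</value>
    </property>
    <property>
      <name>fs.oci.client.auth.userId</name>
      <value>ocid1.user.oc1..exampleuser</value>
    </property>
    <property>
      <name>fs.oci.client.auth.fingerprint</name>
      <value>aa:bb:cc:dd:ee:ff:00:11:22:33:44:55:66:77:88:99</value>
    </property>
    <property>
      <name>fs.oci.client.auth.pemfilepath</name>
      <value>/home/opc/.oci/oci_api_key.pem</value>
    </property>
  </configuration>

Registering fs.oci.impl explicitly also addresses the "No FileSystem for scheme \"oci\"" error directly, provided the connector JAR is on the classpath of every component that reads the table (including the Tez tasks, not just HiveServer2).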

I was able to set up and run the count query by following the above steps. Below is a screenshot. Please let me know if you have any questions.
[Screenshot: successful count(*) query after applying the steps above, taken Jul 16, 2021]
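For completeness, the Beeline sequence for step 3 would look roughly like this (a sketch against the table from this issue; output omitted):

  0: jdbc:hive2://bigdata-hadoop-2.sub070606371> set hive.compute.query.using.stats=false;
  0: jdbc:hive2://bigdata-hadoop-2.sub070606371> select count(*) from ssb_customer_txt_obj;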

@jodoglevy (Member)

@zhengwanbo - checking in here since we haven't heard back from you in a while. Did @omkar07's response resolve your issue?

@omkar07 (Member) commented Aug 11, 2021

Hi @zhengwanbo - since we haven't heard back from you, we're resolving this issue. But feel free to reopen this if you are still experiencing problems.

omkar07 closed this as completed Aug 11, 2021