This repository has been archived by the owner on Sep 18, 2023. It is now read-only.

Hadoop version conflict when supporting to use gazelle_plugin on Google Cloud Dataproc #382

Closed
HongW2019 opened this issue Jun 28, 2021 · 10 comments · Fixed by #392
Labels
bug Something isn't working


@HongW2019
Contributor

Describe the bug
To support gazelle_plugin on Google Cloud Dataproc, we ran spark-sql with gazelle_plugin v1.1.1-spark-3.1.1 enabled on
Google Cloud Dataproc 2.0 (CentOS 8, Hadoop 3.2.2, Spark 3.1.1) and hit issues similar to those reported for EMR in #368:
[screenshots]

@HongW2019 HongW2019 added the bug Something isn't working label Jun 28, 2021
@zhouyuan
Collaborator

@HongW2019 I think the issue may be related to the Hadoop version we build against:
https://github.com/oap-project/gazelle_plugin/blob/master/pom.xml#L211

Can you please try building with 3.2.2?
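A minimal sketch of such a rebuild, assuming the pom reads the Hadoop version from a Maven property named hadoop.version (the linked pom.xml line is not reproduced here, so that name is an assumption) and that tests can be skipped. The helper name build_with_hadoop is hypothetical; extra arguments are passed straight through to mvn, so a profile such as -Phadoop-3.2 can be added.

```shell
#!/usr/bin/env bash
# Sketch: rebuild the plugin against a chosen Hadoop version.
# ASSUMPTION: the pom reads the Hadoop version from a property named
# `hadoop.version`; build_with_hadoop is a hypothetical helper name.
set -eu

build_with_hadoop() {
  local hadoop_version="$1"
  shift
  # A -D user property on the command line takes precedence over the value
  # declared in the pom, so the version can be set without editing the pom.
  # Any extra arguments (e.g. -Phadoop-3.2) are passed through to Maven.
  mvn clean package -DskipTests "-Dhadoop.version=${hadoop_version}" "$@"
}

# Example, matching Dataproc 2.0's Hadoop 3.2.2:
#   build_with_hadoop 3.2.2
```

Setting the value on the command line keeps the choice of Hadoop version out of the pom entirely, which makes it easy to build one jar per cluster flavor.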

@HongW2019
Contributor Author


@zhouyuan Sure. We rebuilt it with 3.2.2 and hit the same exception as on EMR.

[screenshot]

@HongW2019
Contributor Author


@weiting-chen This result came from our jar rebuilt with 3.2.2. Is this issue still related to a Hadoop version conflict? If it is convenient, should I try to reproduce it with your jar built with 3.2.2?
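One way to answer this is to check which Hadoop version actually ended up inside the rebuilt jar. A minimal sketch, assuming the plugin jar is a fat jar that bundles hadoop-common (whose jar carries common-version-info.properties at its root, the file Hadoop's VersionInfo reads) and that unzip is installed. The function name and the jar path in the example are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: report which hadoop-common version was bundled into a built jar.
# ASSUMPTIONS: the jar bundles hadoop-common, so common-version-info.properties
# sits at the jar root; `unzip` is installed. Names/paths are hypothetical.
set -eu

bundled_hadoop_version() {
  local jar="$1"
  # Stream the properties file out of the jar and keep only the value
  # of its `version=` line.
  unzip -p "$jar" common-version-info.properties | sed -n 's/^version=//p'
}

# Example (hypothetical path):
#   bundled_hadoop_version path/to/plugin-jar-with-dependencies.jar
```

If this prints something other than 3.2.2, the rebuild did not pick up the intended version, which points at the build configuration rather than at the cluster.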

@HongW2019 HongW2019 changed the title Support to use gazelle_plugin on Google Cloud Dataproc Hadoop version conflict when supporting to use gazelle_plugin on Google Cloud Dataproc Jun 30, 2021
@zhixingheyi-tian zhixingheyi-tian changed the title Hadoop version conflict when supporting to use gazelle_plugin on Google Cloud Dataproc Encountered Hadoop version (3.2.1) conflict issue on AWS EMR-6.3.0 Jun 30, 2021
@zhixingheyi-tian zhixingheyi-tian changed the title Encountered Hadoop version (3.2.1) conflict issue on AWS EMR-6.3.0 Hadoop version conflict when supporting to use gazelle_plugin on Google Cloud Dataproc Jun 30, 2021
@HongW2019
Contributor Author

HongW2019 commented Jul 1, 2021

Modifying the Hadoop version in the properties section and then rebuilding the jar avoids the Hadoop version conflict.
But if we modify the version in the profile instead and rebuild with -Phadoop-3.2, the resulting jar still throws the exception.
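To see which value Maven actually resolves before building, here is a sketch using the evaluate goal of maven-help-plugin, assuming the property is named hadoop.version and the command is run from the project root. Comparing the output with and without -Phadoop-3.2 shows whether the local Maven applies the profile's value. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print the hadoop.version value Maven resolves for this project.
# ASSUMPTIONS: run from the project root; the property is named
# `hadoop.version`; effective_hadoop_version is a hypothetical helper name.
set -eu

effective_hadoop_version() {
  # -q together with -DforceStdout prints only the evaluated value.
  # Extra arguments (e.g. -Phadoop-3.2) are passed through to Maven.
  mvn -q org.apache.maven.plugins:maven-help-plugin:3.2.0:evaluate \
    -Dexpression=hadoop.version -DforceStdout "$@"
}

# Example:
#   effective_hadoop_version               # value without the profile
#   effective_hadoop_version -Phadoop-3.2  # value with the profile active
```

If both calls print the same value even though the profile sets a different one, the profile is not being applied as expected.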

@zhouyuan
Collaborator

zhouyuan commented Jul 2, 2021

@HongW2019 can you please share your Maven info?
mvn -v should print the necessary details.

@HongW2019
Contributor Author


Sure, @zhouyuan:

(oapbuild) [root@bdpe-sky4 ~]# mvn -v
Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-11T00:41:47+08:00)
Maven home: /opt/Beaver/maven
Java version: 1.8.0_112, vendor: Oracle Corporation
Java home: /opt/Beaver/jdk-8u112/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "3.10.0-1160.24.1.el7.x86_64", arch: "amd64", family: "unix"

@zhouyuan
Collaborator

zhouyuan commented Jul 2, 2021

@HongW2019 that seems quite old. Could you please try to build the package with a recent Maven? I suspect the old Maven may not read properties from profiles correctly.
Here's the link for downloading Maven:
https://maven.apache.org/download.cgi
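A small guard can check the local Maven version before building with the profile. This is a sketch assuming GNU sort -V is available; 3.6.3 is used as the minimum only because it is the version reported to work later in this thread, not a documented requirement. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: refuse to build when the local Maven is older than a minimum.
# ASSUMPTIONS: GNU `sort -V`; the 3.6.3 minimum comes from this thread only.
set -eu

# Extract "3.3.9" from a line like "Apache Maven 3.3.9 (bb52d85...; ...)".
maven_version() {
  awk '/^Apache Maven /{ print $3; exit }'
}

# Succeed when version $1 >= version $2.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

require_maven() {
  local min="$1" found
  # -B (batch mode) keeps the output free of color codes.
  found="$(mvn -B -v | maven_version)"
  if ! version_ge "$found" "$min"; then
    echo "Maven $found is older than $min; upgrade before building with -Phadoop-3.2" >&2
    return 1
  fi
}

# Example:
#   require_maven 3.6.3 && mvn clean package -DskipTests -Phadoop-3.2
```

With the mvn -v output shared above (Maven 3.3.9), require_maven 3.6.3 would fail before any build starts.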

@HongW2019
Contributor Author


@zhouyuan After rebuilding the jar with the -Phadoop-3.2 profile using Maven 3.6.3 instead of 3.3.9, we ran spark-shell with this jar and the previous Hadoop exception disappeared.

@HongW2019
Contributor Author

After configuring Gazelle Plugin 1.2 correctly on Dataproc 2.0 (CentOS 8, Hadoop 3.2.2, Spark 3.1.1), all 99 TPC-DS queries passed successfully.

[screenshots]

@HongW2019 HongW2019 reopened this Jul 5, 2021
@HongW2019
Contributor Author

@zhouyuan @weiting-chen FYI. I think we can close this issue now.
