This repository has been archived by the owner on Sep 18, 2023. It is now read-only.

[NSE-382] Fix Hadoop version issue #388

Merged
3 commits merged into oap-project:master on Jul 5, 2021

Conversation

@weiting-chen (Collaborator) commented on Jul 1, 2021

What changes were proposed in this pull request?

  1. Add hadoop.version for the hadoop-client dependency in arrow-data-source and native-sql-engine.
  2. Add profile settings to control the Hadoop version; the default uses Hadoop 2.7.4 (see the sketch after this list).
  3. Fix an issue where the hadoop.version variable was not applied to the hadoop-client library during the compile phase.
  4. Move the profiles section higher in the pom for better readability.
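As a rough illustration of items 1-3, here is a minimal sketch of how a profile-controlled Hadoop version could look; the profile id (hadoop-3.2) and the 3.2.0 version are assumptions for illustration, not necessarily the exact values in this PR:

```xml
<!-- Hypothetical parent pom excerpt: profile-controlled Hadoop version -->
<properties>
  <!-- default Hadoop version inherited by arrow-data-source and native-sql-engine -->
  <hadoop.version>2.7.4</hadoop.version>
</properties>

<profiles>
  <profile>
    <id>hadoop-3.2</id>
    <properties>
      <!-- assumed override, selected with -Phadoop-3.2 -->
      <hadoop.version>3.2.0</hadoop.version>
    </properties>
  </profile>
</profiles>
```

Each sub-module would then reference the property from its hadoop-client dependency, for example:

```xml
<!-- Hypothetical sub-module dependency that picks up ${hadoop.version} -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>${hadoop.version}</version>
</dependency>
```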

How was this patch tested?

Tested on a local server.

@github-actions bot commented on Jul 1, 2021

#382

@zhouyuan (Collaborator) commented on Jul 1, 2021

The failure seems to be due to ${hadoop.version} not being passed from the parent pom when only the core module is compiled:
"mvn package -pl native-sql-engine/core"

<properties>
<scala.version>2.12.10</scala.version>
<scala.binary.version>2.12</scala.binary.version>
<spark.version>3.1.1</spark.version>
<arrow.version>4.0.0</arrow.version>
<arrow-memory.artifact>arrow-memory-netty</arrow-memory.artifact>
<hadoop.version>2.7.4</hadoop.version>
<hadoop.version>${hadoop.version}</hadoop.version>
Collaborator

Looks a bit odd to me; shouldn't this be the same as the original way?

Collaborator Author

Not the same. The issue I found is that ${hadoop.version} in arrow-data-source and the columnar plugin didn't pick up the value from the parent pom.
This change delivers ${hadoop.version} to both sub-projects.
I used mvn help:evaluate to confirm the difference: the original way shows ${hadoop.version} as 2.7.4 even when enabling -Phadoop-3.2.

Collaborator

I tried the same command here and my Hadoop version is correct.

Collaborator

Should we also add the Maven version limit here, as Spark does?
https://github.com/apache/spark/blob/master/pom.xml#L118

Collaborator Author

Yes, we can do this by using the same Maven plugin.
Let me add it and submit a new PR.
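For reference, a minimal sketch of how a minimum Maven version can be enforced with the maven-enforcer-plugin that the linked Spark pom uses; the plugin version and the 3.6.3 minimum shown here are assumptions, not necessarily what was added:

```xml
<!-- Hypothetical build section enforcing a minimum Maven version -->
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-enforcer-plugin</artifactId>
      <version>3.0.0-M3</version>
      <executions>
        <execution>
          <id>enforce-maven-version</id>
          <goals>
            <goal>enforce</goal>
          </goals>
          <configuration>
            <rules>
              <requireMavenVersion>
                <!-- assumed minimum; adjust to the project's actual requirement -->
                <version>3.6.3</version>
              </requireMavenVersion>
            </rules>
          </configuration>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```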

@weiting-chen (Collaborator Author)

> The failure seems to be due to ${hadoop.version} not being passed from the parent pom when only the core module is compiled:
> "mvn package -pl native-sql-engine/core"

I can reproduce the error when adding -Pfull-scala-compiler; if I remove the parameter, the compile runs successfully.
I am investigating the issue.

@weiting-chen (Collaborator Author)

For the NullPointer issue,
the root cause is the use of activeByDefault.
Per the Maven documentation,
https://maven.apache.org/guides/introduction/introduction-to-profiles.html
"This profile will automatically be active for all builds unless another profile in the same POM is activated using one of the previously described methods."

There is also a potential issue with the arrow-memory.artifact setting, which we fix in this PR as well.
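A minimal sketch of the pitfall described above, assuming a hypothetical profile layout (the real pom may differ): once another profile in the same pom is activated explicitly, the activeByDefault profile no longer applies, so the property it would have set stays unresolved.

```xml
<!-- Hypothetical sketch of the activeByDefault pitfall -->
<profiles>
  <profile>
    <id>netty-memory</id>
    <activation>
      <!-- only active while NO other profile in this pom is explicitly activated -->
      <activeByDefault>true</activeByDefault>
    </activation>
    <properties>
      <arrow-memory.artifact>arrow-memory-netty</arrow-memory.artifact>
    </properties>
  </profile>
  <profile>
    <id>hadoop-3.2</id>
    <properties>
      <hadoop.version>3.2.0</hadoop.version>
    </properties>
  </profile>
</profiles>
```

Building with -Phadoop-3.2 would deactivate the first profile, leaving ${arrow-memory.artifact} undefined for any dependency that references it; declaring the default in a top-level <properties> block instead, and letting profiles override it, avoids this.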

@zhouyuan merged commit e39bff6 into oap-project:master on Jul 5, 2021