Conversation

Member

@vrozov vrozov commented Mar 7, 2025

What changes were proposed in this pull request?

Upgrade the Hive compile-time dependency to 4.0.1.

Why are the changes needed?

Apache Hive 1.x, 2.x, and 3.x are EOL.

Does this PR introduce any user-facing change?

Yes, more details to come.

How was this patch tested?

WIP

Was this patch authored or co-authored using generative AI tooling?

No

// Since HIVE-18238 (Hive 3.0.0), the return type of Driver.close changed
// and CommandProcessorFactory.clean was removed.
driver.getClass.getMethod("close").invoke(driver)
if (version != hive.v3_0 && version != hive.v3_1 && version != hive.v4_0) {
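For illustration, the reflection pattern above can be sketched standalone. The two driver classes below are hypothetical stand-ins for the pre- and post-HIVE-18238 `Driver`, not Hive's real classes:

```scala
// Hypothetical stand-ins: before HIVE-18238, Driver.close() returned int;
// from Hive 3.0.0 it returns void. Hive's real Driver differs.
class LegacyDriver { def close(): Int = 0 }
class ModernDriver { def close(): Unit = () }

object CloseShim {
  // Reflection ignores the declared return type, so one call site
  // works against whichever class is loaded at runtime.
  def closeDriver(driver: AnyRef): Unit = {
    driver.getClass.getMethod("close").invoke(driver)
    ()
  }
}

@main def demo(): Unit = {
  CloseShim.closeDriver(new LegacyDriver)
  CloseShim.closeDriver(new ModernDriver)
  println("both drivers closed")
}
```

This is why the shim avoids a compile-time call to `close()`: compiling against one Hive version would pin the signature, while reflection tolerates both.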

@simhadri-g simhadri-g Mar 12, 2025


This would break backward compatibility when Spark tries to connect with older versions of Hive, right?

Member Author

I don't think it is a connection problem. There will be a problem when 2.x Hive jars are provided at runtime. It is not clear to me whether it is still necessary to support that option given that 2.x is EOL.

Member

2.x is ancient and should no longer be supported

Member

Agreed.
Hive 2.x and 3.x are EOL. We should move completely to Hive 4+.

Thanks!

.map { case (k, v) =>
  if (v == "NULL") {
-   s"$k=${ConfVars.DEFAULTPARTITIONNAME.defaultStrVal}"
+   s"$k=${ConfVars.DEFAULTPARTITIONNAME.getDefaultVal}"
Member

@deniskuzZ deniskuzZ Mar 12, 2025


I think you should use org.apache.hadoop.hive.conf.HiveConf.ConfVars.DEFAULT_PARTITION_NAME
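As a standalone illustration of coping with such a rename, a shim can try both constant spellings via reflection. The mock objects, the lookup helper, and the constant value below are all assumptions for the sketch, not Hive's actual ConfVars:

```scala
// Mocks of the constant before and after the rename; real code would
// reflect on org.apache.hadoop.hive.conf.HiveConf.ConfVars instead.
object OldConfVars { val DEFAULTPARTITIONNAME = "__HIVE_DEFAULT_PARTITION__" }
object NewConfVars { val DEFAULT_PARTITION_NAME = "__HIVE_DEFAULT_PARTITION__" }

// Try the new spelling first, then fall back to the old one.
def defaultPartitionName(confVars: AnyRef): String =
  Seq("DEFAULT_PARTITION_NAME", "DEFAULTPARTITIONNAME").iterator
    .flatMap { name =>
      try Iterator(confVars.getClass.getMethod(name).invoke(confVars).toString)
      catch { case _: NoSuchMethodException => Iterator.empty }
    }
    .next()
```

Scala object vals compile to accessor methods on the singleton class, which is what makes the `getMethod(name)` lookup work here.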

Member

@gatorsmile gatorsmile left a comment


What potential breaking or behavioral changes could this PR introduce? We need to carefully evaluate each change individually.


razvan commented Jun 18, 2025

Hi, thank you for your work.

Will this PR also address CVE-2019-10202, as described in SPARK-30466?

Member Author

vrozov commented Jun 20, 2025

@razvan SPARK-30466 does not seem to affect Spark 4.x, and this PR targets 4.x only. If you have more questions, I'd suggest bringing them to the dev list and commenting on the SPIP or SPARK-52408.

@vrozov vrozov force-pushed the SPARK-51348 branch 2 times, most recently from 514bd5f to d280720 on August 4, 2025 at 22:40
@vrozov vrozov changed the title [WIP][SPARK-51348][BUILD][SQL] Upgrade Hive to 4.0 [SPARK-51348][BUILD][SQL] Upgrade Hive to 4.0 Aug 8, 2025
@vrozov vrozov changed the title [SPARK-51348][BUILD][SQL] Upgrade Hive to 4.0 [SPARK-51348][BUILD][SQL] Upgrade Hive to 4.1 Aug 8, 2025
@vrozov vrozov marked this pull request as ready for review August 8, 2025 14:46
@vrozov vrozov closed this Aug 21, 2025
@vrozov vrozov deleted the SPARK-51348 branch August 21, 2025 20:25
Member Author

vrozov commented Aug 22, 2025

Please see #52099

5 participants