From 749c79e6db1ca35eb47ba66aa4bc31c285260eae Mon Sep 17 00:00:00 2001
From: yangjie01
Date: Sat, 4 Nov 2023 01:01:21 -0700
Subject: [PATCH] [SPARK-45781][BUILD] Upgrade Arrow to 14.0.0
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

### What changes were proposed in this pull request?
This PR upgrades Apache Arrow from 13.0.0 to 14.0.0.

### Why are the changes needed?
The Apache Arrow 14.0.0 release brings a number of enhancements and bug fixes.

In terms of bug fixes, the release addresses several critical issues that were causing failures in integration jobs with Spark ([GH-36332](https://github.com/apache/arrow/issues/36332)) and problems with importing empty data arrays ([GH-37056](https://github.com/apache/arrow/issues/37056)). It also optimizes the process of appending variable-length vectors ([GH-37829](https://github.com/apache/arrow/issues/37829)) and includes C++ libraries for macOS AArch64 in the Java JARs ([GH-38076](https://github.com/apache/arrow/issues/38076)).

The new features and improvements focus on enhancing the handling and manipulation of data. This includes the introduction of DefaultVectorComparators for large types ([GH-25659](https://github.com/apache/arrow/issues/25659)), support for extended expressions in ScannerBuilder ([GH-34252](https://github.com/apache/arrow/issues/34252)), and the exposure of the VectorAppender class ([GH-37246](https://github.com/apache/arrow/issues/37246)).

The release also brings enhancements to the development and testing process, with the CI environment now using JDK 21 ([GH-36994](https://github.com/apache/arrow/issues/36994)). In addition, the release introduces vector validation consistent with C++, ensuring consistency across different languages ([GH-37702](https://github.com/apache/arrow/issues/37702)).

Furthermore, the usability of VarChar writers and binary writers has been improved with the addition of extra input methods ([GH-37705](https://github.com/apache/arrow/issues/37705)), and VarCharWriter now supports writing from `Text` and `String` ([GH-37706](https://github.com/apache/arrow/issues/37706)). The release also adds typed getters for StructVector, improving the ease of accessing data ([GH-37863](https://github.com/apache/arrow/issues/37863)).

The full release notes are as follows:
- https://arrow.apache.org/release/14.0.0.html

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #43650 from LuciferYang/arrow-14.

Lead-authored-by: yangjie01
Co-authored-by: YangJie
Signed-off-by: Dongjoon Hyun
---
 dev/deps/spark-deps-hadoop-3-hive-2.3 | 8 ++++----
 pom.xml                               | 2 +-
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/dev/deps/spark-deps-hadoop-3-hive-2.3 b/dev/deps/spark-deps-hadoop-3-hive-2.3
index 6364ec48fb664..b7d6bdbfd1299 100644
--- a/dev/deps/spark-deps-hadoop-3-hive-2.3
+++ b/dev/deps/spark-deps-hadoop-3-hive-2.3
@@ -16,10 +16,10 @@ antlr4-runtime/4.13.1//antlr4-runtime-4.13.1.jar
 aopalliance-repackaged/2.6.1//aopalliance-repackaged-2.6.1.jar
 arpack/3.0.3//arpack-3.0.3.jar
 arpack_combined_all/0.1//arpack_combined_all-0.1.jar
-arrow-format/13.0.0//arrow-format-13.0.0.jar
-arrow-memory-core/13.0.0//arrow-memory-core-13.0.0.jar
-arrow-memory-netty/13.0.0//arrow-memory-netty-13.0.0.jar
-arrow-vector/13.0.0//arrow-vector-13.0.0.jar
+arrow-format/14.0.0//arrow-format-14.0.0.jar
+arrow-memory-core/14.0.0//arrow-memory-core-14.0.0.jar
+arrow-memory-netty/14.0.0//arrow-memory-netty-14.0.0.jar
+arrow-vector/14.0.0//arrow-vector-14.0.0.jar
 audience-annotations/0.5.0//audience-annotations-0.5.0.jar
 avro-ipc/1.11.3//avro-ipc-1.11.3.jar
 avro-mapred/1.11.3//avro-mapred-1.11.3.jar
diff --git a/pom.xml b/pom.xml
index 2e0c95516c177..cae315f4d7182 100644
--- a/pom.xml
+++ b/pom.xml
@@ -228,7 +228,7 @@
 If you are changing Arrow version specification, please check
 ./python/pyspark/sql/pandas/utils.py, and ./python/setup.py too.
 -->
- 13.0.0
+ 14.0.0
 2.5.11