-
FWIW I think this is a great idea and I would be supportive of having this repo as a subproject of DataFusion.
Posting some background from https://lists.apache.org/thread/ho8krphj1qlb6l6x7ypmnjl06m8cdj96
As I understand it, the problem is that a scalar function currently needs a hand-written vectorized wrapper:
fn gcd(l: i64, r: i64) -> i64 {
    // do gcd calculation
}
// implement vectorized version
fn eval_gcd(left: &ArrayRef, right: &ArrayRef) -> ArrayRef {
    let left = left.as_primitive::<Int64Type>();
    let right = right.as_primitive::<Int64Type>();
    // `binary` returns a Result; it errors if the arrays differ in length
    let res: Int64Array = binary(left, right, |l, r| gcd(l, r)).unwrap();
    Arc::new(res)
}
The user simply annotates the scalar function and has the library generate the vectorized version:
#[function("gcd(int64, int64) -> int64", output = "eval_gcd")]
fn gcd(l: i64, r: i64) -> i64 {
    // do gcd calculation
}
Another potential benefit of using a library like this is that we
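To make the scalar-to-vectorized idea concrete, here is a minimal, dependency-free sketch of what such generated code might boil down to. It is not the arrow-udf implementation: `Vec<Option<i64>>` stands in for an Int64 array (with `None` as null), and `lift_binary` is a hypothetical helper name chosen for illustration.

```rust
// Dependency-free sketch. Assumptions: `Vec<Option<i64>>` stands in for an
// Arrow Int64Array (None = null), and `lift_binary` is a hypothetical helper,
// not an arrow-rs or arrow-udf API.

/// Scalar function: greatest common divisor via the Euclidean algorithm.
/// Works on magnitudes so negative inputs are handled; the final cast only
/// wraps in the corner case where the result is 2^63.
fn gcd(l: i64, r: i64) -> i64 {
    let (mut a, mut b) = (l.unsigned_abs(), r.unsigned_abs());
    while b != 0 {
        (a, b) = (b, a % b);
    }
    a as i64
}

/// Lifts a scalar binary function to whole columns.
/// A row is null in the output if either input row is null.
fn lift_binary<F>(left: &[Option<i64>], right: &[Option<i64>], f: F) -> Vec<Option<i64>>
where
    F: Fn(i64, i64) -> i64,
{
    assert_eq!(left.len(), right.len(), "columns must have equal length");
    left.iter()
        .zip(right)
        .map(|(l, r)| match (l, r) {
            (Some(l), Some(r)) => Some(f(*l, *r)),
            _ => None,
        })
        .collect()
}

/// The wrapper a macro could generate from the annotated scalar `gcd`.
fn eval_gcd(left: &[Option<i64>], right: &[Option<i64>]) -> Vec<Option<i64>> {
    lift_binary(left, right, gcd)
}
```

The point of the sketch is that the author only writes `gcd`; the null handling and the row loop in `lift_binary` are the repetitive part that a procedural macro can produce.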
-
This is pretty awesome. If this gets added, can we have an I took a stab at it here (it is quite crude). Having an |
-
@XiangpengHao and I were talking about this proposal -- another potential thing it might allow for is taking better advantage of different encodings (like DictionaryArray and StringViewArray).
For example, maybe we could build basic vectorized functions, with code for dictionary-encoded inputs generated from an annotation like:
#[function("substr(Utf8, Int32) -> Utf8", output = "eval_substr", generate_dict=true)]
fn substr(s: &str, len: i32) -> &str {
...
}
🤔
For StringViewArray we would probably need to implement custom functions, but this auto-generated code might get us far.
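As a rough illustration of why a dictionary-aware variant could pay off, the sketch below applies the scalar function once per distinct dictionary value and reuses the keys unchanged. Everything here is an assumption made for illustration: `DictColumn` is a hypothetical stand-in for a DictionaryArray, `eval_substr_dict` and `decode` are made-up names, and `substr` is taken to mean "the first `len` characters", which the comment above does not specify.

```rust
// Dependency-free sketch. Assumption: `DictColumn` is a hypothetical stand-in
// for an Arrow DictionaryArray, not a type from arrow-rs or arrow-udf.

/// Dictionary-encoded string column: each row stores an index into `values`.
#[derive(Debug, Clone, PartialEq)]
struct DictColumn {
    keys: Vec<Option<usize>>, // None = null row
    values: Vec<String>,      // distinct strings
}

/// Scalar function (assumed semantics): the first `len` characters of `s`.
fn substr(s: &str, len: i32) -> &str {
    let n = len.max(0) as usize;
    // Find the byte offset of the n-th character so we never split a code point.
    match s.char_indices().nth(n) {
        Some((byte_idx, _)) => &s[..byte_idx],
        None => s,
    }
}

/// Dictionary-aware evaluation: run `substr` once per dictionary value
/// instead of once per row, and keep the keys as they are.
fn eval_substr_dict(col: &DictColumn, len: i32) -> DictColumn {
    DictColumn {
        keys: col.keys.clone(),
        values: col.values.iter().map(|v| substr(v, len).to_string()).collect(),
    }
}

/// Expands a dictionary column back into one optional string per row.
fn decode(col: &DictColumn) -> Vec<Option<&str>> {
    col.keys
        .iter()
        .map(|k| k.map(|i| col.values[i].as_str()))
        .collect()
}
```

With many rows and few distinct values, the work scales with the dictionary size rather than the row count. One caveat of this simple approach is that the output dictionary may contain duplicate values (two inputs sharing a prefix), which still decodes correctly.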
-
In case anyone else missed it, I found this blog post to be a nice introduction to creating automatically vectorized arrow-rs functions: https://risingwave.com/blog/simplifying-sql-function-implementation-with-rust-procedural-macro/
-
I filed #11413 to track the idea of trying out arrow-udf in DataFusion.
-
Hello, everyone.
I initiated this thread to discuss the donation of a User-Defined Function Framework. I first brought up this discussion in the Arrow community here, but received feedback that it would be more suitable for DataFusion. Therefore, I am also sharing the proposal here.
Feel free to review and leave your comments here. For live review, please visit:
https://hackmd.io/@xuanwo/apache-arrow-udf
The original content is also pasted here for quick reading:
Abstract
Arrow UDF is a User-Defined Function Framework for Apache Arrow.
Proposal
Arrow UDF allows users to easily create and run user-defined functions (UDFs) in Rust, Python, Java, or JavaScript based on Apache Arrow. The functions can be executed natively, in WebAssembly, or on a remote server via Arrow Flight.
Arrow UDF was originally designed to be used by the RisingWave project but is now being used by Databend and several database startups.
We believe that the Arrow UDF project will bring diverse value to the entire Arrow community.
Background
Arrow UDF has been developed by an open-source community since day one and is owned by RisingWaveLabs. The project was launched in December 2023.
Initial Goals
By transferring ownership of the project to Apache Arrow, Arrow UDF expects to ensure its neutrality and to further encourage and facilitate its adoption by the community.
Current Status
Contributors: 5
Users:
Documentation
The documentation for Arrow UDF is hosted at https://docs.rs/arrow-udf/latest/arrow_udf/.
Initial Source
The project currently holds a GitHub repository and multiple packages:
Rust:
Python:
Those packages will retain their names, while the repository will be moved to the apache org.
Required Resources
Mailing Lists
We can reuse the existing mailing lists that Arrow has.
Git Repositories
Option 1
Maintain all Arrow UDF implementations in the same repo.
From
To
Option 2
Add each Arrow UDF implementation to the corresponding language repository.
For example:
From
To
Issue Tracking
The project would like to continue using GitHub Issues.
Other Resources
The project has already chosen GitHub Actions as its continuous integration tool.
Initial Committers