Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat/11953: Support StringView for TRANSLATE() fn #11967

Merged
merged 9 commits into from
Aug 15, 2024

Conversation

devanbenz
Copy link
Contributor

Which issue does this PR close?

Closes #11953

Rationale for this change

This will add StringView support for the TRANSLATE() function as part of the work to add complete StringView support in DataFusion, which permits potentially much faster processing of string data. See: #10918 for more background.

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

Signed-off-by: Devan <devandbenz@gmail.com>
@github-actions github-actions bot added sqllogictest SQL Logic Tests (.slt) functions labels Aug 13, 2024
@devanbenz devanbenz marked this pull request as ready for review August 13, 2024 15:19
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @devanbenz

I tried this function out and it seems to get an internal error with now when trying to run with a StringArray column

DataFusion CLI v41.0.0
> create table foo as values (arrow_cast('foo', 'Utf8View'));
0 row(s) fetched.
Elapsed 0.027 seconds.

> SELECT  TRANSLATE(column1, 'foo', 'bar') from foo;
Internal error: could not cast value to arrow_array::array::byte_array::GenericByteArray<arrow_array::types::GenericStringType<i32>>.
This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker

@@ -72,6 +75,7 @@ impl ScalarUDFImpl for TranslateFunc {

fn invoke(&self, args: &[ColumnarValue]) -> Result<ColumnarValue> {
match args[0].data_type() {
DataType::Utf8View => make_scalar_function(translate::<i32>, vec![])(args),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect this I think you'll likely also need to change the implementation of translate as well (see for example what @PsiACE has done in #11970)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good I'll take a look 👍

Signed-off-by: Devan <devandbenz@gmail.com>
Signed-off-by: Devan <devandbenz@gmail.com>
@devanbenz
Copy link
Contributor Author

@alamb I still need to implement the test case but the cast error should be gone now. I tried it out in the CLI (oops should have done that before! 😅) thanks for taking a look :)

> create table foo as values (arrow_cast('foo', 'Utf8View'));
0 row(s) fetched.
Elapsed 0.026 seconds.

> SELECT  TRANSLATE(column1, 'foo', 'bar') from foo;
+------------------------------------------------+
| translate(foo.column1,Utf8("foo"),Utf8("bar")) |
+------------------------------------------------+
| foo                                            |
+------------------------------------------------+
1 row(s) fetched.
Elapsed 0.011 seconds.

Signed-off-by: Devan <devandbenz@gmail.com>
Signed-off-by: Devan <devandbenz@gmail.com>
Signed-off-by: Devan <devandbenz@gmail.com>
@devanbenz devanbenz requested a review from alamb August 13, 2024 20:34
Signed-off-by: Devan <devandbenz@gmail.com>
Signed-off-by: Devan <devandbenz@gmail.com>
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is beautiful @devanbenz -- thank you 🙏

@alamb alamb merged commit 6b73c4f into apache:main Aug 15, 2024
24 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
functions sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update TRANSLATE scalar function to support Utf8View
2 participants