chore: Move some utility methods to submodules of scalar_funcs #590

Merged
merged 2 commits into from
Jun 25, 2024

Conversation

advancedxy
Contributor

Which issue does this PR close?

Rationale for this change

This is a follow-up as discussed in #449 (comment)

What changes are included in this PR?

  1. hex_encode and wrap_digest_result_as_hex_string go to the hex submodule
  2. spark_murmur3_hash and spark_xxhash64 go to the hash_expressions submodule
  3. Update the benchmark and modify the test code
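
As a rough illustration of the reorganization listed above, the sketch below shows a helper living in a `hex` submodule of `scalar_funcs`. The `hex_encode` signature and body here are simplified assumptions for illustration, not the project's actual code, and the `pub use` re-export is a hypothetical convenience that the PR does not confirm. The hash functions would follow the same pattern under `hash_expressions`.

```rust
// Hypothetical, simplified layout. The real functions operate on DataFusion
// types; this stand-in works on plain byte slices.
pub mod scalar_funcs {
    pub mod hex {
        /// Encodes `data` as a hex string, two digits per byte.
        /// (Assumed signature, for illustration only.)
        pub fn hex_encode(data: &[u8], lower_case: bool) -> String {
            data.iter()
                .map(|b| {
                    if lower_case {
                        format!("{:02x}", b)
                    } else {
                        format!("{:02X}", b)
                    }
                })
                .collect()
        }
    }

    // Assumption: an optional re-export so that callers using the old
    // `scalar_funcs::hex_encode` path keep compiling after the move.
    pub use self::hex::hex_encode;
}
```

With a re-export like this, both `scalar_funcs::hex::hex_encode` and `scalar_funcs::hex_encode` resolve to the same function.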

How are these changes tested?

Existing tests, with one slight modification.

@advancedxy
Contributor Author

@andygrove, @tshauck and @comphead, would you mind taking a look at this once CI passes?

}
}

pub fn spark_xxhash64(args: &[ColumnarValue]) -> Result<ColumnarValue, DataFusionError> {
Contributor

It's not this PR's problem, but we need descriptions for pub methods.

Contributor

Additionally, might consider limiting the scope. This function probably isn't needed outside of scalar_funcs.

Contributor Author

but we need a description to pub methods

I can add a description here.

Additionally, might consider limiting the scope. This function probably isn't needed outside of scalar_funcs.

Yeah, I originally limited it to pub(super). However, we access it in the benchmark module, which needs a public interface. I think we can leave it as it is and address the access scope later by rewriting the benchmark code.
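
The trade-off discussed in this thread can be sketched as follows. All names in the block (`internal_helper`, `exposed_entry_point`, `call_helper`) are hypothetical and the bodies are placeholders, not real hash logic. The point is only the visibility rule: a `pub(super)` item is reachable from the parent module but not from outside it, whereas Cargo compiles each file under `benches/` as a separate crate, so a benchmark can only call items that are `pub` along a public path. The `///` comments show the kind of description requested for pub functions.

```rust
mod scalar_funcs {
    pub mod hash_expressions {
        /// Visible only inside the parent module (`scalar_funcs`).
        /// Placeholder body; not an actual hash function.
        pub(super) fn internal_helper(x: u64) -> u64 {
            x.rotate_left(7)
        }

        /// Fully public entry point; an external benchmark crate could call
        /// this as long as every enclosing module is also public.
        pub fn exposed_entry_point(x: u64) -> u64 {
            internal_helper(x) ^ 0xff
        }
    }

    /// The parent module may call the `pub(super)` helper directly.
    pub fn call_helper(x: u64) -> u64 {
        hash_expressions::internal_helper(x)
    }
}
```

Calling `scalar_funcs::hash_expressions::internal_helper` from outside `scalar_funcs` would be rejected by the compiler as a private function, which is the restriction that the benchmark runs into.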

match seed {
ColumnarValue::Scalar(ScalarValue::Int32(Some(seed))) => {
// iterate over the arguments to find out the length of the array
let num_rows = args[0..args.len() - 1]
Contributor

Any chance of an index out of bounds here?

Contributor Author

I don't think so. The seed is always provided on the Spark/JVM side.
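
For context on the question above: `args[0..args.len() - 1]` only misbehaves when `args` is empty, because `args.len() - 1` then underflows. If a defensive check were wanted anyway, one option is `split_last`, which returns `None` for an empty slice. The sketch below is a minimal illustration under stated assumptions: `Value` is a hypothetical stand-in for DataFusion's `ColumnarValue`, `seed_and_num_rows` is an invented helper name, and the row-count logic (length of the first array argument, else 1) is assumed because the excerpt above is truncated.

```rust
/// Hypothetical stand-in for `ColumnarValue`, reduced to i32 data.
#[derive(Debug)]
pub enum Value {
    Scalar(i32),
    Array(Vec<i32>),
}

/// Returns `(seed, num_rows)`, or an error message for malformed arguments.
pub fn seed_and_num_rows(args: &[Value]) -> Result<(i32, usize), String> {
    // `split_last` yields None on an empty slice, so no `len() - 1` underflow.
    let (seed, columns) = args
        .split_last()
        .ok_or_else(|| "expected at least a seed argument".to_string())?;

    let seed = match seed {
        Value::Scalar(s) => *s,
        other => return Err(format!("seed must be a scalar, got {:?}", other)),
    };

    // Assumed behavior: use the first array argument's length, and treat
    // all-scalar input as a single row.
    let num_rows = columns
        .iter()
        .find_map(|v| match v {
            Value::Array(a) => Some(a.len()),
            Value::Scalar(_) => None,
        })
        .unwrap_or(1);

    Ok((seed, num_rows))
}
```

Since the seed is always supplied by the Spark/JVM side, as noted in the reply, the empty case should not occur in practice; a guard like this would only turn a potential panic into an error value.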

Contributor

@tshauck tshauck left a comment

Others probably have more substantive feedback, but as a step toward better organization, this LGTM.

Contributor

@kazuyukitanimura kazuyukitanimura left a comment

Looks good. One minor question

core/benches/hash.rs (resolved)
@advancedxy
Contributor Author

Gentle ping @comphead, @andygrove and @kazuyukitanimura

Contributor

@kazuyukitanimura kazuyukitanimura left a comment

LGTM

Contributor

@comphead comphead left a comment

lgtm thanks @advancedxy

@viirya viirya merged commit 2992a5e into apache:main Jun 25, 2024
66 checks passed
@viirya
Member

viirya commented Jun 25, 2024

Merged. Thanks @advancedxy @kazuyukitanimura @tshauck @comphead

@advancedxy
Contributor Author

Thanks everyone for reviewing.

himadripal pushed a commit to himadripal/datafusion-comet that referenced this pull request Sep 7, 2024
…e#590)

* chore: Move some utility methods to submodules of scalar_funcs

* Address comments