Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

arrow::compute::kernels::temporal should support nanoseconds #2996

Merged
merged 4 commits into from
Nov 3, 2022

Conversation

comphead
Copy link
Contributor

@comphead comphead commented Nov 1, 2022

Which issue does this PR close?

Closes #2995.

Rationale for this change

arrow::compute::kernels::temporal should support nanoseconds to let calculate nanos, millis, micros

What changes are included in this PR?

Support nano seconds for temporal

Are there any user-facing changes?

Added public function nanosecond

@github-actions github-actions bot added the arrow Changes to the arrow crate label Nov 1, 2022
@comphead
Copy link
Contributor Author

comphead commented Nov 1, 2022

@waitingkuo @tustvold please review the PR whenever you have time.

Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments, I think ideally we would avoid making a breaking API change as part of this

}

/// Extracts the seconds of a given temporal array as an array of integers
pub fn second_generic<T, A: ArrayAccessor<Item = T::Native>>(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have lost this function?

}

/// Extracts the seconds fraction of a given temporal array as an array of integers
pub fn second_fraction_generic<T, A: ArrayAccessor<Item = T::Native>, F>(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we could call this something like unary_timestamp or something? I would also not make it public, so we can alter it down the line

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i suggest that we could either

  1. separate them (aka second_generic, and nano_generic), keep it consistent wither other operator
  2. make it more general, combines not only second, nanosecond. (e.g. second_fraction_generic::<T, _, _>(array, "minute", |t| t.minute() as i32 should work for minute)

/// Extracts the seconds of a given temporal array as an array of integers
fn second_internal<T, A: ArrayAccessor<Item = T::Native>>(
/// Extracts the seconds fraction of a given temporal array as an array of integers
fn second_fraction_internal<T, A: ArrayAccessor<Item = T::Native>, F>(
Copy link
Contributor

@tustvold tustvold Nov 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It occurs to me that this function can probably be implemented more efficiently using try_unary, probably not something for this PR though

Edit: It actually can't because of the use of ArrayAccessor 😢

@comphead
Copy link
Contributor Author

comphead commented Nov 2, 2022

Thanks @tustvold for the feedback.

There is no breaking change, I have made second_generic and second_internal functions more generic so they can serve for both nanosecond and second. Second_generic and second_internal have been renamed to second_fraction_generic and second_fraction_internal respectively. I agree about better naming, appreciate if you can provide the name :)

Comment on lines +1334 to 1349

#[test]
fn test_temporal_array_date64_nanosecond() {
// new Date(1667328721453)
// Tue Nov 01 2022 11:52:01 GMT-0700 (Pacific Daylight Time)
//
// new Date(1667328721453).getMilliseconds()
// 453

let a: PrimitiveArray<Date64Type> = vec![None, Some(1667328721453)].into();

let b = nanosecond(&a).unwrap();
assert!(!b.is_valid(0));
assert_eq!(453_000_000, b.value(1));
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍
i think we could add another test case for test nanosecond() for Timestamp data type like TimestampSecondArray

@comphead
Copy link
Contributor Author

comphead commented Nov 2, 2022

@waitingkuo thanks for your feedback. Renamed functions to time_fraction_generic, now it also works for minutes. Ideally it can work for hour, but hour by some reason has more conditions than minute, second, nanosecond. I can check hour in separate PR

Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR seems to now remove the public second_generic and minute_generic functions. This makes it a breaking API change as downstreams using these methods will no longer work without modification. Perhaps we could rollback this change?

I would like to remove these _generic kernels, and replace them with dyn variants like we have for other kernels, but I think we should do this as a separate PR for all of them - I will create a ticket flr this

Edit: created #3004 perhaps you might like to work on this?

}

/// Extracts the time fraction of a given temporal array as an array of integers
pub fn time_fraction_generic<T, A: ArrayAccessor<Item = T::Native>, F>(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make this private

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made private. Returned public second_generic and minute_generic methods as well as tests for them.

@tustvold
Copy link
Contributor

tustvold commented Nov 2, 2022

Thank you

@waitingkuo
Copy link
Contributor

LGTM thank you @comphead @tustvold

@tustvold tustvold merged commit a2c6647 into apache:master Nov 3, 2022
@ursabot
Copy link

ursabot commented Nov 3, 2022

Benchmark runs are scheduled for baseline = b1050b7 and contender = a2c6647. a2c6647 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

arrow::compute::kernels::temporal should support nanoseconds
4 participants