Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support cast for decimal data type #1043

Closed
7 of 8 tasks
liukun4515 opened this issue Dec 14, 2021 · 18 comments
Closed
7 of 8 tasks

support cast for decimal data type #1043

liukun4515 opened this issue Dec 14, 2021 · 18 comments
Labels
enhancement Any new improvement worthy of a entry in the changelog

Comments

@liukun4515
Copy link
Contributor

liukun4515 commented Dec 14, 2021

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
In the apache/datafusion#122 (comment), we need to support decimal for cast and try_cast

In the first step, we just support signed numeric data type to decimal, and other casting rules related to decimal will be supported in the feature.

future

enhancement

Describe the solution you'd like
A clear and concise description of what you want to happen.

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Additional context
Add any other context or screenshots about the feature request here.

@liukun4515 liukun4515 added the enhancement Any new improvement worthy of a entry in the changelog label Dec 14, 2021
@liukun4515 liukun4515 changed the title support signed numeric to decimal data type support numeric to decimal data type Dec 14, 2021
@liukun4515
Copy link
Contributor Author

please assign this issue to me.
Thanks

@alamb
Copy link
Contributor

alamb commented Dec 14, 2021

unsigned numeric to decimal @alamb do we need to support this? My opinion is that unsigned data type is not in the standard SQL data type, and we no need to support this.

@liukun4515 I personally think it would be valuable to support unsigned numeric (aka PrimitiveArray<T>) to / from the decimal type. Arrow is used for more than just SQL engines. However, I think it is fine to leave it as a follow on PR / issue (doesn't have to be in the first implementation).

I suspect once you implement PrimitiveArray<T> to/from DecimalArray for signed variants the incremental cost to supporting the unsigned numeric variants will be quite low.

@liukun4515
Copy link
Contributor Author

Ok, I will implement signed numeric first, and leave other as the follow-up PR.

@chadbrewbaker
Copy link

On commit ab48e69 building with nightly-2021-12-05-aarch64-apple-darwin I am getting:

---- record::api::tests::test_convert_double_to_string stdout ----
thread 'record::api::tests::test_convert_double_to_string' panicked at 'assertion failed: `(left == right)`
  left: `"1e-15"`,
 right: `"0.000000000000001"`', parquet/src/record/api.rs:1097:9

---- record::api::tests::test_convert_float_to_string stdout ----
thread 'record::api::tests::test_convert_float_to_string' panicked at 'assertion failed: `(left == right)`
  left: `"1e-15"`,
 right: `"0.000000000000001"`', parquet/src/record/api.rs:1084:9

The string formatter is:

Field::Float(value) => {
if !(1e-15..=1e19).contains(&value) {
write!(f, "{:E}", value)
} else {
write!(f, "{:?}", value)
}
}
Field::Double(value) => {
if !(1e-15..=1e19).contains(&value) {
write!(f, "{:E}", value)
} else {
write!(f, "{:?}", value)
}

Where are the requirements for this?

@alamb
Copy link
Contributor

alamb commented Dec 15, 2021

Where are the requirements for this?

I do not know @chadbrewbaker

Some code archeology in https://github.com/apache/arrow-rs/blame/ab48e69099f2d7c4178a75f7e1cf8b223ecf8d76/parquet/src/record/api.rs#L717 suggests that there has been a special case for checking that value for quite a while

Given that 0.000000000000001 is right on the boundary, perhaps we need to adjust the code to reflect differences in floating point on aarch64?

@chadbrewbaker
Copy link

Same errors on x86.

This seems to be the last commit that changed the rounding.

21cb780

@alamb
Copy link
Contributor

alamb commented Dec 15, 2021

@chadbrewbaker On a mac, the tests pass for me using cargo test -p parquet (using stable)

$ cargo --version
cargo 1.57.0 (b2e52d7ca 2021-10-21)

I also get the diff using nightly:

(arrow_dev) alamb@MacBook-Pro-2:~/Software/arrow-rs$ cargo +nightly --version
cargo 1.58.0-nightly (294967c53 2021-11-29)

If you want to propose a patch to allow the tests to pass on both nightly and stable that would be 👍

@chadbrewbaker
Copy link

Bah - my stupidity not running stable. I'll diff what they changed in nightly. Seems to be cross-platform.

@chadbrewbaker
Copy link

I'll try to upstream our test for the "1e-15" boundary. Doesn't seem to be one now.

https://github.com/rust-lang/rust/blob/c5ecc157043ba413568b09292001a4a74b541a4e/library/alloc/tests/fmt.rs#L137-L155

@chadbrewbaker
Copy link

TIL

cargo bisect-rustc --start 2021-10-01  -- cargo test test_convert_double_to_string

@chadbrewbaker
Copy link

I bisected the 1e-15 conversion regression down to this changeset between nightlies.

rust-lang/rust@1af55d1...efd0483

@Urgau
Copy link

Urgau commented Dec 16, 2021

I bisected the 1e-15 conversion regression down to this change-set between nightlies.

rust-lang/rust@1af55d1...efd0483

From your range of commit you can see that there is a commit named Automatic exponential formatting in Debug that added an internal logic that switch to the exponential representation in some cases when using the Debug formatter.
Rust make no promises about Debug formatting and can change it's output anytime for any reason but this is not the case for the Display formatter which has guaranties for how it display thing.

I would suggest changing your formatting to the Display (ie write!(f, "{:?}", value) -> write!(f, "{}", value)) formatter.

@liukun4515
Copy link
Contributor Author

@alamb we need to speed up reviewing these two pull requests #1044 #1073, and they can be merged into the release of arrow-rs 6.5.
They will be used in the datafusion for comparison operation and mathematics operation for decimal type.

@alamb
Copy link
Contributor

alamb commented Dec 21, 2021

@liukun4515 I will do so today

@liukun4515 liukun4515 changed the title support numeric to decimal data type support cast for decimal data type Dec 22, 2021
@liukun4515
Copy link
Contributor Author

There are bugs for cast decimal to decimal.

If we cast decimal(1256,12,2) to the type of decimal(12,1), the result should be decimal(126,12,1) not the decimal(125,12,1)

@alamb
Copy link
Contributor

alamb commented Nov 14, 2022

👍 thanks @liukun4515 -- maybe it is worth its own ticket to talk about the rounding vs truncation behavior of decimals while casting

@liukun4515 liukun4515 removed their assignment Dec 15, 2022
@chadbrewbaker
Copy link

Nice! I will look at the commit. For ML workloads, also going to need a version that stochastically flips the last bit. https://twitter.com/id_aa_carmack/status/1296295284994183168

@alamb
Copy link
Contributor

alamb commented Dec 15, 2022

Possibly related #3281

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Any new improvement worthy of a entry in the changelog
Projects
None yet
Development

No branches or pull requests

4 participants