-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
vec_cast.units.units() can fail if one underlying vector is integer #324
Comments
Thanks for the analysis. The question here is why are we recalling the method on the bare data. Is there any corner case I don't see in which this is required, @lionel- ? Line 72 in 07b2606
Can't the method finish 4 lines before, when the conversion was already made? |
yup it could. I thought it'd be a good thing to benefit from vctrs casting rules for bare types but maybe not? |
For units, we care about the mode (numeric) but not about the storage mode. From what I can see, ptype2 is about the mode, and cast takes into account the storage mode. Is that right? So, unless there's some benefit I don't see, I'll just drop that part and adjust the tests accordingly. |
hmm... Actually it's important for vctrs that after a cast from x to y, the cast result has the same storage mode as y. Alternatively, you could provide a proxy method for your class that always uses the same storage mode, that would work too. |
Ok. The proxy thing is too much trouble I think. How about forcing the storage mode of the destination? See 85c9785 |
Can confirm this fixes my underlying bug. |
If you have an identity proxy, the storage mode of the return value of a cast method must match the storage mode of |
Can you think of an example of such errors, so that we can use it as a test case? |
If |
It works fine with the commit above: x = units::set_units(1:3, "cm")
y = units::set_units(4.0, "m")
storage.mode(x)
#> [1] "integer"
storage.mode(y)
#> [1] "double"
vctrs::vec_c(x, y)
#> Units: [cm]
#> [1] 1 2 3 400
vctrs::vec_c(y, x)
#> Units: [m]
#> [1] 4.00 0.01 0.02 0.03 |
oh yeah, because the common type of int + dbl == dbl, and your cast method always converts to dbl. But here is an example where it breaks: vctrs::vec_assign(x, 2, y)
#> Error in `vctrs::vec_assign()`:
#> ! `proxy` of type `integer` incompatible with `value` proxy of type `double`.
#> ℹ In file slice-assign.c at line 156.
#> ℹ This is an internal error in the vctrs package, please report it to the package authors.
#> ▆
#> 1. ├─vctrs::vec_assign(x, 2, y)
#> 2. └─rlang:::stop_internal_c_lib(...)
#> 3. └─rlang::abort(message, call = call, .internal = TRUE, .frame = frame) at rlang/R/utils.R:212:2 In general, not respecting the storage mode in a vctrs cast method is UB. |
I take it back; my bug is still there using units 0.8.1 (and vctrs 0.6.2). x <- set_units(1:10, cm)
br <- set_units(2:4, `in`)
vctrs::vec_cast_common(x, br)
Error:
! Can't convert from `..2` <double> to <integer> due to loss of precision.
• Locations: 1, 2, 3 |
Yes, because I never merged the commit above per Lionel's last comment, and I didn't have the time to further look into this, apologies. |
So is there a reason not to just convert manually to double? My thoughts, FWIW:
|
You mean at the entry point? Looking at the udunits2 interface, it works with floats only, so I suppose there's no point in supporting integers |
Could you please test the commit above? |
Unfortunately, this breaks a couple of (not robust enough) tests in {quantities}, so I'll have to run revdep checks. |
This seems to work. It fixes the bug reported here, and also the underlying bug from hughjonesd/santoku#40: library(santoku)
x <- set_units(1:10, cm)
br <- set_units(2:4, `in`)
chop(x, br)
[1] [ 1.00 [cm], 5.08 [cm]) [ 1.00 [cm], 5.08 [cm]) ...
Levels: [ 1.00 [cm], 5.08 [cm]) [ 5.08 [cm], 7.62 [cm]) [ 7.62 [cm], 10.16 [cm]] When I first compiled the fix, in the same session, the above code gave me: x <- set_units(1:10, cm)
br <- set_units(2:4, `in`)
chop(x, br)
Error in .Call("_units_ud_compare", PACKAGE = "units", x, y, xn, yn) :
"_units_ud_compare" not available for .Call() for package "units" but I think that was just a temporary thing. |
I uncovered one other error while testing this: x1 <- set_units(0, "degree_Celsius")
x2 <- set_units(3, "degree_Celsius")
x1 < x2
Error in ud_compare(e1, e2, as.character(units(e1)), as.character(units(e2))) :
ut_scale(): NULL factor argument |
Interesting. New issue #346 set to track this. |
* force storage mode to double at entry point, fixes #324 * add test that should fail without forcing the storage mode * bump version, update NEWS
Thank you! |
Correct. Unfortunately this was causing trouble, so since v0.8-2, storage mode is force to double on entry point.
On the one hand, if you have a big(ish) data application in which you are having memory issues, then probably computing with units is too expensive too, so you should resort to specialized tools. On the other hand, there are only a few count units, and even counts may require doubles eventually: > set_units(set_units(1, bit), byte)
0.125 [byte] |
Thanks for your clarification @Enchufa2 I will just resort to always use doubles |
For counts of things the package is not of much use, as they are treated as unitless and add to anything else unitless: library(units)
# udunits database from /usr/share/xml/udunits/udunits2.xml
set_units(3, bytes) + set_units(4, rad)
# 3.5 [bytes] You'd like to have a package that distinguishes things, e.g. using an ontology or nomenclature or taxonomy, and that can aggregate to superclasses, e.g. 3 apples + 4 oranges = 7 pieces of fruit, or 3 European herring gulls + 2 black-headed gulls = 5 gulls, and 5 gulls + 2 chaffinches = 7 birds. |
Example:
Similarly:
By contrast:
works. So does:
and:
Lastly, you avoid the bug if the double happens to evaluate to integer values:
The issue seems to be that
units::vec_cast.units.units()
recallsvec_cast
with the underlying bare data. But if one of these is integer, and the other is double and is not integer-valued (e.g. because it's been converted), that fails.This may make sense for reasons that I don't understand. It was surprising to me, though.
I think the issue is that
vctrs
correctly wants to fail if it has to cast a non-integer-valued double to an integer. But in the units case, we can be sure that the integer values are "really" doubles, i.e. they are on the real number line (... right?) If so, maybe a fix would be to addto_bare <- as.Double(to_bare)
invec_cast.units.units()
.Package versions: units 0.8.0, vctrs 0.4.1, R 4.2.0.
The text was updated successfully, but these errors were encountered: