Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vec_cast.units.units() can fail if one underlying vector is integer #324

Closed
hughjonesd opened this issue Sep 14, 2022 · 25 comments · Fixed by #344
Closed

vec_cast.units.units() can fail if one underlying vector is integer #324

hughjonesd opened this issue Sep 14, 2022 · 25 comments · Fixed by #344

Comments

@hughjonesd
Copy link

hughjonesd commented Sep 14, 2022

Example:

x <- set_units(1:10, cm)
br <- set_units(2:4, `in`)
vctrs::vec_cast_common(x, br)
Error:
! Can't convert from `..2` <double> to <integer> due to loss of precision.
• Locations: 1, 2, 3

Similarly:

x_m <- set_units(1:10, m)
br_ft <- set_units(2:6, ft)
vctrs::vec_cast_common(x_m, br_ft)
# fails with the same error message

By contrast:

x <- set_units(1:10, cm)
br_ft2 <- set_units(2:4/12, ft)
vctrs::vec_cast_common(x, br_ft2)

works. So does:

> x_in <- set_units(1:10, `in`)
> vctrs::vec_cast_common(x_in, br)

and:

x_dbl <- set_units(as.double(1:10), cm)
vctrs::vec_cast_common(x_dbl, br)

Lastly, you avoid the bug if the double happens to evaluate to integer values:

x <- set_units(1:10, cm)
br2 <- set_units(2:4 / 2.54, `in`)
vctrs::vec_cast_common(x, br2)
# works

The issue seems to be that units::vec_cast.units.units() recalls vec_cast with the underlying bare data. But if one of these is integer, and the other is double and is not integer-valued (e.g. because it's been converted), that fails.

This may make sense for reasons that I don't understand. It was surprising to me, though.

I think the issue is that vctrs correctly wants to fail if it has to cast a non-integer-valued double to an integer. But in the units case, we can be sure that the integer values are "really" doubles, i.e. they are on the real number line (... right?) If so, maybe a fix would be to add to_bare <- as.Double(to_bare) in vec_cast.units.units().

Package versions: units 0.8.0, vctrs 0.4.1, R 4.2.0.

@Enchufa2
Copy link
Member

Thanks for the analysis. The question here is why are we recalling the method on the bare data. Is there any corner case I don't see in which this is required, @lionel- ?

out = vctrs::vec_cast(out_bare, to_bare, ..., x_arg = x_arg, to_arg = to_arg)

Can't the method finish 4 lines before, when the conversion was already made?

@lionel-
Copy link
Contributor

lionel- commented Sep 15, 2022

yup it could. I thought it'd be a good thing to benefit from vctrs casting rules for bare types but maybe not?

@Enchufa2
Copy link
Member

For units, we care about the mode (numeric) but not about the storage mode. From what I can see, ptype2 is about the mode, and cast takes into account the storage mode. Is that right? So, unless there's some benefit I don't see, I'll just drop that part and adjust the tests accordingly.

@lionel-
Copy link
Contributor

lionel- commented Sep 15, 2022

hmm... Actually it's important for vctrs that after a cast from x to y, the cast result has the same storage mode as y.

Alternatively, you could provide a proxy method for your class that always uses the same storage mode, that would work too.

@Enchufa2
Copy link
Member

Ok. The proxy thing is too much trouble I think. How about forcing the storage mode of the destination? See 85c9785

@hughjonesd
Copy link
Author

Can confirm this fixes my underlying bug.

@lionel-
Copy link
Contributor

lionel- commented Sep 15, 2022

If you have an identity proxy, the storage mode of the return value of a cast method must match the storage mode of to. So if to is not stored as a double, this cast method will cause internal errors in some cases.

@Enchufa2
Copy link
Member

Can you think of an example of such errors, so that we can use it as a test case?

@lionel-
Copy link
Contributor

lionel- commented Sep 16, 2022

If x has integer storage and y has double storage, and the cast method doesn't normalise that, and there is an identity proxy, I believe vec_c(x, y) should fail.

@Enchufa2
Copy link
Member

It works fine with the commit above:

x = units::set_units(1:3, "cm")
y = units::set_units(4.0, "m")
storage.mode(x)
#> [1] "integer"
storage.mode(y)
#> [1] "double"
vctrs::vec_c(x, y)
#> Units: [cm]
#> [1]   1   2   3 400
vctrs::vec_c(y, x)
#> Units: [m]
#> [1] 4.00 0.01 0.02 0.03

@lionel-
Copy link
Contributor

lionel- commented Sep 16, 2022

oh yeah, because the common type of int + dbl == dbl, and your cast method always converts to dbl.

But here is an example where it breaks:

vctrs::vec_assign(x, 2, y)
#> Error in `vctrs::vec_assign()`:
#> ! `proxy` of type `integer` incompatible with `value` proxy of type `double`.
#> ℹ In file slice-assign.c at line 156.
#> ℹ This is an internal error in the vctrs package, please report it to the package authors.
#>     ▆
#>  1. ├─vctrs::vec_assign(x, 2, y)
#>  2. └─rlang:::stop_internal_c_lib(...)
#>  3.   └─rlang::abort(message, call = call, .internal = TRUE, .frame = frame) at rlang/R/utils.R:212:2

In general, not respecting the storage mode in a vctrs cast method is UB.

@hughjonesd
Copy link
Author

hughjonesd commented Apr 23, 2023

I take it back; my bug is still there using units 0.8.1 (and vctrs 0.6.2).

x <- set_units(1:10, cm)
br <- set_units(2:4, `in`)
vctrs::vec_cast_common(x, br)
Error:
! Can't convert from `..2` <double> to <integer> due to loss of precision.
• Locations: 1, 2, 3

@Enchufa2
Copy link
Member

Yes, because I never merged the commit above per Lionel's last comment, and I didn't have the time to further look into this, apologies.

@hughjonesd
Copy link
Author

So is there a reason not to just convert manually to double? My thoughts, FWIW:

  • this error only bites the user sometimes, which makes it risky
  • presumably if two units are mutually convertible, they can always be thought of as both double (or both integer, maybe - I don't actually know if there are any integer-only units)

@Enchufa2
Copy link
Member

You mean at the entry point? Looking at the udunits2 interface, it works with floats only, so I suppose there's no point in supporting integers

@Enchufa2
Copy link
Member

Could you please test the commit above?

@Enchufa2
Copy link
Member

Unfortunately, this breaks a couple of (not robust enough) tests in {quantities}, so I'll have to run revdep checks.

@hughjonesd
Copy link
Author

This seems to work. It fixes the bug reported here, and also the underlying bug from hughjonesd/santoku#40:

library(santoku)
x <- set_units(1:10, cm)
br <- set_units(2:4, `in`)
chop(x, br)
 [1] [ 1.00 [cm],  5.08 [cm]) [ 1.00 [cm],  5.08 [cm]) ...
Levels: [ 1.00 [cm],  5.08 [cm]) [ 5.08 [cm],  7.62 [cm]) [ 7.62 [cm], 10.16 [cm]]

When I first compiled the fix, in the same session, the above code gave me:

x <- set_units(1:10, cm)
br <- set_units(2:4, `in`)
chop(x, br)
Error in .Call("_units_ud_compare", PACKAGE = "units", x, y, xn, yn) : 
  "_units_ud_compare" not available for .Call() for package "units"

but I think that was just a temporary thing.

@hughjonesd
Copy link
Author

hughjonesd commented Apr 24, 2023

I uncovered one other error while testing this:

x1 <- set_units(0, "degree_Celsius")
x2 <- set_units(3, "degree_Celsius")
x1 < x2

Error in ud_compare(e1, e2, as.character(units(e1)), as.character(units(e2))) : 
  ut_scale(): NULL factor argument

@Enchufa2
Copy link
Member

Interesting. New issue #346 set to track this.

Enchufa2 added a commit that referenced this issue Apr 24, 2023
* force storage mode to double at entry point, fixes #324

* add test that should fail without forcing the storage mode

* bump version, update NEWS
@hughjonesd
Copy link
Author

Thank you!

@bart1
Copy link
Contributor

bart1 commented Aug 22, 2023

@Enchufa2 I just encountered this issue as I was manually creating units (sometimes integers) that were converted to doubles. From what I gather here int units are not supported? That would be a pity as it would save a lot of memory in some cases and could make sense for for example counts.

@Enchufa2
Copy link
Member

From what I gather here int units are not supported?

Correct. Unfortunately this was causing trouble, so since v0.8-2, storage mode is force to double on entry point.

That would be a pity as it would save a lot of memory in some cases and could make sense for for example counts.

On the one hand, if you have a big(ish) data application in which you are having memory issues, then probably computing with units is too expensive too, so you should resort to specialized tools.

On the other hand, there are only a few count units, and even counts may require doubles eventually:

> set_units(set_units(1, bit), byte)
0.125 [byte]

@bart1
Copy link
Contributor

bart1 commented Aug 23, 2023

Thanks for your clarification @Enchufa2 I will just resort to always use doubles

@edzer
Copy link
Member

edzer commented Aug 23, 2023

For counts of things the package is not of much use, as they are treated as unitless and add to anything else unitless:

library(units)
# udunits database from /usr/share/xml/udunits/udunits2.xml
set_units(3, bytes) + set_units(4, rad)
# 3.5 [bytes]

You'd like to have a package that distinguishes things, e.g. using an ontology or nomenclature or taxonomy, and that can aggregate to superclasses, e.g. 3 apples + 4 oranges = 7 pieces of fruit, or 3 European herring gulls + 2 black-headed gulls = 5 gulls, and 5 gulls + 2 chaffinches = 7 birds.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants