vec_cast.units.units() can fail if one underlying vector is integer #324

hughjonesd · 2022-09-14T10:25:07Z

Example:

x <- set_units(1:10, cm)
br <- set_units(2:4, `in`)
vctrs::vec_cast_common(x, br)
Error:
! Can't convert from `..2` <double> to <integer> due to loss of precision.
• Locations: 1, 2, 3

Similarly:

x_m <- set_units(1:10, m)
br_ft <- set_units(2:6, ft)
vctrs::vec_cast_common(x_m, br_ft)
# fails with the same error message

By contrast:

x <- set_units(1:10, cm)
br_ft2 <- set_units(2:4/12, ft)
vctrs::vec_cast_common(x, br_ft2)

works. So does:

> x_in <- set_units(1:10, `in`)
> vctrs::vec_cast_common(x_in, br)

and:

x_dbl <- set_units(as.double(1:10), cm)
vctrs::vec_cast_common(x_dbl, br)

Lastly, you avoid the bug if the double happens to evaluate to integer values:

x <- set_units(1:10, cm)
br2 <- set_units(2:4 / 2.54, `in`)
vctrs::vec_cast_common(x, br2)
# works

The issue seems to be that units::vec_cast.units.units() recalls vec_cast with the underlying bare data. But if one of these is integer, and the other is double and is not integer-valued (e.g. because it's been converted), that fails.

This may make sense for reasons that I don't understand. It was surprising to me, though.

I think the issue is that vctrs correctly wants to fail if it has to cast a non-integer-valued double to an integer. But in the units case, we can be sure that the integer values are "really" doubles, i.e. they are on the real number line (... right?) If so, maybe a fix would be to add to_bare <- as.Double(to_bare) in vec_cast.units.units().

Package versions: units 0.8.0, vctrs 0.4.1, R 4.2.0.

The text was updated successfully, but these errors were encountered:

Enchufa2 · 2022-09-14T21:07:52Z

Thanks for the analysis. The question here is why are we recalling the method on the bare data. Is there any corner case I don't see in which this is required, @lionel- ?

units/R/tidyverse.R

Line 72 in 07b2606

out = vctrs::vec_cast(out_bare, to_bare, ..., x_arg = x_arg, to_arg = to_arg)

Can't the method finish 4 lines before, when the conversion was already made?

lionel- · 2022-09-15T06:35:53Z

yup it could. I thought it'd be a good thing to benefit from vctrs casting rules for bare types but maybe not?

Enchufa2 · 2022-09-15T08:39:06Z

For units, we care about the mode (numeric) but not about the storage mode. From what I can see, ptype2 is about the mode, and cast takes into account the storage mode. Is that right? So, unless there's some benefit I don't see, I'll just drop that part and adjust the tests accordingly.

lionel- · 2022-09-15T09:04:41Z

hmm... Actually it's important for vctrs that after a cast from x to y, the cast result has the same storage mode as y.

Alternatively, you could provide a proxy method for your class that always uses the same storage mode, that would work too.

Enchufa2 · 2022-09-15T10:20:21Z

Ok. The proxy thing is too much trouble I think. How about forcing the storage mode of the destination? See 85c9785

hughjonesd · 2022-09-15T11:31:12Z

Can confirm this fixes my underlying bug.

lionel- · 2022-09-15T12:22:01Z

If you have an identity proxy, the storage mode of the return value of a cast method must match the storage mode of to. So if to is not stored as a double, this cast method will cause internal errors in some cases.

Enchufa2 · 2022-09-15T12:46:34Z

Can you think of an example of such errors, so that we can use it as a test case?

lionel- · 2022-09-16T06:52:45Z

If x has integer storage and y has double storage, and the cast method doesn't normalise that, and there is an identity proxy, I believe vec_c(x, y) should fail.

Enchufa2 · 2022-09-16T07:15:41Z

It works fine with the commit above:

x = units::set_units(1:3, "cm")
y = units::set_units(4.0, "m")
storage.mode(x)
#> [1] "integer"
storage.mode(y)
#> [1] "double"
vctrs::vec_c(x, y)
#> Units: [cm]
#> [1]   1   2   3 400
vctrs::vec_c(y, x)
#> Units: [m]
#> [1] 4.00 0.01 0.02 0.03

lionel- · 2022-09-16T14:03:05Z

oh yeah, because the common type of int + dbl == dbl, and your cast method always converts to dbl.

But here is an example where it breaks:

vctrs::vec_assign(x, 2, y)
#> Error in `vctrs::vec_assign()`:
#> ! `proxy` of type `integer` incompatible with `value` proxy of type `double`.
#> ℹ In file slice-assign.c at line 156.
#> ℹ This is an internal error in the vctrs package, please report it to the package authors.
#>     ▆
#>  1. ├─vctrs::vec_assign(x, 2, y)
#>  2. └─rlang:::stop_internal_c_lib(...)
#>  3.   └─rlang::abort(message, call = call, .internal = TRUE, .frame = frame) at rlang/R/utils.R:212:2

In general, not respecting the storage mode in a vctrs cast method is UB.

hughjonesd · 2023-04-23T16:11:20Z

I take it back; my bug is still there using units 0.8.1 (and vctrs 0.6.2).

x <- set_units(1:10, cm)
br <- set_units(2:4, `in`)
vctrs::vec_cast_common(x, br)
Error:
! Can't convert from `..2` <double> to <integer> due to loss of precision.
• Locations: 1, 2, 3

Enchufa2 · 2023-04-23T17:45:03Z

Yes, because I never merged the commit above per Lionel's last comment, and I didn't have the time to further look into this, apologies.

hughjonesd · 2023-04-23T20:28:35Z

So is there a reason not to just convert manually to double? My thoughts, FWIW:

this error only bites the user sometimes, which makes it risky
presumably if two units are mutually convertible, they can always be thought of as both double (or both integer, maybe - I don't actually know if there are any integer-only units)

Enchufa2 · 2023-04-23T21:01:12Z

You mean at the entry point? Looking at the udunits2 interface, it works with floats only, so I suppose there's no point in supporting integers

Enchufa2 · 2023-04-23T21:24:21Z

Could you please test the commit above?

Enchufa2 · 2023-04-23T21:52:16Z

Unfortunately, this breaks a couple of (not robust enough) tests in {quantities}, so I'll have to run revdep checks.

hughjonesd · 2023-04-24T05:12:30Z

This seems to work. It fixes the bug reported here, and also the underlying bug from hughjonesd/santoku#40:

library(santoku)
x <- set_units(1:10, cm)
br <- set_units(2:4, `in`)
chop(x, br)
 [1] [ 1.00 [cm],  5.08 [cm]) [ 1.00 [cm],  5.08 [cm]) ...
Levels: [ 1.00 [cm],  5.08 [cm]) [ 5.08 [cm],  7.62 [cm]) [ 7.62 [cm], 10.16 [cm]]

When I first compiled the fix, in the same session, the above code gave me:

x <- set_units(1:10, cm)
br <- set_units(2:4, `in`)
chop(x, br)
Error in .Call("_units_ud_compare", PACKAGE = "units", x, y, xn, yn) : 
  "_units_ud_compare" not available for .Call() for package "units"

but I think that was just a temporary thing.

hughjonesd · 2023-04-24T05:21:10Z

I uncovered one other error while testing this:

x1 <- set_units(0, "degree_Celsius")
x2 <- set_units(3, "degree_Celsius")
x1 < x2

Error in ud_compare(e1, e2, as.character(units(e1)), as.character(units(e2))) : 
  ut_scale(): NULL factor argument

Enchufa2 · 2023-04-24T15:06:33Z

Interesting. New issue #346 set to track this.

* force storage mode to double at entry point, fixes #324 * add test that should fail without forcing the storage mode * bump version, update NEWS

hughjonesd · 2023-04-28T13:53:21Z

Thank you!

bart1 · 2023-08-22T20:40:55Z

@Enchufa2 I just encountered this issue as I was manually creating units (sometimes integers) that were converted to doubles. From what I gather here int units are not supported? That would be a pity as it would save a lot of memory in some cases and could make sense for for example counts.

Enchufa2 · 2023-08-23T08:51:00Z

From what I gather here int units are not supported?

Correct. Unfortunately this was causing trouble, so since v0.8-2, storage mode is force to double on entry point.

That would be a pity as it would save a lot of memory in some cases and could make sense for for example counts.

On the one hand, if you have a big(ish) data application in which you are having memory issues, then probably computing with units is too expensive too, so you should resort to specialized tools.

On the other hand, there are only a few count units, and even counts may require doubles eventually:

> set_units(set_units(1, bit), byte)
0.125 [byte]

bart1 · 2023-08-23T08:52:34Z

Thanks for your clarification @Enchufa2 I will just resort to always use doubles

edzer · 2023-08-23T09:21:59Z

For counts of things the package is not of much use, as they are treated as unitless and add to anything else unitless:

library(units)
# udunits database from /usr/share/xml/udunits/udunits2.xml
set_units(3, bytes) + set_units(4, rad)
# 3.5 [bytes]

You'd like to have a package that distinguishes things, e.g. using an ontology or nomenclature or taxonomy, and that can aggregate to superclasses, e.g. 3 apples + 4 oranges = 7 pieces of fruit, or 3 European herring gulls + 2 black-headed gulls = 5 gulls, and 5 gulls + 2 chaffinches = 7 birds.

hughjonesd mentioned this issue Sep 14, 2022

Fails to extend units breaks to the left hughjonesd/santoku#40

Closed

Enchufa2 added a commit that referenced this issue Apr 23, 2023

force storage mode to double at entry point, fixes #324

f8f8c7e

Enchufa2 mentioned this issue Apr 23, 2023

force storage mode to double at entry point #344

Merged

Enchufa2 closed this as completed in #344 Apr 24, 2023

Enchufa2 added a commit that referenced this issue Apr 24, 2023

force storage mode to double at entry point (#344)

e629ce4

* force storage mode to double at entry point, fixes #324 * add test that should fail without forcing the storage mode * bump version, update NEWS

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

vec_cast.units.units() can fail if one underlying vector is integer #324

vec_cast.units.units() can fail if one underlying vector is integer #324

hughjonesd commented Sep 14, 2022 •

edited

Loading

Enchufa2 commented Sep 14, 2022

lionel- commented Sep 15, 2022

Enchufa2 commented Sep 15, 2022

lionel- commented Sep 15, 2022

Enchufa2 commented Sep 15, 2022

hughjonesd commented Sep 15, 2022

lionel- commented Sep 15, 2022

Enchufa2 commented Sep 15, 2022

lionel- commented Sep 16, 2022

Enchufa2 commented Sep 16, 2022

lionel- commented Sep 16, 2022

hughjonesd commented Apr 23, 2023 •

edited

Loading

Enchufa2 commented Apr 23, 2023

hughjonesd commented Apr 23, 2023

Enchufa2 commented Apr 23, 2023

Enchufa2 commented Apr 23, 2023

Enchufa2 commented Apr 23, 2023

hughjonesd commented Apr 24, 2023

hughjonesd commented Apr 24, 2023 •

edited

Loading

Enchufa2 commented Apr 24, 2023

hughjonesd commented Apr 28, 2023

bart1 commented Aug 22, 2023

Enchufa2 commented Aug 23, 2023

bart1 commented Aug 23, 2023

edzer commented Aug 23, 2023 •

edited

Loading

vec_cast.units.units() can fail if one underlying vector is integer #324

vec_cast.units.units() can fail if one underlying vector is integer #324

Comments

hughjonesd commented Sep 14, 2022 • edited Loading

Enchufa2 commented Sep 14, 2022

lionel- commented Sep 15, 2022

Enchufa2 commented Sep 15, 2022

lionel- commented Sep 15, 2022

Enchufa2 commented Sep 15, 2022

hughjonesd commented Sep 15, 2022

lionel- commented Sep 15, 2022

Enchufa2 commented Sep 15, 2022

lionel- commented Sep 16, 2022

Enchufa2 commented Sep 16, 2022

lionel- commented Sep 16, 2022

hughjonesd commented Apr 23, 2023 • edited Loading

Enchufa2 commented Apr 23, 2023

hughjonesd commented Apr 23, 2023

Enchufa2 commented Apr 23, 2023

Enchufa2 commented Apr 23, 2023

Enchufa2 commented Apr 23, 2023

hughjonesd commented Apr 24, 2023

hughjonesd commented Apr 24, 2023 • edited Loading

Enchufa2 commented Apr 24, 2023

hughjonesd commented Apr 28, 2023

bart1 commented Aug 22, 2023

Enchufa2 commented Aug 23, 2023

bart1 commented Aug 23, 2023

edzer commented Aug 23, 2023 • edited Loading

hughjonesd commented Sep 14, 2022 •

edited

Loading

hughjonesd commented Apr 23, 2023 •

edited

Loading

hughjonesd commented Apr 24, 2023 •

edited

Loading

edzer commented Aug 23, 2023 •

edited

Loading