Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possibly export ud_convert function? #266

Closed
ateucher opened this issue Feb 16, 2021 · 9 comments
Closed

Possibly export ud_convert function? #266

ateucher opened this issue Feb 16, 2021 · 9 comments

Comments

@ateucher
Copy link

I'm wondering if it's possible to export the currently unexported functionud_convert?

Use case: I have a package for working with water quality data - about 15 million rows. It contains columns of data of mixed units (unit stored in a column), many of which need to be converted to another unit (stored in a different column). Because they're mixed, I have to vapply or loop over the rows, thus calling set_units(set_units(x, unitA), unitB) many times. Calling ud_convert() directly is much faster:

library(units)
#> udunits system database from /usr/local/share/udunits
library(microbenchmark)

microbenchmark(
  units:::ud_convert(1, "mg/L", "ug/L"), 
  set_units(set_units(1, "mg/L"), "ug/L")
)
#> Unit: microseconds
#>                                     expr      min        lq       mean   median
#>    units:::ud_convert(1, "mg/L", "ug/L")   26.051   41.6465   63.16936   59.328
#>  set_units(set_units(1, "mg/L"), "ug/L") 2078.241 3441.7690 4155.80035 4327.640
#>         uq      max neval
#>    76.1755   134.01   100
#>  4775.0780 10895.10   100

A real-world example with a 2000-row sample of data:

library(units)
csv_file <- "https://gist.githubusercontent.com/ateucher/ec2caf91108639b8126a1b0349bee5a3/raw/28f7d02c41e0d1436da46f7319aa6a57065cdb1c/test.csv"

x <- read.csv(csv_file, stringsAsFactors = FALSE)

# Various combinations of units:
table(x$from_unit, x$to_unit)
#>           
#>            CFU/100mL g/m2    m m3/d mg/L  t/d ueq/L ug/g ug/m3
#>   %                0    0    0    0    2    0     0    2     0
#>   E3m3/d           0    0    0   48    0    0     0    0     0
#>   kg/d             0    0    0    0    0    7     0    0     0
#>   L/min           31    0    0    0    0    0     0    0     0
#>   mg/dm2/d         0    0    0    0   10    0     0    0     0
#>   mg/kg            0    0    0    0    0    0     0  126     0
#>   mg/L             0    0    0    0    0    0    28    2     0
#>   mg/m3            0    0    0    0    0    0     0    0     1
#>   mm               0    0    8    0    0    0     0    0     0
#>   ng/L             0    0    0    0    1    0     0    0     0
#>   ug/cm2           0    5    0    0    2    0     0    0     0
#>   ug/L             0    0    0    0 1727    0     0    0     0

# wrapper function to test two ways of converting
convert <- function(value, from, to, fun) {
  stopifnot(length(value) == length(from) && length(from) == length(to))
  
  vapply(seq_along(value), 
         function(i) {
           tryCatch(fun(value[i], from[i], to[i]), 
                    error = function(e) NA_real_)
         }, FUN.VALUE = double(1))
}

convert_units <- function(value, from, to) {
  ret <- units::set_units(units::set_units(value, from, mode = "standard"), to, mode = "standard")
  as.numeric(ret)
}

# using set_units
system.time(
  test1 <- convert(x$val, x$from_unit, x$to_unit, convert_units)
)
#>    user  system elapsed 
#>   7.230   0.663   8.725

# using ud_convert
system.time(
  test2 <- convert(x$val, x$from_unit, x$to_unit, units:::ud_convert)
)
#>    user  system elapsed 
#>   0.192   0.008   0.211

all.equal(test1, test2)
#> [1] TRUE

Created on 2021-02-16 by the reprex package (v1.0.0)

I was also experimenting with mixed_units(), but while I could set the mixed units, I didn't see an obvious way to convert them. But it is entirely possible I'm missing something.

@ateucher
Copy link
Author

Hmmm, this is possibly a solution:

library(dplyr)

convert_units2 <- function(value, from, to) {
  stopifnot(length(unique(from)) ==1)
  stopifnot(length(unique(to)) ==1)
  
  ret <- tryCatch(
    units::set_units(units::set_units(value, from, mode = "standard"), to, mode = "standard"), 
    error = function(e) NA_real_
  )
  as.numeric(ret)
}

system.time(
  test3 <- x %>% 
    group_by(from_unit, to_unit) %>% 
    mutate(new_val = convert_units2(val, from_unit[1], to_unit[1]))
)
#>    user  system elapsed 
#>   0.058   0.002   0.061

all.equal(test2, test3$new_val)
#> [1] TRUE

Created on 2021-02-16 by the reprex package (v1.0.0)

@Enchufa2
Copy link
Member

set_units provides a method for mixed_units, so you don't need to loop over units:

from <- c("m/s", "km/h", "mg/L", "g")
to <- c("km/h", "m/s", "g/L", "kg")
x <- mixed_units(1:4, from)
set_units(x, to, mode="standard")
#> Mixed units: g/L (1), kg (1), km/h (1), m/s (1) 
#> 3.6 [km/h], 0.5555556 [m/s], 0.003 [g/L], 0.004 [kg]

@Enchufa2
Copy link
Member

For reference, this would be the way to do it with mixed_units:

x <- read.csv(csv_file, stringsAsFactors = FALSE)
x <- x[mapply(ud_are_convertible, x$to_unit, x$from_unit), ]
x$val <- with(x, mixed_units(val, from_unit))
x$new_val <- with(x, set_units(val, to_unit, mode="standard"))

I just dropped non-convertible units. Yours is a good approach too, faster in this case because many units are repeated. Thus, we could probably take this idea of grouping into the internals of set_units.mixed_units to improve its performance.

Anyway, in answer to the original request, I don't think that exporting ud_convert is a good idea, because it entirely defeats the purpose of the package.

@ateucher
Copy link
Author

This is brilliant, thanks @Enchufa2. I had missed the mixed_units method for set_units - and I do recognize that this request was outside the intent of the package. I think I'll go with the grouped approach.

ud_are_convertible is also not exported... would you consider exporting that, or a variant that fits with the package philosophy?

Thanks for the quick response

@Enchufa2
Copy link
Member

I took a look at the thread referenced above and I do think your grouped approach is the best for your use case.

ud_are_convertible is exported in the devel version.

@ateucher
Copy link
Author

I think so too. Thanks so much for the thorough responses.

@ateucher
Copy link
Author

@Enchufa2 sorry to pester you again on this. Do you have an idea of when the next version (with ud_are_convertible exported) will be heading to CRAN?

@edzer
Copy link
Member

edzer commented Feb 18, 2021

(ready when you are, @Enchufa2 )

@Enchufa2
Copy link
Member

I would like to give another pass to the list of issues and address some of them before going for another release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants