
math and ops functions, bioRad 0.5.2.9504 #460

Merged
merged 24 commits into from
Apr 21, 2022

Conversation

@bart1 (Collaborator) commented Jun 11, 2021

@adokter @barthoekstra @CeciliaNilsson709 @peterdesmet
I have started looking into how we could work more easily with bioRad objects, among other things to address #404. In this pull request I have made a first draft of how this could work. It should make it possible to work more easily with multiple pvols/scans. An example operation (e.g. accumulating pvols for generating masks):

pvolList <- list(example_pvol, example_pvol)
log(Reduce("+", lapply(pvolList, exp)))

In this way you could combine a list of pvols. This pull request is certainly still a draft, but I think it would be good to discuss some possible design choices. I have not yet added tests for the errors, but some example calculations can be seen in the test file.
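For quantities stored in dB, a similar pattern could average a list of pvols in linear space. A rough sketch, assuming the arithmetic methods from this PR; `pvolList` and the 10^(x/10) dB convention are illustrative, not part of bioRad:

```r
# Sketch: average dB quantities across a list of pvols in linear units,
# assuming this PR's Ops/Math methods for pvol objects (illustrative only).
library(bioRad)
pvolList <- list(example_pvol, example_pvol)
linear <- lapply(pvolList, function(p) 10^(p / 10))           # dB -> linear
pvol_mean <- 10 * log10(Reduce("+", linear) / length(linear)) # back to dB
```

This is the kind of composition the operators are meant to enable; a dedicated averaging function could wrap such a recipe later.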

Some discussion points:

  • How to handle functions like cumsum? Currently I just warn that they are not very logical. We could accumulate along the rays or just error?
  • When combining pvols with different dimensions, errors currently occur, but they come from lower-level functions. Would we want custom error messages, and if so, for what?
    • Different param dimensions? Differences in range gate size?
    • Different numbers of scans? How about elevation angles, should they match?
    • As these functions are power-user tools I think we might want to watch out for restricting things too much (maybe people want to do math between radars, although I do not see why yet)
  • I do not change the attributes of the object or the names of quantities; it seems hard to settle on one logical answer. For now I just use the first one. Maybe we want to strip some?
  • I think this functionality should be paired with an option to select quantities in a pvol (e.g. retain only DBZH and DBZV). Such a function should return a pvol and thus differs from get_pvol

@bart1 bart1 marked this pull request as draft June 11, 2021 10:00
@codecov-commenter commented Jun 11, 2021

Codecov Report

Merging #460 (cdb9648) into master (2d10a97) will increase coverage by 1.89%.
The diff coverage is 100.00%.

❗ Current head cdb9648 differs from pull request most recent head bbfeba6. Consider uploading reports for the commit bbfeba6 to get more accurate results

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #460      +/-   ##
==========================================
+ Coverage   55.73%   57.62%   +1.89%     
==========================================
  Files          55       57       +2     
  Lines        3140     3226      +86     
==========================================
+ Hits         1750     1859     +109     
+ Misses       1390     1367      -23     
Impacted Files Coverage Δ
R/hooks.R 43.47% <ø> (ø)
R/operators.R 100.00% <100.00%> (ø)
R/select.R 100.00% <100.00%> (ø)
R/calculate_param.R 62.16% <0.00%> (+62.16%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2d10a97...bbfeba6. Read the comment docs.

@bart1 (Collaborator, Author) commented Jun 11, 2021

With a bit more thought we could maybe also support functions like

  • max/min
  • prod/sum
  • any/all
  • mean/weighted.mean
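Pending proper Summary-group methods, an elementwise maximum across pvols could already be composed from the comparison and arithmetic operators in this PR. A sketch; `pvol_pmax` is a hypothetical helper and NA values are not handled:

```r
# Sketch: elementwise max over a list of pvols using only `>`, `*` and `+`
# from this PR. pvol_pmax is hypothetical; NA handling is ignored.
pvol_pmax <- function(pvols) {
  Reduce(function(a, b) (a > b) * a + (a <= b) * b, pvols)
}
```

The same trick works for min by flipping the comparisons; a real Summary method would of course be cleaner.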

@barthoekstra (Collaborator) commented:

Ok, I like this a lot. Way more elegant than the convoluted methods I've used before.

How to handle functions like cumsum. Currently I just warn that they are not very logical. We could accumulate along the rays or just error?

I'm not sure of the use-cases, but instead of accumulating along a ray, I'd then make it accumulate along a dimension that can be user-defined

As these functions are power-user tools I think we might want to watch out for restricting things too much (maybe people want to do math between radars, although I do not see why yet)

I agree, though it may be nice if there's some functionality that (optionally) checks the similarity of pvol and scan objects before causing these operations to crash midway because of mismatching dimensions, etc. If found to be dissimilar, you could skip to the next object.

Is there some way to expand this to ppi objects? I've used stacked versions of those as well.

@bart1 (Collaborator, Author) commented Jun 11, 2021

Thanks @barthoekstra .

I'm not sure of the use-cases, but instead of accumulating along a ray, I'd then make it accumulate along a dimension that can be user-defined

Which dimensions are you thinking about? Rays, directions, and across polar volumes (e.g. time)? I'm not sure that fits with what I would expect from base functions like cumsum. Maybe we could think of an apply function in the future to cover these kinds of cases?

I agree, though it may be nice if there's some functionality that (optionally) checks the similarity of pvol and scan objects before causing these operations to crash midway because of mismatching dimensions, etc. If found to be dissimilar, you could skip to the next object.

I think we can't add arguments for more or less checking. I'm not against leaving the responsibility with the user and only checking whether the dimensions make sense. If people want to be strict they could use something like all.equal(attributes_table(pvol1), attributes_table(pvol2)), but I think that would be overly restrictive for the general case.

Is there some way to expand this to ppi objects? I've used stacked versions of those as well.

On the expansion to ppi objects: that is not too difficult. However, I am not sure how much functionality we would like to implement on ppi objects; in the end they are SpatialGridDataFrames and can easily be converted to raster or stars objects, where all this functionality exists (and more). For me it is a question of where the balance of maintenance lies.

@bart1 (Collaborator, Author) commented Jun 11, 2021

Quick example showing that logical operations also work (of course, if you have several different pvols you can also identify locations where a threshold is frequently passed):

plot(project_as_ppi(get_scan(example_pvol > 1, 3)), zlim = 0:1)
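Along the same lines, the "frequently passed" idea could be built from logical masks summed across pvols. A sketch assuming the operators in this PR; `pvolList` is illustrative and its pvols must have matching dimensions:

```r
# Sketch: exceedance counts across pvols via logical masks (this PR's Ops).
# pvolList is illustrative; all pvols must share the same dimensions.
library(bioRad)
pvolList <- list(example_pvol, example_pvol)
masks <- lapply(pvolList, function(p) p > 1)  # one logical mask per pvol
freq <- Reduce("+", masks)                    # counts per bin
plot(project_as_ppi(get_scan(freq, 3)))
```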

@adokter (Owner) commented Jun 11, 2021

This is a nice idea. I wonder about the use case (is it common enough?) and whether some of this functionality deserves a separate function:

  1. It seems quite uncommon that you want to apply the same operation on all quantities in a pvol or a scan, because they are so different. It makes most sense at the param level, but because these are matrices the arithmetic was already available.
  2. A use case would be averaging time series of pvols and scans. You would want to average linear and logarithmic quantities differently, and include checks that the dimensions and elevations of the scans and pvols line up. Perhaps a dedicated function for this might signal this functionality more clearly to the user, and be more flexible to develop. See add time averaging function #77
  3. I do see a use case for the logical operators, so you can quickly inspect values, for example visualize where is DBZH between 1 and 3, e.g. plot(example_scan > 1 & example_scan < 3, zlim=0:1), quite elegant - but to new users it might be confusing that DBZH of the resulting object is now something different (and the plot functions won't recognize the change)
  4. For the sum operator, I could also see an alternative implementation: scan + scan -> pvol, and param + param -> scan, so more on the level of organizing the objects. But also here a separate function might signal such functionality more clearly to new users, see binding scans into a pvol and binding params into a scan #422

Ok, I like this a lot. Way more elegant than the convoluted methods I've used before.

  1. @barthoekstra could you explain your use case in more detail?
  2. any other use cases?

@adokter (Owner) commented Jun 11, 2021

Here are a few things I noticed when playing around:

This operation throws a complicated warning:

> example_scan_subset = example_scan
> example_scan_subset$params=example_scan$params[3:5]
> example_scan_sum <- example_scan_subset+example_scan
Warning messages:
1: In names(e1$params) != names(e2$params) :
  longer object length is not a multiple of shorter object length
2: In Ops.scan(example_scan_subset, example_scan) :
  The names of parameters do not match, this
                       likely means you try to combine (+) different quantities.
3: In mapply(.Generic, e1$params, e2$params, SIMPLIFY = F) :
  longer argument not a multiple of length of shorter

The names of the param list are broken:

> example_scan_sum$params
$RHOHV
               Polar scan parameter (class param)

    quantity:  RHOHV 
        dims:  480 bins x 360 rays

$ZDR
               Polar scan parameter (class param)

    quantity:  ZDR 
        dims:  480 bins x 360 rays

$PHIDP
               Polar scan parameter (class param)

    quantity:  PHIDP 
        dims:  480 bins x 360 rays

$<NA>
               Polar scan parameter (class param)

    quantity:  RHOHV 
        dims:  480 bins x 360 rays

$<NA>
               Polar scan parameter (class param)

    quantity:  ZDR 
        dims:  480 bins x 360 rays

PHIDP value changed, but it's not the original value * 2:

> example_scan_sum$params$PHIDP[1,1]
[1] -19.77208
> example_scan$params$PHIDP[1,1]
[1] -20.47058

@bart1 (Collaborator, Author) commented Jun 11, 2021

The latest push now throws errors for unequal-length data.

@bart1 (Collaborator, Author) commented Jun 11, 2021

This is a nice idea. I wonder about the use case (is it common enough?) and if some of this functionality deserves a separate function:

1. It seems quite uncommon that you want to apply the same operation on all quantities in a pvol or a scan, because they are so different. It makes most sense at the param level, but because these are matrices the arithmetic was already available.

As said, I think it should be paired with a function to reduce pvols to a limited set of parameters. I will make such a function but would welcome suggestions for a name.

2. a use-case would be averaging time-series of pvol's and scans. You would want to average linear and logarithmic quantities differently, and include checks that the dimensions and elevations of the scans and pvols line up. Perhaps a dedicated function for this might signal this functionality more clearly to the user, and be more flexible to develop. See #77

I see these functions more as a set of tools for developing functionality like that described above, giving the flexibility to experiment with more options. They are not a final solution for all users. Also, when implementing such averaging functions, this functionality might come in handy and make those kinds of functions easier to write.

3. I do see a use case for the logical operators, so you can quickly inspect values, for example visualize where is DBZH between 1 and 3, e.g. `plot(example_scan > 1 & example_scan < 3, zlim=0:1)`, quite elegant - but to new users it might be confusing that DBZH of the resulting object is now something different (and the plot functions won't recognize the change)

4. For the sum operator, I could also see an alternative implementation: scan + scan -> pvol, and param + param -> scan, so more on the level of organizing the objects. But also here a separate function might signal such functionality more clearly to new users, see #422

I think using the + symbol for the operations you suggest might be somewhat confusing (only ggplot2 uses it that way). Going by the terminology used in most other packages, I would think words like stack or bind might work better. I do, however, think that functions like bind_params and bind_scans would be useful to generate scans and pvols respectively.
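For reference, the API shape being discussed might look like this. Both names are proposals from this thread (see #422), not functions implemented in this PR:

```r
# Hypothetical API sketch; names taken from the discussion (#422),
# not implemented here. Both would validate dimensions/attributes
# before combining their inputs.
#
# scan <- bind_params(param_dbzh, param_vradh)  # params -> one scan
# pvol <- bind_scans(scan_a, scan_b, scan_c)    # scans  -> one pvol
```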

@bart1 (Collaborator, Author) commented Jun 21, 2021

I have now added a select function to select specific parameters from a pvol or scan. I opted to extend select from dplyr, as it seems a natural verb for this purpose. This also means flexible selection strategies are possible (see some of the examples below). This can come in useful when you want to select a specific parameter both horizontally and vertically, or everything relating to vertical data.

The other change I made is to allow flexible multiplication by length-one vectors, as this can come in useful when applying masks.

devtools::load_all("~/bioRad")
#> ℹ Loading bioRad
#> Welcome to bioRad version 0.5.2.9419
#> Docker daemon running, Docker functionality enabled (vol2bird version 0.5.0)
require(uvaRadar)
#> Loading required package: uvaRadar
require(dplyr)
#> Loading required package: dplyr
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:bioRad':
#> 
#>     between, collapse, filter, first, lag, last
#> The following object is masked from 'package:testthat':
#> 
#>     matches
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
pvol <- get_pvol("NL/HRW", param = "all")
scan <- get_scan(pvol, 1)
scan
#>                   Polar scan (class scan)
#> 
#>      parameters:  DBZH CPAH CCORH SQIH CCORV CPAV SQIV UPHIDP VRADV WRADV TH DBZV TV VRADH WRADH RHOHV KDP PHIDP 
#> elevation angle:  0.8 deg
#>            dims:  832 bins x 360 rays
select(scan, 'RHOHV')
#>                   Polar scan (class scan)
#> 
#>      parameters:  RHOHV 
#> elevation angle:  0.8 deg
#>            dims:  832 bins x 360 rays
scan %>% select(starts_with('DBZ'))
#>                   Polar scan (class scan)
#> 
#>      parameters:  DBZH DBZV 
#> elevation angle:  0.8 deg
#>            dims:  832 bins x 360 rays
scan %>% select(contains('RAD'))
#>                   Polar scan (class scan)
#> 
#>      parameters:  VRADV WRADV VRADH WRADH 
#> elevation angle:  0.8 deg
#>            dims:  832 bins x 360 rays
scan %>% select(ends_with('H'))
#>                   Polar scan (class scan)
#> 
#>      parameters:  DBZH CPAH CCORH SQIH TH VRADH WRADH 
#> elevation angle:  0.8 deg
#>            dims:  832 bins x 360 rays

Created on 2021-06-21 by the reprex package (v2.0.0)

@peterdesmet (Collaborator) commented:
I like the approach of reusing dplyr verbs (select(), filter()) on bioRad objects. I understand the use case for those more than using operators on bioRad objects.

@bart1 (Collaborator, Author) commented Jun 26, 2021

I like the approach of reusing dplyr verbs (select(), filter()) on bioRad objects. I understand the use case for those more than using operators on bioRad objects.

With regard to filter, what would you expect the result to be? Normally filter omits records from a dataset. As the pvol format requires complete matrices, it is not really possible to omit records. You could set records to NA, but that is a somewhat different operation. Alternatively, you could filter scans in a pvol (e.g. keep elevation angles below a certain value). I have not implemented the other verbs as it was not clear to me what the expected behavior would be.
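The scan-level filtering mentioned above could, for instance, look like this. A sketch using bioRad's existing get_elevation_angles(); the helper name keep_low_scans is hypothetical:

```r
# Sketch: keep only scans below a given elevation angle. Hypothetical
# helper built on bioRad's get_elevation_angles(); relies on the pvol
# storing its scans in the $scans list.
library(bioRad)
keep_low_scans <- function(pvol, max_elev) {
  pvol$scans <- pvol$scans[get_elevation_angles(pvol) < max_elev]
  pvol
}
```

A dplyr-style filter() method on pvols could wrap this kind of predicate on scan attributes.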

@peterdesmet (Collaborator) commented:

@bart1 I didn't really think it through; I just like the idea of using dplyr verbs on bioRad objects, but we should only have them where they make sense.

@adokter (Owner) commented Jun 28, 2021

(some of the) time selection by filter_vpts() would work nicely as a dplyr verb

@adokter (Owner) commented Jun 28, 2021

Hi @bart1, some more feedback:

* How to handle functions like `cumsum`. Currently I just warn that they are not very logical. We could accumulate along the rays or just error?

cumsum and similar operations break the scan and parameter class structure and should be forbidden; for package consistency a scan should always contain scan parameter objects:

> class(get_param(example_scan,"DBZH"))
[1] "param"  "matrix" "array" 
> class(get_param(example_scan+example_scan,"DBZH"))
[1] "param"  "matrix" "array" 
> class(get_param(cumsum(example_scan),"DBZH"))
[1] "numeric"
Warning message:
In Math.scan(example_scan) :
  Operation cumsum not meaningful for scan objects
* When combining pvols with different dimensions currently errors would occur but these are from lower level functions. Would we want custom error messages and if so for what?
  * Different `param` dimensions? Differences in range gate size?
  * Different number of `scans`? How about elevation angles, should they match?
  * As these functions are power-user tools I think we might want to watch out for restricting things too much (maybe people want to do math between radars, although I do not see why yet)
  • Informative warning messages would be very helpful
  • I would not allow math operations on scans with different range gate and azimuth bin sizes. Currently an output is generated for example_scan1 + example_scan2, but the output is garbage when the two have different dimensions.
* I do not change the attributes of the object or names of quantities, this seems hard to have one logical answer. I now just use the first one. Maybe we want to strip some?

Could you add an attributes_from argument just as in bind_into_vpts() so users can select which attributes they want to keep?

* I think this functionality should be paired with an option to select quantities in a pvol (e.g. retain only DBZH and DBZV). This function should return a pvol and thus differs from `get_pvol`

Really like the dplyr select you added!

@bart1 (Collaborator, Author) commented Jul 12, 2021

Hi @bart1, some more feedback:

cumsum and similar operations that break the scan and parameter class structures should be forbidden, for package consistency a scan should always contain scan parameter objects:

> class(get_param(example_scan,"DBZH"))
[1] "param"  "matrix" "array" 
> class(get_param(example_scan+example_scan,"DBZH"))
[1] "param"  "matrix" "array" 
> class(get_param(cumsum(example_scan),"DBZH"))
[1] "numeric"
Warning message:
In Math.scan(example_scan) :
  Operation cumsum not meaningful for scan objects

This now results in an error

* Informative warning messages would be very helpful

* I would not allow math operations on scans with different range gate and azimuth bin sizes. Currently an output is generated for `example_scan1 + example_scan2`, but the output is garbage when the two have different dimensions.

I have added a bunch of warnings to this effect. I did not want to error, since these warnings can also occur when attributes are omitted or changed by the data provider (e.g. the Den Helder radar appeared to move because its coordinates were updated).

* I do not change the attributes of the object or names of quantities, this seems hard to have one logical answer. I now just use the first one. Maybe we want to strip some?

Could you add an attributes_from argument just as in bind_into_vpts() so users can select which attributes they want to keep?

You can't really add arguments to these functions, but it retains attributes from the first object (now documented).

Really like the dplyr select you added!

Thanks!

I have also added some tests for multiplying with vectors (along rays) or matrices. This might for example be useful when implementing noise filters. Pvols can now be multiplied with lists; one list element is then applied per scan.
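As an illustration of that vector recycling, a per-bin weight vector can act as a simple range mask. A sketch assuming this PR's recycling rules; the cutoff of 20 gates is arbitrary:

```r
# Sketch: zero out the nearest 20 range gates of example_scan by
# multiplying with a per-bin vector recycled along the rays
# (example_scan has 480 bins x 360 rays; 20 is an arbitrary cutoff).
library(bioRad)
w <- c(rep(0, 20), rep(1, 480 - 20))
masked <- example_scan * w
```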

bart added 4 commits July 12, 2021 15:29
Merge remote-tracking branch 'origin/master' into math_new
@adokter adokter self-requested a review July 16, 2021 19:52
@adokter (Owner) commented Jul 16, 2021

Hi @bart1, thanks for adding the tests, which are also helpful for understanding the operators. Here are a few more thoughts and questions:

  1. The + operator should always be commutative, and this is not always the case. For example in your tests, s6+s2 != s2+s6, and s2+s7 != s7+s2. I think we should keep the commutative property, since people are very used to it. It might be mostly a naming issue (not the actual values); see also point 4. Here a minimal example:
 dplyr::select(example_scan,"VRADH") + dplyr::select(example_scan,"DBZH")
                  Polar scan (class scan)

     parameters:  VRADH 
elevation angle:  0.5 deg
           dims:  480 bins x 360 rays
           
dplyr::select(example_scan,"DBZH") + dplyr::select(example_scan,"VRADH")
                  Polar scan (class scan)

     parameters:  DBZH 
elevation angle:  0.5 deg
           dims:  480 bins x 360 rays

  2. You do allow several operations for scans with one parameter added to scans with multiple parameters. For example in the tests s2 (one param) can be combined with s6 (two params). But combining a scan with two params and a scan with three params is not allowed. My feeling is this is quite unintuitive, and I would remove this special treatment of single-parameter scans and not allow any combination of scans that have different parameters.

  3. Related to the previous, several operations have fairly unintuitive results. For example below, addition of a parameter DB leads to an unexpected multiplication of DF by a factor of two. One would expect DF to remain unchanged if only DB is added.

> s6
                  Polar scan (class scan)

     parameters:  DB DF 
elevation angle:  0.5 deg
           dims:  480 bins x 360 rays
> s2
                  Polar scan (class scan)

     parameters:  DB 
elevation angle:  0.5 deg
           dims:  480 bins x 360 rays

testthat::expect_equal(s6 + s2, s6 + s6) # evaluates to true

Here you also see the operation is not commutative:
testthat::expect_equal(s2 + s6, s6 + s6) # evaluates to false

It seems a solution to 2) and 3) is to not allow operations on scans with different sets of quantities

  4. The operations to a certain extent break the object; e.g. the object below does not contain a scan parameter with reflectivity DBZH, but a logical field indicating where DBZH was above 1. One way to keep track of this is to rename the parameter to "DBZH>1" instead of keeping it as "DBZH". That would also allow a mechanism for the plot functions to respond to the change in data type. Feasible?
> dplyr::select(example_scan,"DBZH")>1
                  Polar scan (class scan)

     parameters:  DBZH 
elevation angle:  0.5 deg
           dims:  480 bins x 360 rays

@bart1 (Collaborator, Author) commented Jul 16, 2021

Hi @bart1, thanks for adding the tests, which are also helpful for understanding the operators.

1. The `+` operator should always be commutative, and this is not always the case. For example in your tests, s6+s2 != s2+s6, and s2+s7 != s7+s2. Or another example:
 dplyr::select(example_scan,"VRADH") + dplyr::select(example_scan,"DBZH")
                  Polar scan (class scan)

     parameters:  VRADH 
elevation angle:  0.5 deg
           dims:  480 bins x 360 rays
           
dplyr::select(example_scan,"DBZH") + dplyr::select(example_scan,"VRADH")
                  Polar scan (class scan)

     parameters:  DBZH 
elevation angle:  0.5 deg
           dims:  480 bins x 360 rays

This has to do with the fact that the first object takes priority when copying attributes. The values are exactly the same; only the name is taken from the first object. So in this case only the names of the params differ, otherwise everything is exactly the same:

require(bioRad)
#> Loading required package: bioRad
#> Welcome to bioRad version 0.5.2.9419
#> Docker daemon running, Docker functionality enabled (vol2bird version 0.5.0)
data("example_scan")
a <- dplyr::select(example_scan, "VRADH") + dplyr::select(example_scan, "DBZH")
b <- dplyr::select(example_scan, "DBZH") + dplyr::select(example_scan, "VRADH")
all.equal(a,b)
#> [1] "Component \"params\": Names: 1 string mismatch"                                           
#> [2] "Component \"params\": Component 1: Attributes: < Component \"param\": 1 string mismatch >"

Created on 2021-07-16 by the reprex package (v2.0.0)

2. You do allow several operations for scans with one parameter added to scans with multiple parameters. For example in the tests s2 (one param) can be combined with s6 (two params). But combining a scan with two params and a scan with three params is not allowed. My feeling is this is quite unintuitive, and I would remove this special treatment of single-parameter scans and not allow any combination of scans that have different parameters.

This is by design, as I think you might want to apply an operation with a mask, for example, to multiple parameters. Therefore I did not want to ban these one-to-many or many-to-one operations.

3. Related to the previous, several operations have fairly unintuitive results. For example below, addition of a parameter DB leads to an unexpected multiplication of DF by a factor of two. One would expect DF to remain unchanged if only DB is added.
> s6
                  Polar scan (class scan)

     parameters:  DB DF 
elevation angle:  0.5 deg
           dims:  480 bins x 360 rays
> s2
                  Polar scan (class scan)

     parameters:  DB 
elevation angle:  0.5 deg
           dims:  480 bins x 360 rays

testthat::expect_equal(s6 + s2, s6 + s6) # evaluates to true

Here you also see the operation is not commutative:
testthat::expect_equal(s2 + s6, s6 + s6) # evaluates to false

It seems a solution to 2) and 3) is to not allow operations on scans with different sets of quantities

4. The operations to a certain extent break the object; e.g. the object below does not contain a scan parameter with reflectivity DBZH, but a logical field indicating where DBZH was above 1. One way to keep track of this is to rename the parameter to "DBZH>1" instead of keeping it as "DBZH". That would also allow a mechanism for the plot functions to respond to the change in data type. Feasible?
> dplyr::select(example_scan,"DBZH")>1
                  Polar scan (class scan)

     parameters:  DBZH 
elevation angle:  0.5 deg
           dims:  480 bins x 360 rays

This is an interesting point. I'm not sure what is right here, as it might also produce some quite tricky names (spaces, special characters, etc.). If you change names, it would also mean that for operations that keep the quantity valid you would need to rename to get things working again. My thought was that if you are a power user and use this kind of functionality, it is up to you to keep it sensible. I'm online now; would it maybe make sense to have a quick call to coordinate this?

(Review thread on tests/testthat/test-operators.R, resolved)
@bart1 (Collaborator, Author) commented Jul 16, 2021

Note that the attribute behavior is now fairly similar to the way arithmetic works in base R (the first operand's attributes take priority):

a<-1
b<-2
attr(a,'g')<-'g'
attr(b,'g')<-'b'
a+b
#> [1] 3
#> attr(,"g")
#> [1] "g"
b+a
#> [1] 3
#> attr(,"g")
#> [1] "b"
1+a
#> [1] 2
#> attr(,"g")
#> [1] "g"

Created on 2021-07-16 by the reprex package (v2.0.0)

@bart1 (Collaborator, Author) commented Feb 15, 2022

Quick update to see if the branch is still mergeable with master.

@adokter adokter marked this pull request as ready for review April 21, 2022 21:13
@adokter adokter changed the title from "draft math and ops functions" to "draft math and ops functions, bioRad 0.5.2.9504" Apr 21, 2022
@adokter adokter changed the title from "draft math and ops functions, bioRad 0.5.2.9504" to "math and ops functions, bioRad 0.5.2.9504" Apr 21, 2022
@adokter adokter merged commit 4e0435e into master Apr 21, 2022
@adokter adokter deleted the math_new branch April 21, 2022 22:18