gprod 64bit produces wrong results #5225

Closed
ben-schwen opened this issue Oct 16, 2021 · 0 comments · Fixed by #5231

ben-schwen commented Oct 16, 2021

gprod (the GForce-optimized prod) produces wrong results for large integer64 values and for prod(na.rm=TRUE).

library(data.table)
library(bit64)
DT = data.table(x=c(lim.integer64(), 1, 1), g=1:2)
DT
#>                       x g
#> 1: -9223372036854775807 1
#> 2:  9223372036854775807 2
#> 3:                    1 1
#> 4:                    1 2
DT[, prod(x), g, verbose=TRUE]
#> Argument 'by' after substitute: g
#> Detected that j uses these columns: [x]
#> Finding groups using forderv ... forder.c received 4 rows and 1 columns
#> 0.000s elapsed (0.000s cpu) 
#> Finding group sizes from the positions (can be avoided to save RAM) ... 0.000s elapsed (0.000s cpu) 
#> Getting back original order ... forder.c received a vector type 'integer' length 2
#> 0.000s elapsed (0.000s cpu) 
#> lapply optimization is on, j unchanged as 'prod(x)'
#> GForce optimized j to 'gprod(x)'
#> Making each group and running j (GForce TRUE) ... gforce initial population of grp took 0.000
#> gforce assign high and low took 0.001
#> gforce eval took 0.000
#> 0.000s elapsed (0.001s cpu)
#>    g                  V1
#> 1: 1                <NA>
#> 2: 2 9221120237041092514
DT[, base::prod(x), g, verbose=TRUE]
#> Argument 'by' after substitute: g
#> Detected that j uses these columns: [x]
#> Finding groups using forderv ... forder.c received 4 rows and 1 columns
#> 0.000s elapsed (0.000s cpu) 
#> Finding group sizes from the positions (can be avoided to save RAM) ... 0.000s elapsed (0.000s cpu) 
#> Getting back original order ... forder.c received a vector type 'integer' length 2
#> 0.001s elapsed (0.000s cpu) 
#> lapply optimization is on, j unchanged as 'base::prod(x)'
#> GForce is on, left j unchanged
#> Old mean optimization is on, left j unchanged.
#> Making each group and running j (GForce FALSE) ... 
#>   collecting discontiguous groups took 0.000s for 2 groups
#>   eval(j) took 0.000s for 2 calls
#> 0.000s elapsed (0.000s cpu)
#>    g                   V1
#> 1: 1 -9223372036854775807
#> 2: 2  9223372036854775807
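
The verbose output above shows that writing j as base::prod(x) keeps GForce from rewriting the call to gprod(x). A minimal sketch of another way to sidestep gprod on an affected version follows. It assumes the documented datatable.optimize levels, where GForce is only enabled from level 2 upward; the variable names are illustrative.

```r
library(data.table)
library(bit64)

DT = data.table(x=as.integer64(c(2, -2, 1, 1)), g=1:2)

# Reference result: base::prod(x) is not rewritten to gprod(x),
# so each group is evaluated with bit64's prod method
ref = DT[, base::prod(x), g]

# Assumption: optimization level 1 leaves GForce switched off
old = options(datatable.optimize=1L)
no_gforce = DT[, prod(x), g]
options(old)  # restore the previous setting

# Compare with bit64's == method; identical() compares the underlying doubles
stopifnot(all(ref$V1 == no_gforce$V1))
```

Changing the option affects every query in the session, so restoring the previous value right after the call keeps the workaround local.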

edit1:
It also does not handle na.rm=TRUE correctly for integer64.

library(bit64)
DT = data.table(x=as.integer64(c(1:2, NA, NA)), g=1:2)
DT
#>       x g
#> 1:    1 1
#> 2:    2 2
#> 3: <NA> 1
#> 4: <NA> 2
DT[, prod(x, na.rm=TRUE), g]
#>    g   V1
#> 1: 1 <NA>
#> 2: 2 <NA>
DT[, base::prod(x, na.rm=TRUE), g]
#>    g V1
#> 1: 1  1
#> 2: 2  2
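
One reading consistent with these results, offered as an assumption rather than a confirmed diagnosis: bit64 stores each integer64 value in the 8 bytes of a double, and NA_integer64_ is the smallest 64-bit signed value (bit pattern 0x8000000000000000). Read as a double, that pattern is negative zero, so a double-based NA test would not remove it. The sketch below uses base R only to show this.

```r
# Assumed bit pattern of NA_integer64_, bytes listed least significant first
na_int64_bytes <- as.raw(c(0, 0, 0, 0, 0, 0, 0, 0x80))
as_dbl <- readBin(na_int64_bytes, what = "double", n = 1L, size = 8L, endian = "little")

is.na(as_dbl)   # FALSE: the integer64 NA is invisible to a double NA test
1 / as_dbl      # -Inf: the value is negative zero

# integer64 1 shares its bytes with the smallest positive subnormal double
one <- readBin(as.raw(c(1, 0, 0, 0, 0, 0, 0, 0)), what = "double", n = 1L,
               size = 8L, endian = "little")

# Multiplying as doubles gives negative zero again, i.e. the NA pattern
prod_bytes <- writeBin(one * as_dbl, raw(), size = 8L, endian = "little")
identical(prod_bytes, na_int64_bytes)   # TRUE
```

If the values were multiplied as doubles, both groups would end up with the NA bit pattern even with na.rm=TRUE, which matches the output above.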

edit2:
OK, the values don't even have to be "large" integer64.

DT = data.table(x=as.integer64(c(2, -2, 1, 1)), g=1:2)
DT
#>     x g
#> 1:  2 1
#> 2: -2 2
#> 3:  1 1
#> 4:  1 2
DT[, prod(x), g]
#>    g V1
#> 1: 1  0
#> 2: 2 -2
DT[, base::prod(x), g]
#>    g V1
#> 1: 1  2
#> 2: 2 -2
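
The wrong values above are what multiplication yields if the 8 bytes of each integer64 are treated as an ordinary double. This is a hypothesis that fits the output, not something the issue confirms. The sketch below checks the arithmetic in base R; hex_to_double and double_to_hex are hypothetical helpers, not part of data.table or bit64.

```r
# Hypothetical helpers: convert between a 16-digit hex bit pattern and the
# double that shares those 8 bytes
hex_to_double <- function(hex) {
  bytes <- as.raw(strtoi(substring(hex, seq(1, 15, 2), seq(2, 16, 2)), 16L))
  # hex lists the most significant byte first, so reverse for little-endian
  readBin(rev(bytes), what = "double", n = 1L, size = 8L, endian = "little")
}
double_to_hex <- function(x) {
  bytes <- writeBin(x, raw(), size = 8L, endian = "little")
  paste(as.character(rev(bytes)), collapse = "")
}

one <- hex_to_double("0000000000000001")  # integer64 1, a subnormal double
two <- hex_to_double("0000000000000002")  # integer64 2, a subnormal double

# Group 1 of edit2: the product of two subnormals underflows to zero
double_to_hex(two * one)                  # "0000000000000000"

# Group 2 of edit2: integer64 -2 has an all-ones exponent field, so it reads as NaN
minus_two <- hex_to_double("fffffffffffffffe")
is.nan(minus_two)                         # TRUE

# Group 1 of the first example: the smallest lim.integer64() value times 1
# underflows to negative zero, the bit pattern assumed to be NA_integer64_
lim_min <- hex_to_double("8000000000000001")
double_to_hex(lim_min * one)              # "8000000000000000"
```

A NaN operand typically passes through multiplication with its payload intact, which would leave the -2 bit pattern unchanged and explain why group 2 of edit2 happens to look correct.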
sessionInfo()
#> R version 4.1.1 (2021-08-10)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 20.04.3 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=de_AT.UTF-8        LC_COLLATE=en_US.UTF-8    
#>  [5] LC_MONETARY=de_AT.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=de_AT.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=de_AT.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] bit64_4.0.5       bit_4.0.4         data.table_1.14.3
#> 
#> loaded via a namespace (and not attached):
#>  [1] knitr_1.34      magrittr_2.0.1  rlang_0.4.11    fastmap_1.1.0  
#>  [5] fansi_0.5.0     stringr_1.4.0   styler_1.6.1    highr_0.9      
#>  [9] tools_4.1.1     xfun_0.26       utf8_1.2.2      withr_2.4.2    
#> [13] htmltools_0.5.2 ellipsis_0.3.2  yaml_2.2.1      digest_0.6.27  
#> [17] tibble_3.1.4    lifecycle_1.0.0 crayon_1.4.1    purrr_0.3.4    
#> [21] vctrs_0.3.8     fs_1.5.0        glue_1.4.2      evaluate_0.14  
#> [25] rmarkdown_2.11  reprex_2.0.1    stringi_1.7.4   compiler_4.1.1 
#> [29] pillar_1.6.2    backports_1.2.1 pkgconfig_2.0.3

ben-schwen added the bit64 and GForce (issues relating to optimized grouping calculations) labels on Oct 16, 2021
ben-schwen self-assigned this on Oct 21, 2021
mattdowle added this to the 1.14.3 milestone on Oct 25, 2021
jangorecki modified the milestones: 1.14.9, 1.15.0 on Oct 29, 2023