Incorrect printing of integer64 columns #6224

Closed
renkun-ken opened this issue Jul 6, 2024 · 4 comments · Fixed by #6227

Comments

@renkun-ken
Member

Here is a minimal reproducible example:

library(data.table)

dt <- data.table(id = 1:10, int64 = bit64::as.integer64(1:10))
fst::write_fst(dt, "dt.fst")

Start a new session without bit64 being loaded.

> library(data.table)

> dt <- fst::read_fst("dt.fst", as.data.table = TRUE)
fstcore package v0.9.18
(OpenMP was not detected, using single threaded mode)

> dt
       id         int64
    <int>         <i64>
 1:     1 4.940656e-324
 2:     2 9.881313e-324
 3:     3 1.482197e-323
 4:     4 1.976263e-323
 5:     5 2.470328e-323
 6:     6 2.964394e-323
 7:     7 3.458460e-323
 8:     8 3.952525e-323
 9:     9 4.446591e-323
10:    10 4.940656e-323

> dt
       id int64
    <int> <i64>
 1:     1     1
 2:     2     2
 3:     3     3
 4:     4     4
 5:     5     5
 6:     6     6
 7:     7     7
 8:     8     8
 9:     9     9
10:    10    10

It looks like fst::read_fst does not load bit64 automatically when the data contains integer64 columns. I'm not sure whether this would be better handled on the data.table side: if the table contains integer64 columns, bit64 should be loaded in print.data.table.
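
The odd values in the first printout have a simple explanation: an integer64 vector keeps each 64-bit integer in the eight bytes of a double, and only the class attribute tells R to interpret them differently. When bit64's methods are not registered, those bytes are formatted as ordinary doubles. A minimal base-R sketch (bit64 is not needed; the byte layout is built by hand and little-endian order is assumed) reproduces the numbers:

```r
# Build the little-endian bytes of the 64-bit integers 1..10:
# each value fits in the first byte, followed by seven zero bytes.
bytes <- as.raw(as.vector(rbind(1:10, matrix(0L, nrow = 7, ncol = 10))))

# Reinterpret the same 80 bytes as ten doubles.
as_double <- readBin(bytes, what = "double", n = 10, size = 8, endian = "little")

# Each result is k times the smallest positive subnormal double, 2^-1074.
print(as_double)
```

So the integer k shows up as k * 2^-1074, which is why 1 is displayed as roughly 4.94e-324.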

@Anirban166
Member

Can verify the bug.

library(fstcore)
library(data.table)

dt <- data.table(id = 1:10, int64 = bit64::as.integer64(1:10))
fst::write_fst(dt, "dt.fst")

New R session:

(dt <- fst::read_fst("dt.fst", as.data.table = TRUE))
fstcore package v0.9.18
(OpenMP was not detected, using single threaded mode)
       id         int64
    <int>         <i64>
 1:     1 4.940656e-324
 2:     2 9.881313e-324
 3:     3 1.482197e-323
 4:     4 1.976263e-323
 5:     5 2.470328e-323
 6:     6 2.964394e-323
 7:     7 3.458460e-323
 8:     8 3.952525e-323
 9:     9 4.446591e-323
10:    10 4.940656e-323
Warning message:
package ‘fstcore’ was built under R version 4.2.3 
(dt[, int64 := bit64::as.integer64(int64)])
       id int64
    <int> <i64>
 1:     1     1
 2:     2     2
 3:     3     3
 4:     4     4
 5:     5     5
 6:     6     6
 7:     7     7
 8:     8     8
 9:     9     9
10:    10    10

At first glance, this should probably be handled in the utility function that print.data.table calls for this purpose:

require_bit64_if_needed = function(DT) {

It looks like there has been an attempt to cover this case, and the warning message there mentions loading bit64 as a solution, but I don't think the warning actually surfaces. Even forcing library(bit64) outside of the two conditions in that function doesn't seem to help, as I just tested with such changes. After restarting the R session, dt initially appears to be a function object rather than a data.table object (until the file is read, the name dt resolves to stats::dt):

str(dt)
# or just use dt for the definition along with bytecode + environment
function (x, df, ncp, log = FALSE)
(dt <- fst::read_fst("dt.fst", as.data.table = TRUE))
Loading required package: bit

Attaching package: ‘bit’

The following object is masked from ‘package:base’:

    xor

Attaching package bit64
package:bit64 (c) 2011-2017 Jens Oehlschlaegel
creators: integer64 runif64 seq :
coercion: as.integer64 as.vector as.logical as.integer as.double as.character as.bitstring
logical operator: ! & | xor != == < <= >= >
arithmetic operator: + - * / %/% %% ^
math: sign abs sqrt log log2 log10
math: floor ceiling trunc round
querying: is.integer64 is.vector [is.atomic} [length] format print str
values: is.na is.nan is.finite is.infinite
aggregation: any all min max range sum prod
cumulation: diff cummin cummax cumsum cumprod
access: length<- [ [<- [[ [[<-
combine: c rep cbind rbind as.data.frame
WARNING don't use as subscripts
WARNING semantics differ from integer
for more help type ?bit64

Attaching package: ‘bit64’

The following object is masked from ‘package:utils’:

    hashtab

The following objects are masked from ‘package:base’:

    :, %in%, is.double, match, order, rank

       id         int64
    <int>         <i64>
 1:     1 4.940656e-324
 2:     2 9.881313e-324
 3:     3 1.482197e-323
 4:     4 1.976263e-323
 5:     5 2.470328e-323
 6:     6 2.964394e-323
 7:     7 3.458460e-323
 8:     8 3.952525e-323
 9:     9 4.446591e-323
10:    10 4.940656e-323
Warning message:
package ‘fstcore’ was built under R version 4.2.3 
str(dt)
Classes ‘data.table’ and 'data.frame':	10 obs. of  2 variables:
 $ id   : int  1 2 3 4 5 6 7 8 9 10
 $ int64:integer64 1 2 3 4 5 6 7 8 ... 
 - attr(*, ".internal.selfref")=<externalptr> 
dt
       id int64
    <int> <i64>
 1:     1     1
 2:     2     2
 3:     3     3
 4:     4     4
 5:     5     5
 6:     6     6
 7:     7     7
 8:     8     8
 9:     9     9
10:    10    10
sessionInfo()
R version 4.2.1 (2022-06-23)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.2.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] bit64_4.0.5    bit_4.0.5      fstcore_0.9.18

loaded via a namespace (and not attached):
[1] compiler_4.2.1     parallel_4.2.1     tools_4.2.1        rstudioapi_0.14   
[5] Rcpp_1.0.12        data.table_1.15.99 fst_0.9.8
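
For readers unfamiliar with the kind of check that require_bit64_if_needed performs, here is one way such a check could look. This is a sketch only: the names integer64_cols and ensure_bit64_loaded are hypothetical, and this is not data.table's actual implementation.

```r
# Hypothetical helper, not data.table's actual code: report which columns
# carry the "integer64" class.
integer64_cols <- function(DT) {
  names(DT)[vapply(DT, inherits, logical(1), what = "integer64")]
}

# Hypothetical helper: make sure bit64's S3 methods are registered before
# printing. requireNamespace() loads the namespace (registering its S3
# methods) without attaching the package.
ensure_bit64_loaded <- function(DT) {
  if (length(integer64_cols(DT)) == 0L || isNamespaceLoaded("bit64")) {
    return(invisible(TRUE))
  }
  if (!requireNamespace("bit64", quietly = TRUE)) {
    warning("integer64 columns found but bit64 is not installed; ",
            "values will print as doubles")
    return(invisible(FALSE))
  }
  invisible(TRUE)
}

# A plain list stands in for a data.table so the sketch runs without bit64.
cols <- list(id = 1L, big = structure(1, class = "integer64"))
integer64_cols(cols)
```

The detection step only looks at the class attribute, so it works whether or not bit64 is installed.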

@Anirban166
Member

You can use getNamespace to convert any integer64 columns found with bit64::as.integer64, though, as a workaround that does not require changing that function to enforce the loading of bit64 instead of just suggesting it. (As it currently operates, requireNamespace("bit64", quietly = TRUE) only ensures that the bit64 package is installed and available for use, rather than attaching it to the search path or loading it into the global environment.) For example, after restarting or launching a new R session, this works:

dt <- fst::read_fst("dt.fst", as.data.table = TRUE)
is_int64 <- sapply(dt, inherits, "integer64")  # locate integer64 columns once
if (any(is_int64)) {
  # getNamespace() loads bit64 if needed, then each integer64 column is re-created
  dt[, names(dt)[is_int64] := lapply(.SD, getNamespace("bit64")$as.integer64), .SDcols = is_int64]
}
dt
       id int64
    <int> <i64>
 1:     1     1
 2:     2     2
 3:     3     3
 4:     4     4
 5:     5     5
 6:     6     6
 7:     7     7
 8:     8     8
 9:     9     9
10:    10    10
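
The difference between loading a namespace and attaching a package, which the comment above touches on, can be checked directly. A minimal sketch, using stats4 (which ships with R) as a stand-in for bit64 so that it runs anywhere:

```r
# stats4 ships with R, so it stands in for bit64 here.
pkg <- "stats4"

# requireNamespace() returns TRUE if the namespace could be loaded.
loaded <- requireNamespace(pkg, quietly = TRUE)

# The namespace is now in memory (its S3 methods are registered) ...
is_loaded <- isNamespaceLoaded(pkg)

# ... but the package is only on the search path if it was attached separately.
is_attached <- paste0("package:", pkg) %in% search()

c(loaded = loaded, is_loaded = is_loaded, is_attached = is_attached)
```

In a fresh session is_attached is expected to be FALSE, since nothing has called library(stats4).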

@ben-schwen
Member

Seems related to, or a duplicate of, fstpackage/fst#267

@MichaelChirico
Member

This is not strictly tied to another package; it reproduces with data.table alone:

library(data.table)
DT=data.table(a=1, b=structure(1.06099789548264e-314, class = "integer64"))
DT
#        a             b
#    <num>         <i64>
# 1:     1 1.060998e-314
loadNamespace("bit64")
DT
#        a          b
#    <num>      <i64>
# 1:     1 2147483648
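
The constant 1.06099789548264e-314 in this example is the double whose eight bytes are the 64-bit integer 2147483648 (2^31). A short base-R sketch, assuming little-endian byte order, confirms the correspondence:

```r
# Little-endian bytes of the 64-bit integer 2147483648 (0x0000000080000000).
bytes <- as.raw(c(0x00, 0x00, 0x00, 0x80, 0x00, 0x00, 0x00, 0x00))

# Read the same eight bytes back as a double.
as_double <- readBin(bytes, what = "double", n = 1, size = 8, endian = "little")

# 2^31 units of the smallest subnormal (2^-1074) is 2^-1043.
identical(as_double, 2^-1043)
```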
