Merge (outer join) fails to set NAs for integer64 #1385

Closed
dlithio opened this issue Oct 8, 2015 · 2 comments

dlithio commented Oct 8, 2015

Related to #488.

require(data.table)
require(bit64)
dt1 <- data.table(x = c(1),y = integer64(1))
dt2 <- data.table(x = c(1,2))
setkey(dt1,x)
setkey(dt2,x)
merge(dt1,dt2,all=TRUE)
#   x                   y
#1: 1                   0
#2: 2 9218868437227407266

The desired result can be reached with this workaround:

dt <- merge(dt1,dt2,all=TRUE)
dt[as.character(y)== "9218868437227407266", y := as.integer64(NA)]
dt
#   x  y
#1: 1  0
#2: 2 NA
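The workaround above matches on the printed value, so it would also overwrite a genuine 9218868437227407266 in the data. A possible variant, sketched here under the assumption that a temporary marker column is acceptable (the name `matched` is made up for illustration), flags the rows that came from dt1 and blanks y wherever the flag is missing after the join:

```r
library(data.table)
library(bit64)

dt1 <- data.table(x = 1, y = as.integer64(0))
dt2 <- data.table(x = c(1, 2))

# Hypothetical marker column: TRUE for every row that exists in dt1.
dt1[, matched := TRUE]

res <- merge(dt1, dt2, by = "x", all = TRUE)

# Rows without a partner in dt1 get a logical NA in `matched`,
# so they can be identified without looking at the integer64 values.
res[is.na(matched), y := as.integer64(NA)]
res[, matched := NULL]
res
```

This does not depend on how the missing value happens to print, only on which rows had no match.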

The bit64 reference has a note about what may be causing this behavior:

Subscripting non-existing elements and subscripting with NAs is currently not supported. Such subscripting currently returns 9218868437227407266 instead of NA (the NA value of the underlying double code). Following the full R behaviour here would either destroy performance or require extensive C-coding.
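To illustrate the note, here is a small sketch of where that number comes from. It assumes only that integer64 vectors are stored in doubles with a class attribute, as the bit64 documentation describes. Relabelling a plain double NA as integer64, without converting it, shows the value from the merge output:

```r
library(bit64)

# An ordinary double NA, as a fill step that is unaware of integer64 would use.
raw_na <- NA_real_

# Reinterpret the same 8 bytes as integer64 by changing only the class.
oldClass(raw_na) <- "integer64"
as.character(raw_na)   # "9218868437227407266"

# bit64's own NA uses a different bit pattern, so it is recognised as missing.
proper_na <- as.integer64(NA)
is.na(proper_na)       # TRUE
```

This suggests the outer join fills the unmatched rows with a double NA rather than with bit64's own NA, though that is an inference from the output and not something confirmed in this thread.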

arunsrinivasan (Member) commented

I'm not sure what we can do about this, but it's good to know. Thanks.

Matt, if you think we should fix this, feel free to reopen.

dlithio commented Oct 12, 2015

Thanks for the response. I don't have strong feelings either way, but I wonder whether it would be appropriate to raise a warning of some sort, either in fread or in data.table's merge. To be clear, the reason is that reading the files and merging with base R gives the expected result (at least in terms of what is NA):

df1 = read.table(header = TRUE, text = "
                 x y
                 1 30000000001
                 ")

df2 = read.table(header = TRUE, text = "
                 x
                 1
                 2
                 ")

merge(df1,df2,all=TRUE)

#  x     y
#1 1 3e+10
#2 2    NA

Reading the same two inputs with fread, however, raises the issue I described above:

require(data.table)

dt1 = fread("x y
             1 30000000001")

#Warning message:
#  In fread("x y\n             1 30000000001") :
#  Some columns have been read as type 'integer64' but package bit64 isn't loaded. Those columns will display as strange looking floating point data. There is no need to reload the data. Just require(bit64) to obtain the integer64 print method and print the data again.

require(bit64)

dt2 = fread("x
                 1
                 2")

setkey(dt1,x)
setkey(dt2,x)

merge(dt1,dt2,all=TRUE)

#   x                   y
#1: 1         30000000001
#2: 2 9218868437227407266

Even if you call require(bit64) only after the merge, the same issue persists.
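Until something like that exists in the package, the warning could be approximated in user code. The wrapper below is only a sketch: `merge_warn_int64` is a made-up name, not a data.table function, and it does nothing more than check the column classes before calling merge.

```r
library(data.table)
library(bit64)

# Hypothetical helper: warn when an outer join involves integer64 columns.
merge_warn_int64 <- function(x, y, ..., all = FALSE) {
  has_i64 <- function(d) {
    any(vapply(d, inherits, logical(1), what = "integer64"))
  }
  if (isTRUE(all) && (has_i64(x) || has_i64(y))) {
    warning("outer join on integer64 columns: check unmatched rows for NA")
  }
  merge(x, y, ..., all = all)
}

dt1 <- data.table(x = 1, y = as.integer64(30000000001))
dt2 <- data.table(x = c(1, 2))
res <- suppressWarnings(merge_warn_int64(dt1, dt2, by = "x", all = TRUE))
```

The check is deliberately coarse: it fires for any outer join that touches an integer64 column, whether or not rows actually go unmatched.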
