Weird behaviour using .SD, by= and copy #958

Closed
jhrmnn opened this issue Nov 17, 2014 · 3 comments

@jhrmnn

jhrmnn commented Nov 17, 2014

I've stumbled upon some really weird behaviour with data.table 1.9.4. A minimal example:

library(data.table)
dt <- data.table(a=c(1, 1, 1, 0, 0),
                 b=c("A", "B", "A1", "A", "B"))
dt
##    a  b
## 1: 1  A
## 2: 1  B
## 3: 1 A1
## 4: 0  A
## 5: 0  B
dt[, nrow(.SD[b == 'B']), by=.(a)]
##    a V1
## 1: 1  1
## 2: 0  0
dt[, nrow(copy(.SD)[b == 'B']), by=.(a)]
##    a V1
## 1: 1  1
## 2: 0  1
dt[3, b:="C"]
##    a b
## 1: 1 A
## 2: 1 B
## 3: 1 C
## 4: 0 A
## 5: 0 B
dt[, nrow(.SD[b == 'B']), by=.(a)]
##    a V1
## 1: 1  1
## 2: 0  1
dt[, nrow(copy(.SD)[b == 'B']), by=.(a)]
##    a V1
## 1: 1  1
## 2: 0  1

Note that when dt[3, b] == "A1", the output with and without copy differs, which I don't understand at all. When dt[3, b] == "C", the behaviour is as expected.

@arunsrinivasan
Member

Thanks, I've reproduced the bug on 1.9.5 as well.

For the future, please read and follow the instructions here and here.

@arunsrinivasan arunsrinivasan added this to the v1.9.6 milestone Nov 17, 2014
@arunsrinivasan
Member

The issue is due to automatic indexing creating a secondary key (index) attribute on .SD. With auto-indexing turned off, the result is as expected:

require(data.table)
options(datatable.auto.index = FALSE)
dt <- data.table(a=c(1, 1, 1, 0, 0),
                 b=c("A", "B", "A1", "A", "B"))
dt[, nrow(.SD[b == 'B']), by=.(a)]
#    a V1
# 1: 1  1
# 2: 0  1
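
For readers who want to poke at this themselves, here is a minimal sketch, not the actual fix. It assumes a recent data.table release in which `indices()` is available (in the 1.9.x versions discussed here the accessor had a different name). It shows where auto-indexing attaches its index on an ordinary table, and one way to count matches per group without subsetting .SD at all.

```r
library(data.table)

dt <- data.table(a = c(1, 1, 1, 0, 0),
                 b = c("A", "B", "A1", "A", "B"))

# A plain `==` subset on the full table may attach a secondary index on b
# as a side effect (auto-indexing, controlled by datatable.auto.index).
invisible(dt[b == "B"])
indices(dt)  # lists any secondary indices currently attached to dt

# Count matches per group with a logical sum, so .SD is never subset
# and no index is ever looked up or created inside the grouping.
res <- dt[, sum(b == "B"), by = .(a)]
print(res)
```

The `sum(b == "B")` form does not depend on the indexing option, so it can serve as a cross-check for the `.SD[b == 'B']` result on an affected version.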

@jangorecki
Member

The issue was solved by the following line:
is.null(attr(x, '.data.table.locked'))) { # fix for #958, don't create auto index on '.SD'.
Isn't it unsafe to use attr(..., exact=FALSE)?
I know it is unlikely, but an error caused by partial matching of attribute names could be hard to debug if it ever occurs.
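
To make the concern concrete, here is a small base R sketch of partial attribute matching. The attribute name `.data.table.locked.extra` is hypothetical and used only for illustration; nothing here claims that data.table sets such an attribute.

```r
x <- 1:3

# Hypothetical attribute whose name merely starts with ".data.table.locked".
attr(x, ".data.table.locked.extra") <- TRUE

# Default exact = FALSE: a unique partial match is accepted.
partial <- attr(x, ".data.table.locked")

# exact = TRUE: only an attribute with exactly this name is returned.
strict <- attr(x, ".data.table.locked", exact = TRUE)

print(partial)  # TRUE
print(strict)   # NULL
```

With `exact = TRUE`, an `is.null()` check on the attribute cannot be fooled by a longer attribute name that shares the same prefix.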
