which=NA inconsistent with ?data.table #4411

Closed
MichaelChirico opened this issue May 2, 2020 · 3 comments · Fixed by #4430

@MichaelChirico
Member

Was answering this question:

https://stackoverflow.com/q/61554066/3576984

and trying to mirror base subsetting behavior with which, i.e., to have which = TRUE return NA for the NA elements of i:

DT = data.table(
  A = c(NA, 3, 5, 0, 1, 2),
  B = c("foo", "foo", "foo", "bar", "bar", "bar")
)
DT[A > 1, which = TRUE]
# [1] 2 3 6

whereas

DT[ , .I[A > 1]]
# [1] NA  2  3  6

We can use, e.g., na.omit to mirror the data.table behavior, like na.omit(.I[A > 1]), but doing the reverse seems awkward:

idx = DT[is.na(A) | A > 1, which = TRUE]
is.na(idx) <- is.na(DT[idx]$A)
idx
# [1] NA  2  3  6

I thought this in ?data.table might be helpful:

which

TRUE returns the row numbers of x that i matches to. If NA, returns the row numbers of i that have no match in x. By default FALSE and the rows in x that match are returned.
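For context, the documented wording describes a join, where i is itself a table. The following is a minimal sketch of that case, assuming a keyed join; the tables `X` and `Y` and their `id` column are hypothetical and do not appear in the original report.

```r
library(data.table)

# Hypothetical tables for illustration only
X = data.table(id = c("a", "b", "c"), v = 1:3, key = "id")
Y = data.table(id = c("b", "d"))

# which = TRUE: for each row of Y, the row number of X it matches
# (NA where a row of Y has no match, under the default nomatch = NA)
matched = X[Y, which = TRUE]

# which = NA: the row numbers of Y (the i table) that have no match in X
no_match = X[Y, which = NA]

print(matched)
print(no_match)
```

Here the second row of `Y` ("d") is the only one with no counterpart in `X`, which is the situation the `which = NA` wording in ?data.table appears to describe.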

Sort of reads like DT[A > 1, which = NA] should return 1L, but it errors instead:

DT[A > 1, which = NA]
# Error in if (which) return(if (is.null(irows)) seq_len(nrow(x)) else irows) :
#   missing value where TRUE/FALSE needed

Should we change the documentation, or is this a bug?

@MichaelChirico
Member Author

Just found #4303 again. I will check whether #4342 closes this as well; at a glance, I'm not sure whether it's a duplicate.

@MichaelChirico
Member Author

#4342 does not close this issue as of f9e21dd

@jangorecki
Member

jangorecki commented May 4, 2020

I think we shouldn't be handling that specially.

DT[A > 1, which = TRUE]

is subsetting a data.table, whereas

DT[ , .I[A > 1]]

is subsetting an integer vector, and this difference for data.table subsetting is covered in the FAQ.

And the error you got from which=NA seems to be a bug...
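The distinction between the two forms can be reproduced in base R alone. This is a small sketch using the same `A` values as the example in the report; the variable name `keep` is illustrative.

```r
# Same values as column A in the report
A = c(NA, 3, 5, 0, 1, 2)
keep = A > 1                 # NA TRUE TRUE FALSE FALSE TRUE

# which() drops NA, like DT[A > 1, which = TRUE]
drops_na = which(keep)
print(drops_na)
# [1] 2 3 6

# Logical subsetting of an integer vector keeps NA, like DT[ , .I[A > 1]]
keeps_na = seq_along(A)[keep]
print(keeps_na)
# [1] NA  2  3  6
```

The first form treats an NA condition as "not selected"; the second propagates it as an NA index.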
