Closes #891. Subset handles duplicate cols consistently.
arunsrinivasan committed Oct 15, 2014
1 parent 54071d6 commit 8701d5a
Showing 3 changed files with 10 additions and 2 deletions.
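
The duplicate-name rule behind this fix can be sketched in base R. This is a hedged illustration on a plain named list, not data.table's internals; the variable names (`cols`, `select`, `round_trip`) are made up for the example. It shows why turning numeric `select` values into names first loses information when a name is duplicated: looking the name up again lands on the first match, not the column that was originally requested.

```r
# Illustrative analogy only: a named list with a duplicated name,
# mirroring the three-column table used in the new tests.
cols <- list(1:5, 6:10, 11:15)
names(cols) <- c("V1", "V2", "V1")

# Access by position is unambiguous: the third element is returned.
by_position <- cols[[3]]

# Access by name is ambiguous, so the first match is returned.
by_name <- cols[["V1"]]

# Converting positions to names and resolving them again (roughly what the
# removed line led to) silently swaps column 3 for column 1.
select <- c(3L, 2L)
as_names <- names(cols)[select]
round_trip <- match(as_names, names(cols))
```

Here `round_trip` no longer equals `select`, which is the inconsistency the commit avoids by keeping `vars` numeric and only using `names(x)[vars]` to work out the key columns.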
4 changes: 2 additions & 2 deletions R/data.table.R
@@ -1944,8 +1944,8 @@ subset.data.table <- function (x, subset, select, ...)
nl <- as.list(seq_len(ncol(x)))
setattr(nl,"names",names(x))
vars <- eval(substitute(select), nl, parent.frame()) # e.g. select=colF:colP
-if (is.numeric(vars)) vars=names(x)[vars]
-key.cols <- intersect(key.cols, vars) ## Only keep key.columns found in the select clause
+# #891 fix - don't convert numeric vars to column names - will break when there are duplicate columns
+key.cols <- intersect(key.cols, names(x)[vars]) ## Only keep key.columns found in the select clause
}

ans <- x[r, vars, with = FALSE]
2 changes: 2 additions & 0 deletions README.md
@@ -45,6 +45,8 @@

13. `DT[, LHS := RHS]` with RHS is of the form `eval(parse(text = foo[1]))` referring to columns in `DT` is now handled properly. Closes [#880](https://github.com/Rdatatable/data.table/issues/880). Thanks to tyner.

+14. `subset` handles extracting duplicate columns in consistency with data.table's rule - if a column name is duplicated, then accessing that column using column number should return that column, whereas accessing by column name (due to ambiguity) will always extract the first column. Closes [#891](https://github.com/Rdatatable/data.table/issues/891). Thanks to @jjzz.

#### NOTES

1. Clearer explanation of what `duplicated()` does (borrowed from base). Thanks to @matthieugomez for pointing out. Closes [#872](https://github.com/Rdatatable/data.table/issues/872).
6 changes: 6 additions & 0 deletions inst/tests/tests.Rraw
@@ -5397,6 +5397,12 @@ DT2 = data.table(start=tt[2], end=tt[2])
setkey(DT2)
test(1390.5, foverlaps(DT1, DT2, which=TRUE), data.table(xid=1:3, yid=as.integer(c(NA, 1, NA))))

+# Fix for #891. 'subset' and duplicate names.
+# duplicate column names rule - if column numbers, extract the right column. If names, extract always the first column
+DT = data.table(V1=1:5, V2=6:10, V3=11:15)
+setnames(DT, c("V1", "V2", "V1"))
+test(1391.1, subset(DT, select=c(3L,2L)), DT[, c(3L, 2L), with=FALSE])
+test(1391.2, subset(DT, select=c("V2", "V1")), DT[, c("V2", "V1"), with=FALSE])

##########################
