Closes #604. by=.EACHI is implemented for not joins.
X[ !Y, j, by=.EACHI] now works as intended.
arunsrinivasan committed Aug 5, 2014
1 parent 5181e56 commit 569fbf3
Showing 3 changed files with 37 additions and 9 deletions.
14 changes: 12 additions & 2 deletions R/data.table.R
@@ -457,8 +457,8 @@ chmatch2 <- function(x, table, nomatch=NA_integer_) {
# Otherwise, types of i join columns are always promoted to match x's
# types (with warning or verbose)
i = shallow(i) # careful to only plonk syntax on i from now on (otherwise i would change)
# TO DO: enforce via .internal.shallow attribute and expose shallow() to users
# This is why shallow() is very importantly internal only, currently.
resetifactor = NULL # Keep track of any factor to factor join cols (only time we keep orig)
for (a in seq_along(leftcols)) {
# This loop is simply to support joining factor columns
@@ -510,6 +510,16 @@ chmatch2 <- function(x, table, nomatch=NA_integer_) {
set(i,j=lc,value=newval)
}
}
# Implementation for not-join along with by=.EACHI, #604
if (notjoin && byjoin) {
notjoin = FALSE
if (verbose) {last.started.at=proc.time()[3];cat("not-join called with 'by=.EACHI'; Replacing !i with i=setdiff(x,i) ...");flush.console()}
i = setdiff_(x, i, rightcols, leftcols) # part of #547
if (verbose) {cat("done in",round(proc.time()[3]-last.started.at,3),"secs\n");flush.console()}
setnames(i, names(origi)[leftcols])
setattr(i, 'sorted', names(i)) # since 'x' has key set, this'll always be sorted
origi = i
}
f__ = integer(nrow(i)) # these could be returned as a list from bmerge?
len__ = integer(nrow(i))
allLen1 = logical(1)
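The hunk above rewrites a not-join into an ordinary join on the key values of `x` that are absent from `i`. A rough sketch of that idea using only the public API follows. It is an illustration, not the commit's code: `setdiff_()` is internal, and the `on=` argument used here is an assumption that requires a data.table release later than this commit. The names `DT`, `i`, `keep`, `via_setdiff` and `via_notjoin` are made up for the example.

```r
library(data.table)

# Same toy table as in the README entry for this change
DT <- data.table(x = c(1, 1, 1, 1, 2, 2, 3, 4, 4, 4), y = 1:10, key = "x")
i  <- data.table(x = c(1, 4))

# Step 1: key values present in DT but not in i (the "setdiff(x, i)" step)
keep <- unique(DT[, .(x)])[!i, on = "x"]

# Step 2: a regular join on those values, grouped by each row of i
via_setdiff <- DT[keep, sum(y), by = .EACHI, on = "x"]

# The not-join form that this commit enables
via_notjoin <- DT[!J(c(1, 4)), sum(y), by = .EACHI]
```

Both results should contain one row per remaining key value (here `x = 2` and `x = 3`), which is what makes `by=.EACHI` meaningful for a not-join.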
19 changes: 12 additions & 7 deletions README.md
@@ -154,15 +154,20 @@ We moved from R-Forge to GitHub on 9 June 2014, including history.
26. `.N` is now available in `i`, [FR#724](https://github.com/Rdatatable/data.table/issues/724). Thanks to newbie indirectly [here](http://stackoverflow.com/a/24649115/403310) and Farrel directly [here](http://stackoverflow.com/questions/24685421/how-do-you-extract-a-few-random-rows-from-a-data-table-on-the-fly).

27. `.()` can now be used in `j` and is identical to `list()`, for consistency with `i`.
```R
DT[,list(MySum=sum(B)),by=...]
DT[,.(MySum=sum(B)),by=...] # same
DT[,list(colB,colC,colD)]
DT[,.(colB,colC,colD)] # same
```
Similarly, `by=.()` is now a shortcut for `by=list()`, for consistency with `i` and `j`.


28. `by=.EACHI` is now implemented for *not-joins* as well. Closes [#604](https://github.com/Rdatatable/data.table/issues/604). Thanks to Garrett See for filing the FR. As an example:
```R
DT = data.table(x=c(1,1,1,1,2,2,3,4,4,4), y=1:10, key="x")
DT[!J(c(1,4)), sum(y), by=.EACHI] # is equivalent to DT[J(c(2,3)), sum(y), by=.EACHI]
```

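The README snippets for item 27 use `by=...` as a placeholder, so they do not run as written. A minimal runnable version, assuming a small made-up table (`DT`, its columns `A`, `B`, `C`, and the names `with_list` and `with_dot` are illustrative only), might look like this:

```r
library(data.table)

# Hypothetical data for illustration
DT <- data.table(A = c("a", "a", "b"), B = 1:3, C = 4:6)

# list() and .() in j, and in by, should give the same result
with_list <- DT[, list(MySum = sum(B)), by = list(A)]
with_dot  <- DT[, .(MySum = sum(B)), by = .(A)]

# Returns TRUE when the two forms agree
isTRUE(all.equal(with_list, with_dot))
```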
#### BUG FIXES

1. `fread()`:
13 changes: 13 additions & 0 deletions inst/tests/tests.Rraw
@@ -5001,6 +5001,19 @@ test(1364.15, setdiff_(X, Y, "c", "a"), error="When x's column ('c') is characte
test(1364.16, setdiff_(X, Y), error="setdiff(x,y) requires same number of columns for both x and y")
test(1364.17, setdiff_(X[, list(a)], Y[, list(a)]), data.table(a=c(1,2)))

# not join along with by=.EACHI, #604
DT <- data.table(A=c(1,1,1,2,2,2,2,3,3,4,5,5))[, `:=`(B=as.integer(A), C=c("c", "e", "a", "d"), D=factor(c("c", "e", "a", "d")), E=1:12)]
setkey(DT, A)
test(1365.1, DT[!J(c(2,5)), sum(E), by=.EACHI], DT[J(c(1,3,4)), sum(E), by=.EACHI])
setkey(DT, B)
test(1365.2, DT[!J(c(4:5)), list(.N, sum(E)), by=.EACHI], DT[J(1:3), list(.N, sum(E)), by=.EACHI])
setkey(DT, C)
test(1365.3, copy(DT)[!"c", f := .N, by=.EACHI], copy(DT)[c("a", "d", "e"), f := .N, by=.EACHI])
setkey(DT, D)
test(1365.4, DT[!J(factor("c")), .N, by=.EACHI], DT[J(factor(c("a", "d", "e"))), .N, by=.EACHI])
test(1365.5, DT[!"c", lapply(.SD, sum), by=.EACHI, .SDcols=c("B", "E")], DT[c("a", "d", "e"), lapply(.SD, sum), by=.EACHI, .SDcols=c("B", "E")])


##########################

