`[.data.table` accept both `by` and `keyby` non-missing #1104

jangorecki · 2015-04-04T00:31:14Z

I would like to sort conditionally while aggregating. I think [.data.table could be less restrictive and only check whether one of by and keyby is not NULL.

library(data.table)
dt <- data.table(a = 1:10, b = 1:5)
do_sort <- FALSE
dt[,j = .(a=sum(a)),
   by = if(!do_sort) b else NULL,
   keyby = if(do_sort) b else NULL]
# Error in `[.data.table`(dt, , j = .(a = sum(a)), by = if (!do_sort) b else NULL,  : 
#   Provide either 'by' or 'keyby' but not both

rsaporta · 2015-04-06T17:35:01Z

Why not put the conditional outside of the data.table call? In the given example there is not much harm, but allowing both arguments might open the door to other errors that require many more checks.
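A minimal sketch of what moving the conditional outside the call could look like. The example table is an assumption made for illustration: its groups are deliberately not in sorted order, so the difference between by and keyby is visible.

```r
library(data.table)

# Illustrative data (not from the issue): groups first appear in the order 3, 1, 2
dt <- data.table(a = 1:6, b = c(3L, 1L, 2L, 3L, 1L, 2L))
do_sort <- TRUE

# Choose between the two calls outside of `[.data.table`
res <- if (do_sort) {
  dt[, .(a = sum(a)), keyby = b]  # groups sorted, key set on b
} else {
  dt[, .(a = sum(a)), by = b]     # groups in order of first appearance, no key
}
```

This avoids the error, at the cost of writing the aggregation twice.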

jangorecki · 2015-04-06T17:56:00Z

Then it would be difficult to use chaining, which is not shown in that example.

jangorecki · 2015-04-10T22:09:53Z

@rsaporta see line #L409 - by is already cloned from keyby, so it looks like a matter of handling keyby only.
~~At the first glance setting if(missing(keyby)) keyby <- NULL and then change all following missing(keyby) to is.null(keyby) would solve that issue.~~

Hmm, it looks like it can be done even more easily with just keyby <- substitute(): SO comment.

I can prepare a PR if there is a chance of it being accepted.

jangorecki · 2016-03-11T22:01:19Z

An example of code that suffers from the lack of this feature is my dev version of split.data.table #1389:

...
if (!missing(keyby)) {
    if (!missing(by)) stop("you must provide 'by' or 'keyby', not both")
    do.key = TRUE
    by = keyby
} else if (!missing(by)) {
    do.key = FALSE
}
tmp = if (do.key) {
    x[, list(.ll.tech.split=list(.SD)), keyby=by, .SDcols=if(drop) setdiff(names(x), by) else names(x)]
} else x[, list(.ll.tech.split=list(.SD)), by=by, .SDcols=if(drop) setdiff(names(x), by) else names(x)]
...

jangorecki · 2016-03-12T22:28:33Z

Alternatively, the keyby argument in [.data.table could accept a logical scalar: a flag indicating whether a key should be set on the groups provided in the by argument. Grouping columns would then be controlled with by, and sorting of the results with keyby.
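To make the proposed semantics concrete, here is a rough emulation that runs without any change to [.data.table. The helper name sum_by and the example data are hypothetical, made up for illustration; the helper only mimics "group with by, key the result when keyby is TRUE" by calling setkeyv after grouping.

```r
library(data.table)

# Hypothetical helper, not part of data.table: sums every non-grouping column
# by the columns named in `by`, and keys the result only when `keyby` is TRUE.
# Assumes `x` has no column literally named "by".
sum_by <- function(x, by, keyby = FALSE) {
  res <- x[, lapply(.SD, sum), by = by]
  if (isTRUE(keyby)) setkeyv(res, by)  # sorts by reference and sets the key
  res
}

# Illustrative data: groups first appear in the order 3, 1, 2
dt <- data.table(a = 1:6, b = c(3L, 1L, 2L, 3L, 1L, 2L))

unsorted <- sum_by(dt, "b")                # group order 3, 1, 2; no key
sorted   <- sum_by(dt, "b", keyby = TRUE)  # group order 1, 2, 3; key is "b"
```

Because the flag is an ordinary logical value, a call such as sum_by(dt, "b", keyby = do_sort)[a > 5] can still be chained, which is the point of the request.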

jangorecki · 2020-03-17T15:27:01Z

Closing as a duplicate of the newly created #4307, which defines the requested behaviour well.

jangorecki mentioned this issue Apr 13, 2016

keyby option for unique #1245

Open

jangorecki mentioned this issue Mar 17, 2020

keyby=TRUE/FALSE together with by= #4307

Closed

jangorecki added the duplicate label Mar 17, 2020

jangorecki closed this as completed Mar 17, 2020