Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[.data.table accept both by and keyby non-missing #1104

Closed
jangorecki opened this issue Apr 4, 2015 · 6 comments
Closed

[.data.table accept both by and keyby non-missing #1104

jangorecki opened this issue Apr 4, 2015 · 6 comments

Comments

@jangorecki
Copy link
Member

I would like do conditional sort while making aggregation, I think [.data.table could be less restricted and check only if one of by and keyby is not null.

library(data.table)
dt <- data.table(a = 1:10, b = 1:5)
do_sort <- FALSE
dt[,j = .(a=sum(a)),
   by = if(!do_sort) b else NULL,
   keyby = if(do_sort) b else NULL]
# Error in `[.data.table`(dt, , j = .(a = sum(a)), by = if (!do_sort) b else NULL,  : 
#   Provide either 'by' or 'keyby' but not both
@rsaporta
Copy link
Contributor

rsaporta commented Apr 6, 2015

why not put the conditional outside of the data.table call? In the given example, there is not much harm, but it might open up to other errors requiring many more checks.

@jangorecki
Copy link
Member Author

Then it would be difficult to use chaining which is not exposed in that example.

@jangorecki
Copy link
Member Author

jangorecki commented Apr 10, 2015

@rsaporta see line #L409 - by is already cloned from keyby, so it looks like a matter of handling keyby only.
At the first glance setting if(missing(keyby)) keyby <- NULL and then change all following missing(keyby) to is.null(keyby) would solve that issue.

hmm, looks like it can be done even easier just by keyby <- substitute(): SO comment.

I may prepare PR if there is a chance for that to be accepted.

@jangorecki
Copy link
Member Author

example code which suffers from lack of that feature is my dev version of split.data.table #1389

...
if (!missing(keyby)) {
    if (!missing(by)) stop("you must provide 'by' or 'keyby', not both")
    do.key = TRUE
    by = keyby
} else if (!missing(by)) {
    do.key = FALSE
}
tmp = if (do.key) {
    x[, list(.ll.tech.split=list(.SD)), keyby=by, .SDcols=if(drop) setdiff(names(x), by) else names(x)]
} else x[, list(.ll.tech.split=list(.SD)), by=by, .SDcols=if(drop) setdiff(names(x), by) else names(x)]
...

@jangorecki
Copy link
Member Author

Alternatively keyby argument in [.data.table could accept logical scalar - a flag if key should be set on groups provided in by argument. Then grouping columns can be controlled with by and sorting results with keyby.

@jangorecki
Copy link
Member Author

closing as duplicate of newly created #4307 which is well defining requested behaviour

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants