Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for group argument #23

Closed
psychelzh opened this issue Mar 28, 2021 · 3 comments
Closed

Add support for group argument #23

psychelzh opened this issue Mar 28, 2021 · 3 comments
Assignees
Milestone

Comments

@psychelzh
Copy link
Owner

psychelzh commented Mar 28, 2021

Currently, data are all processed subject by subject, which can be very slow in processing when subjects grows. We can assume that one should stack all the data from one game into a tall data.frame and add a group variable to it, which makes a group-wise computation possible. Now we just use nested data to do group-wise computation, which is rather slow. See the following:

library(tidyverse)
library(collapse)
#> collapse 1.5.3, see ?`collapse-package` or ?`collapse-documentation`
#> Note: stats::D  ->  D.expression, D.call, D.name
#> 
#> Attaching package: 'collapse'
#> The following object is masked from 'package:stats':
#> 
#>     D
data <- tibble(id = rep(1:1000, each = 100), y = rpois(100*1000, 2))
bench::mark(
    group_wise = data %>% group_by(id) %>% summarise(y = mean(y)),
    group_nest = data %>% group_nest(id) %>% mutate(y = map_dbl(data, ~ mean(.x$y)), .keep = "unused"),
    collapse = collap(data, y ~ id)
)
#> # A tibble: 3 x 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 group_wise  10.96ms   13.1ms      73.6    4.16MB     6.90
#> 2 group_nest  43.92ms   47.5ms      19.8    2.73MB     2.19
#> 3 collapse     1.33ms    1.6ms     578.     1.98MB    41.5

Created on 2021-03-28 by the reprex package (v1.0.0)

Obviously, the "collapse" package does a great job, but its syntax is not very well to understand and apply.

image

@psychelzh
Copy link
Owner Author

For the use of "collapse" package can be not easy, I will implement it as much as possible but not thoroughly.

@psychelzh
Copy link
Owner Author

psychelzh commented Mar 28, 2021

A second thought is that whether this group input can be NULL, i.e., no grouping at all. A very easy to do this is to add one dummy column (named "id") if group is NULL.

[Wrong] This thing is not easy, for collapse::collap() does not support this. However, if collapse::fgroup_by() and others support programming, things will be easy. So currently, let's say group cannot be NULL then. For I want to use collapse package as much as I can.

@psychelzh
Copy link
Owner Author

psychelzh commented Mar 30, 2021

Reconsider the use of "collapse" package:

For two or more days, I have struggled to use "collapse" package. To my dismay, the grammar the package used is daunting for me. Thus, I should take the advice from Hadley (Advanced R):

It’s easy to get caught up in trying to remove all bottlenecks. Don’t! Your time is valuable and is better spent analysing your data, not eliminating possible inefficiencies in your code. Be pragmatic: don’t spend hours of your time to save seconds of computer time. To enforce this advice, you should set a goal time for your code and optimise only up to that goal. This means you will not eliminate all bottlenecks. Some you will not get to because you’ve met your goal. Others you may need to pass over and accept either because there is no quick and easy solution or because the code is already well optimised and no significant improvement is possible. Accept these possibilities and move on to the next candidate.

To get those seconds of improvements, too much time will be consumed, which is actually not deserved. But with the introduction of grouping, it will be quick enough.

@psychelzh psychelzh self-assigned this Mar 30, 2021
@psychelzh psychelzh added this to the 0.3.0 milestone Mar 30, 2021
psychelzh added a commit that referenced this issue Mar 30, 2021
The collapse package is promising in future, but not now.

See #23 for details

Signed-off-by: Liang Zhang <psychelzh@outlook.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant