Selecting columns from a grouped_df with [ results in lost grouping #398

Closed
wch opened this issue Apr 17, 2014 · 4 comments
Labels
feature a feature request or enhancement

@wch
Member

wch commented Apr 17, 2014

m <- mtcars %>% group_by(cyl)

# Selecting rows keeps grouping
m[1:3, ]
# Source: local data frame [3 x 11]
# Groups: cyl
# 
#                mpg cyl disp  hp drat    wt  qsec vs am gear carb
# Mazda RX4     21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
# Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
# Datsun 710    22.8   4  108  93 3.85 2.320 18.61  1  1    4    1

# Selecting columns loses grouping
m[1:3, 1:3]
# Source: local data frame [3 x 3]
# Groups: 
# 
#                mpg cyl disp
# Mazda RX4     21.0   6  160
# Mazda RX4 Wag 21.0   6  160
# Datsun 710    22.8   4  108

# The grouped_df class is kept, but the grouping attributes are lost
str(m[1:3, 1:3])
# Classes ‘grouped_df’, ‘tbl_df’, ‘tbl’ and 'data.frame':  3 obs. of  3 variables:
#  $ mpg : num  21 21 22.8
#  $ cyl : num  6 6 4
#  $ disp: num  160 160 108

This is especially problematic because R crashes if you call do() on such a grouped_df, which has no groups:

m[, 1:3] %>% do(mpg = mean(.$mpg))
# [segfault]
@wch wch changed the title from "Selecting columns from a grouped tbl_df with [ results in lost grouping" to "Selecting columns from a grouped_df with [ results in lost grouping" Apr 17, 2014
@romainfrancois
Member

I've put some protection in place. We now get an error:

> m <- mtcars %>% group_by(cyl)
> m[1:3,1:3] %>% do(mpg = mean(.$mpg))
Error: no variables to group by

@wch
Member Author

wch commented Apr 17, 2014

It might make sense to keep the groups of any grouping columns that are selected (cyl in this case), but drop the groups of any grouping columns that aren't selected. If no grouping columns are selected, you could drop the grouped_df class.

I realize that select() works differently -- it keeps the grouping columns, even if the user doesn't ask for them specifically.
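
The rule proposed above could be sketched roughly as follows. This is only an illustration, not dplyr's implementation: `select_cols()` is a hypothetical helper, the two-variable grouping (`cyl`, `gear`) is added to show a partial match, and the code assumes a recent dplyr (1.0 or later) that provides `group_vars()`, `across()`, and `all_of()`.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): select columns `j`, keep only the
# grouping variables that survive the selection, and return an ungrouped
# table when none of them do.
select_cols <- function(x, j) {
  old_groups <- group_vars(x)

  # Subset an ungrouped copy so no stale grouping metadata is carried over
  out <- ungroup(x)[, j, drop = FALSE]

  kept <- intersect(old_groups, names(out))
  if (length(kept) == 0) {
    return(out)  # no grouping column selected: drop the grouping entirely
  }
  group_by(out, across(all_of(kept)))
}

# Assumed example data: mtcars grouped by two variables
m <- mtcars %>% group_by(cyl, gear)

group_vars(select_cols(m, c("mpg", "cyl")))      # "cyl"
is_grouped_df(select_cols(m, c("mpg", "disp")))  # FALSE
```

Unlike select(), this sketch never adds back grouping columns the caller did not ask for.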

@romainfrancois
Member

We'd have to write a [.grouped_df for this. Not sure we want to go there.

@hadley
Member

hadley commented Apr 21, 2014

I think we should probably protect attributes in [ methods (we wouldn't encourage users to use these methods, but they are useful for developers). I'll add the code.
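
As a rough idea of what protecting attributes in a `[` method can look like, here is a base R sketch on a toy S3 class. It is not the code that was committed to dplyr: the class `tagged_df`, its constructor `new_tagged()`, and the `vars` attribute are all hypothetical stand-ins for grouped_df and its grouping metadata.

```r
# Hypothetical toy class: a data frame that carries a `vars` attribute
new_tagged <- function(df, vars) {
  structure(df, vars = vars, class = c("tagged_df", "data.frame"))
}

# `[` method that delegates to the data.frame method, then re-attaches the
# class and the subset of `vars` that is still present in the result
`[.tagged_df` <- function(x, ...) {
  vars <- attr(x, "vars")
  out <- NextMethod()

  # A single column may have been dropped to a plain vector; leave it alone
  if (!is.data.frame(out)) {
    return(out)
  }

  class(out) <- class(x)
  attr(out, "vars") <- intersect(vars, names(out))
  out
}

tg <- new_tagged(mtcars, c("cyl", "gear"))
attr(tg[1:3, c("mpg", "cyl")], "vars")  # "cyl"
```

The point of the design is that the method restores metadata explicitly instead of relying on whatever the underlying data.frame method happens to preserve.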

@hadley hadley modified the milestones: 0.3.1, 0.3 Aug 1, 2014
@hadley hadley self-assigned this Aug 1, 2014
@hadley hadley closed this as completed in 7deb5ab Aug 12, 2014
krlmlr pushed a commit to krlmlr/dplyr that referenced this issue Mar 2, 2016
@lock lock bot locked as resolved and limited conversation to collaborators Jun 10, 2018