rbind on grouped data produces a "nested data frame" #2138

Closed
mkuhn opened this issue Sep 22, 2016 · 15 comments
Labels
bug: an unexpected problem or unintended behavior

Comments

@mkuhn
Contributor

mkuhn commented Sep 22, 2016

Digging around possible causes for this strange behaviour led me to bind_rows, but it would still be great if the following bug could be resolved or a warning to use bind_rows could be printed. Code that worked fine suddenly failed when I changed the grouping variable, leading to an output of rbind like:

> rbind(d2, d3)
   y         foo       z        
d2 Integer,4 Numeric,4 Numeric,4
d3 Integer,8 Integer,8 Numeric,8

Example:

library(dplyr)

d1 <- data_frame(x = 1:8, y = rep(1:4, each = 2), z = rep(1:4, 2))

d2 <- d1 %>% group_by(y) %>% summarise(foo = mean(x), z = 0)
d3 <- d1 %>% group_by(y, z) %>% summarise(foo = mean(x))

# doesn't work
rbind(d2, d3)

# works
rbind(d2, d3 %>% ungroup())

# works
rbind(d3, d3)

# works
rbind(d2, d2)

# works
bind_rows(d2, d3)


d2 <- d1 %>% group_by(y) %>% summarise(foo = mean(x))
d3 <- d1 %>% group_by(y, z) %>% summarise(foo = mean(x)) %>% select(-z)

# doesn't work
rbind(d2, d3)

# works 
rbind(d2, d3 %>% ungroup())

@Fablepongiste

Similarly, rbind between a plain data.frame and a grouped tbl_df does not return the expected result.

df1 <- data.frame(a = sample(10), b = sample(10))
df2 <- group_by(data.frame(a = sample(10, replace = TRUE), b = sample(10)), a)

rbind(df1, df2)
    a          b
df1 Integer,10 Integer,10
df2 Integer,10 Integer,10

Could someone let us know whether this is intended? It did not behave like this in 0.4, and the result looks very weird to me.
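The printed result above (cells such as "Integer,10") is how R displays a matrix whose cells are list elements. A minimal base R sketch of the same shape, without dplyr, follows. It assumes that the output in this thread comes from rbind falling through to its internal default code, which is not confirmed here; `unclass()` is used only to force that path for illustration, and the data values are made up.

```r
# Two plain data frames with matching columns (illustrative values).
df1 <- data.frame(a = 1:3, b = 4:6)
df2 <- data.frame(a = 7:9, b = 10:12)

# Assumption: removing the class makes rbind() skip the data.frame method
# and treat each argument as a plain list of columns.
m <- rbind(df1 = unclass(df1), df2 = unclass(df2))

# The result is a 2 x 2 matrix of mode "list": one row per input,
# one cell per column, each cell holding a whole column vector.
is.matrix(m)
typeof(m)
dim(m)
m[["df1", "a"]]   # the original column a of df1

# With the class intact, the data.frame method binds rows as usual.
ok <- rbind(df1, df2)
dim(ok)
```

Each cell of `m` still holds the full column, so no data is lost, but the object is no longer a data frame and most downstream code will not accept it.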

@krlmlr
Member

krlmlr commented Nov 7, 2016

It does work as expected with bind_rows(), though. @hadley: Do you think there's a good way to make rbind() do the right thing for grouped df-s?

@hadley
Member

hadley commented Nov 7, 2016

I think we can probably make it do better by defining either an rbind() or an rbind2() method.

@krlmlr
Member

krlmlr commented Nov 7, 2016

That would have to be S4, right? Could you please tag this as appropriate (bug/feature)?

hadley added the bug label on Nov 7, 2016

@hadley
Member

hadley commented Nov 7, 2016

I'm not sure on the S3/S4 issue - I'd need to look into the dispatch issues in more detail.

Somewhat related to r-spatial/sf#49

@krlmlr
Member

krlmlr commented Feb 10, 2017

We could do the same as data.table does: see #606 (comment), and tidyverse/tibble#34 (comment) for the related tibble issue.

CC @billdenney.

@hadley
Member

hadley commented Apr 17, 2017

We had to revert this due to #2667 — it caused more problems than it fixed.

@lionel-
Member

lionel- commented Apr 17, 2017

@hadley Looks like we should also remove the grouped_df method? It's probably better to lose group information than to get a matrix when combining a grouped_df with a tbl_df or data.frame.

@billdenney
Contributor

Could an option be added to cbind and rbind so that they perform as described in this issue? Otherwise, there will be a lot of code that looks like:

rbind(as.data.frame(df1), as.data.frame(df2))

@lionel-
Member

lionel- commented Apr 17, 2017

@billdenney we're not in control of the cbind and rbind generics.

@hadley
Member

hadley commented Apr 17, 2017

@billdenney Just use bind_rows().

@billdenney
Contributor

@hadley bind_rows (and bind_cols) is fair... Is it possible to add a warning reminder to cbind and rbind so that they say something like "cbind may give unexpected results with tbl_df, please use bind_cols" and "rbind may give unexpected results with tbl_df, please use bind_rows"?

(I haven't fully followed the parts about why it's infeasible, but I like warnings as reminders. And, the warning will hopefully help prevent future bug reports like this one. If it's already there-- no worries; I can't easily test the development version in my current environment.)

@lionel-
Member

lionel- commented Apr 17, 2017

It is not possible to reliably issue a warning because of the way the dispatch mechanism of cbind and rbind works.

@krlmlr
Member

krlmlr commented Apr 17, 2017

Well, we could hack into cbind() and rbind() the way data.table does: #606 (comment), tidyverse/tibble#34 (comment). I have suggested this before, so there may be a reason we're not following this path.

@lionel-
Member

lionel- commented Apr 17, 2017

It doesn't look ideal to have both dplyr and data.table hack the base functions. We'd probably get unpredictable results depending on which objects are supplied and which package was loaded first. Though to work around that, we could load data.table, if it is installed, before applying our hack.

lock bot locked this issue as resolved and limited conversation to collaborators on Jun 8, 2018
6 participants