Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dplyrs "tbl_df" and "tbl" classes messing up data.table := operator (and possibly other functionalities) #1114

Closed
DavidArenburg opened this issue Apr 13, 2015 · 8 comments

Comments

@DavidArenburg
Copy link
Member

It seems like devel version is now keeping classes after setDT operation, but that messes everything up afterwards if converting from a dplyr object. I'm not only talking about the lame printing method, but even := doesn't work anymore

library(data.table) # v 1.9.5
library(dplyr) # v 0.4.1

TBL <- mtcars %>% 
              group_by(cyl) %>%
              summarise(mpg = sum(mpg))
class(TBL)
## [1] "tbl_df"     "tbl"        "data.table" "data.frame"

setDT(TBL)[, test := mpg]
# Error in `:=`(test, mpg) : 
#   Check that is.data.table(DT) == TRUE. Otherwise, := and `:=`(...) are defined for use in j, once only and in particular ways. See help(":=").
is.data.table(TBL)
## [1] TRUE 
@arunsrinivasan
Copy link
Member

Thanks David, Jan already filed this. Duplicate of #1078.

@DavidArenburg
Copy link
Member Author

@arunsrinivasan I saw this post, but I don't think it emphasises that all the data.table functionality ceases to work, which seems like a very urgent bug (to me atleast).

In other words, I couldn't care less about creating a special method for tbl_df rather removing it completely.

Thanks

@arunsrinivasan
Copy link
Member

David, then it's better to add to that post rather than creating a new issue (which essentially is the same). Now, this issue would be tagged in the other one. So this can be closed.

@jangorecki
Copy link
Member

@DavidArenburg You probably know you can do setDT <- function(x) setattr(x,"class",c("data.table","data.frame")) to force overwrite class to data.table.
I've made PR which prepend data.table class to be first but it breaks the current tests to keep custom defined class.
We can make setDT a S3 generic and prepend data.table only on particular classes like tbl_df by simply setDT.tbl_df <- function.... Then it should handle the issue and also keep current tests passing. Maybe you could make some contribution to data.table? :) You already doing a great work on SO!

@DavidArenburg
Copy link
Member Author

@jangorecki, Yes I am aware of that possibility, though I don't think it is optimal to override setDT each time, neither it is feasible for other users who aren't aware of this possibility.

I'm also would prefer setDT not to be generic as I don't want it to mess around with method dispatching which will hurt performance. I also don't think it is practical, as packages are popping up like mushrooms after the rain and it is not feasible to write methods for every newly made up class.

I do think, that either removing all classes except data.frame and data.table or moving data.table class ahead should solve the issue (and I can't wait for either to be implemented because the current situation is very annoying due to dplyrs great popularity among R/SO users).

Regarding your last point, I would love to contribute but I'm severely lacking the knowledge/qualification in order to so.

Btw, thanks for your hard work, Jan

@jangorecki
Copy link
Member

Good point on performance on the class dispatching, I forget about that.
It is good to challenge own knowledge and qualification - you can collect valuable feedback, implement, squash, re-PR. Also your SO answers are definitely better than mine :)
I will prepare a non-S3 way which would be sufficient - a conditional pretending data.table class so should not affect other custom classes.
@DavidArenburg as you are more dplyr aware than me can you just list the classes which dplyr can produce so I will not miss any. There are at least few: tbl_df, tbl_dt,tbl,data_frame??.

@arunsrinivasan
Copy link
Member

Jan, I'd not put work into it until a resolution has been reached (which I'm thinking about as well). There seem to be at least these possibilities I could think of:

  1. Strip other classes on as.data.table() and setDT().
  2. Add an extra argument retain.classes = FALSE.
  3. Add data.table class to the beginning (perhaps combine with 2).

I'm not for stripping other classes automatically.
This may take some time (as I'd like to ask Matt when he has some time as well).

@jangorecki
Copy link
Member

Sure, no problem.
I was also thinking about retain.classes which would be the simplest way from user perspective. But it will not work for all the current documentation and codes as we would need to update it for extra arg.
Instead we could add extra argument prepend.classes = c("tbl_df","tbl_dt","tbl") which by default would handle mentioned cases - no updates to codes/docs required.
Or even prepend.classes = getOption("datatable.setDT.prepend.classes") and options("datatable.setDT.prepend.classes" = c("tbl_df","tbl_dt","tbl")) in .onLoad().
This way it can be dynamically extended depending on the user need.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants