dcast not limited to casting one column #739

Closed
arunsrinivasan opened this issue Jul 21, 2014 · 3 comments
@arunsrinivasan

Here's a minimal reproducible example to illustrate the issue (thanks to Steve Miller for the email exchanges):

require(reshape2)
require(data.table)
dt <- data.table(aa=c(1,1,1,2,2), bb=letters[1:5], cc=11:15, dd=letters[20:24])
#    aa bb cc dd
#1:  1  a 11  t
#2:  1  b 12  u
#3:  1  c 13  v
#4:  2  d 14  w
#5:  2  e 15  x

Now, what we'd like to do is cast with the formula aa ~ bb and have the remaining columns (cc and dd) cast wide. The issue is not that we have to use melt on the data set first, but that melt has to stack cc (integer) and dd (character) into a single value column, and so coerces the integer values to character.

dcast.data.table(melt(dt, id.vars=1:2), aa ~ bb+variable, value.var="value")
#    aa a_cc a_dd b_cc b_dd c_cc c_dd d_cc d_dd e_cc e_dd
#1:  1   11    t   12    u   13    v   NA   NA   NA   NA
#2:  2   NA   NA   NA   NA   NA   NA   14    w   15    x
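The coercion is easy to miss in the printed output above, since the cc values still look like numbers. A minimal sketch that makes it visible, using the same dt as in the example (the name molten is only illustrative, and the warning is suppressed here purely to keep the output short):

```r
library(data.table)

dt <- data.table(aa=c(1,1,1,2,2), bb=letters[1:5], cc=11:15, dd=letters[20:24])

# cc (integer) and dd (character) must share one `value` column after melting,
# so melt falls back to a common type; data.table may warn about the mixed types.
molten <- suppressWarnings(melt(dt, id.vars=c("aa", "bb")))

class(dt$cc)          # "integer"
class(molten$value)   # "character"
```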

That can be quite frustrating for two reasons: 1) on large data, the conversion to character can take a considerable amount of time, especially when there are many unique values; and 2) after casting, one has to convert the affected columns back to their original type, which should not be necessary.

melt and dcast in data.table are (re)implemented with large data in mind, at dimensions where even these type conversions can be costly. Not to mention the annoyance of having to restore the types afterwards.
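For illustration, this is roughly what the "convert back" step looks like with the melt-then-cast workaround. It is only a sketch: the helper names (molten, wide, cc_cols) are made up here, and it assumes the cast columns are named with a _cc suffix as in the output shown earlier.

```r
library(data.table)

dt <- data.table(aa=c(1,1,1,2,2), bb=letters[1:5], cc=11:15, dd=letters[20:24])

# Workaround: melt first (coercing cc to character), then cast wide.
molten <- suppressWarnings(melt(dt, id.vars=c("aa", "bb")))
wide   <- dcast.data.table(molten, aa ~ bb + variable, value.var="value")

# Every *_cc column is now character, so restore the integer type by reference.
cc_cols <- grep("_cc$", names(wide), value=TRUE)
wide[, (cc_cols) := lapply(.SD, as.integer), .SDcols=cc_cols]
```

On wide data with many cast columns, this extra pass has to be repeated for every original type that was lost, which is the overhead the proposal aims to avoid.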

What should be possible for data.tables is:

dcast.data.table(dt, aa ~ bb, value.var=c("cc", "dd"))

As simple as that. Some concerns will probably come up later, but we'll address them as and when they arise.

Needs to be done carefully along with #716.

@arunsrinivasan

Implemented in commit fc753c2 in 1.9.8. Will merge and add news later.

@arunsrinivasan

Refer to #716 for the list of changes to be made before closing.

@arunsrinivasan arunsrinivasan modified the milestones: v1.9.6, v1.9.8 Mar 5, 2015
@arunsrinivasan

Done.

dcast.data.table(dt, aa ~ bb, value.var=c("cc", "dd"))
#    aa a_cc b_cc c_cc d_cc e_cc a_dd b_dd c_dd d_dd e_dd
# 1:  1   11   12   13   NA   NA    t    u    v   NA   NA
# 2:  2   NA   NA   NA   14   15   NA   NA   NA    w    x
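A quick way to confirm that the types survive the multi-column cast is to inspect the column classes of the result. This is a sketch, not part of the original fix; the name wide is illustrative, and because released versions of data.table may name the cast columns value-first (for example cc_a rather than a_cc), the check matches on the variable name instead of relying on an exact column name.

```r
library(data.table)

dt <- data.table(aa=c(1,1,1,2,2), bb=letters[1:5], cc=11:15, dd=letters[20:24])

# Cast both value columns in one call, with no melt and no coercion.
wide <- dcast.data.table(dt, aa ~ bb, value.var=c("cc", "dd"))

# Pick out the columns derived from cc and dd, whatever the naming order.
cc_cols <- grep("cc", names(wide), value=TRUE)
dd_cols <- grep("dd", names(wide), value=TRUE)

sapply(wide[, cc_cols, with=FALSE], class)   # all "integer"
sapply(wide[, dd_cols, with=FALSE], class)   # all "character"
```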
