Here's a minimal reproducible example to illustrate the issue (thanks to Steve Miller for the email exchanges):
require(reshape2)
require(data.table)
dt <- data.table(aa=c(1,1,1,2,2), bb=letters[1:5], cc=11:15, dd=letters[20:24])
#    aa bb cc dd
# 1:  1  a 11  t
# 2:  1  b 12  u
# 3:  1  c 13  v
# 4:  2  d 14  w
# 5:  2  e 15  x
Now, what we'd like to do is cast with the formula aa ~ bb, with the rest of the columns cast wide. The issue is not that we have to use melt on the data set first, but that melt coerces the integer column cc to character, since cc and dd end up in a single value column.
dcast.data.table(melt(dt, id=1:2), aa~bb+variable, value.var="value")
#    aa a_cc a_dd b_cc b_dd c_cc c_dd d_cc d_dd e_cc e_dd
# 1:  1   11    t   12    u   13    v   NA   NA   NA   NA
# 2:  2   NA   NA   NA   NA   NA   NA   14    w   15    x
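To make the coercion visible, here is a small sketch that checks the column types before and after melting. It assumes a data.table version that exports melt for data.tables; the name molten is only for illustration.

```r
library(data.table)

dt <- data.table(aa = c(1, 1, 1, 2, 2), bb = letters[1:5],
                 cc = 11:15, dd = letters[20:24])

# Before melting, cc is integer and dd is character.
class(dt$cc)        # "integer"

# melt stacks cc and dd into one "value" column, so that column must take
# a single common type. The warning about mixed types is silenced here.
molten <- suppressWarnings(melt(dt, id.vars = c("aa", "bb")))
class(molten$value) # "character"
```

The integer values of cc are now stored as strings, and that is what the cast above carries into the wide result.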
That can be quite frustrating: 1) on large data, the character conversion can take a considerable amount of time, especially when there are many unique values; 2) after casting, one has to convert the affected columns back to their original type, which should not be necessary.
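For illustration, the conversion back in point 2 might look like the sketch below. It assumes the cast columns that came from cc can be picked out by their _cc suffix; the names wide and cc_cols are made up for this example.

```r
library(data.table)

dt <- data.table(aa = c(1, 1, 1, 2, 2), bb = letters[1:5],
                 cc = 11:15, dd = letters[20:24])

# Melt, then cast wide; every cast column is character at this point.
molten <- suppressWarnings(melt(dt, id.vars = c("aa", "bb")))
wide <- dcast(molten, aa ~ bb + variable, value.var = "value")

# Restore the integer type for the columns that originated from cc.
cc_cols <- grep("_cc$", names(wide), value = TRUE)
wide[, (cc_cols) := lapply(.SD, as.integer), .SDcols = cc_cols]
```

This works, but it is an extra pass over the data, and the user has to remember which columns had which type.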
melt and dcast in data.table are (re)implemented with large data in mind, at dimensions where even these type conversions can be costly. Not to mention the annoyance of having to get the types back.
dcast.data.table(dt, aa~bb, value.var=c("cc", "dd"))
#    aa a_cc b_cc c_cc d_cc e_cc a_dd b_dd c_dd d_dd e_dd
# 1:  1   11   12   13   NA   NA    t    u    v   NA   NA
# 2:  2   NA   NA   NA   14   15   NA   NA   NA    w    x
The dcast.data.table call above, with several columns passed to value.var, is what should be possible for data.tables. As simple as that. Some concerns will probably come up later, but we'll address them as and when they do.
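Until something like that is available, one possible workaround that avoids the coercion is to cast each value column separately and merge the results on aa. This is only a sketch: cast_one is a hypothetical helper, not part of data.table, and the a_cc style column names are chosen to match the output shown above.

```r
library(data.table)

dt <- data.table(aa = c(1, 1, 1, 2, 2), bb = letters[1:5],
                 cc = 11:15, dd = letters[20:24])

# Hypothetical helper: cast a single value column wide and suffix the new
# columns with that column's name (a_cc, b_cc, ...).
cast_one <- function(col) {
  wide <- dcast(dt, aa ~ bb, value.var = col)
  old <- setdiff(names(wide), "aa")
  setnames(wide, old, paste(old, col, sep = "_"))
  wide
}

# Each column is cast on its own, so no common type is forced on the values.
res <- merge(cast_one("cc"), cast_one("dd"), by = "aa")
sapply(res, class)
```

Since no melt is involved, cc stays integer and dd stays character, at the cost of one cast per value column plus a merge.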
Needs to be done carefully along with #716.