[R-Forge #5297] Speed in rolling joins #538

Closed
arunsrinivasan opened this issue Jun 8, 2014 · 1 comment

@arunsrinivasan
Member

Submitted by: Michele Carriero; Assigned to: Nobody; R-Forge link

Hello,

Is the following difference in timing expected, or is it possible to reduce it?

library(data.table)
library(microbenchmark)

date <- seq.Date(as.Date("2010-01-01"), as.Date("2014-01-01"), "day")
dt1 <- data.table(date=date,
                  var1=rnorm(length(date)),
                  var2=rnorm(length(date)),
                  var3=rnorm(length(date)),
                  key="date")

date <- seq.Date(as.Date("2010-01-01"), as.Date("2014-01-01"), "month")
dt2 <- data.table(date=date,
                  var4=rnorm(length(date),100),
                  key="date")

> microbenchmark(dt2[dt1, roll=T][, list(date, var1, var2, var4)],dt2[dt1, list(var1, var2, var4), roll=T])
Unit: milliseconds
                                               expr      min       lq   median       uq      max neval
 dt2[dt1, roll = T][, list(date, var1, var2, var4)] 1.564117 1.610941 1.646219 1.689355 2.227991   100
         dt2[dt1, list(var1, var2, var4), roll = T] 2.811368 2.943662 3.033141 3.151162 3.453434   100

Both expressions return equal results according to all.equal(x, y, check.attributes = FALSE). In this example the second form is only about twice as slow, but the gap grows as the tables increase in size.
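To make clear what both forms compute, here is a minimal base-R sketch of the matching rule behind `roll = TRUE`: each daily date takes `var4` from the most recent monthly date on or before it (last observation carried forward). This is an illustration under stated assumptions, not data.table's internal implementation: it swaps the keyed join for base R's `findInterval()`, and the names `daily`, `monthly` and `rolled_var4` (and the fixed seed) are hypothetical.

```r
# Illustrative sketch only: reproduces the LOCF matching of roll = TRUE
# with base R's findInterval(), not data.table's own join code.
set.seed(1)  # hypothetical seed, only for reproducible random values

daily   <- seq.Date(as.Date("2010-01-01"), as.Date("2014-01-01"), "day")
monthly <- seq.Date(as.Date("2010-01-01"), as.Date("2014-01-01"), "month")
var4    <- rnorm(length(monthly), 100)

# For each daily date, the index of the last monthly date <= that date.
# findInterval() needs a sorted 'vec'; the monthly dates are increasing.
idx <- findInterval(as.numeric(daily), as.numeric(monthly))

# An index of 0 means no earlier monthly date exists, so the value is NA
# (cannot happen here, since both series start on 2010-01-01).
rolled_var4 <- ifelse(idx > 0, var4[pmax(idx, 1)], NA_real_)
```

Each daily row therefore carries the value of the month it falls in, which is what `dt2[dt1, roll = T]` attaches as `var4` in both expressions above; the two forms differ only in when the column subset is taken, not in the join result.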

@arunsrinivasan arunsrinivasan added this to the v1.9.4 milestone Jun 19, 2014
@arunsrinivasan
Member Author

Implementing FR #371 (commit 2679047, by Matt) fixed this issue as a side effect. Here are the microbenchmark timings now:

Unit: milliseconds
                                               expr      min       lq   median       uq      max neval
 dt2[dt1, roll = T][, list(date, var1, var2, var4)] 3.810468 3.953549 4.176619 4.663930 8.126353   100
         dt2[dt1, list(var1, var2, var4), roll = T] 1.958930 2.043809 2.108974 2.311414 9.342197   100
