the demo("bench-merge") cannot be run for various reasons #1487

knbknb · 2015-10-30T09:03:13Z

This command:

demo(package = .packages(all.available = TRUE))

yields

Demos in package ‘dplyr’:
bench-merge                         Benchmark merging between R and python
bench-rbind                         Benchmark various flavours of rbind
bench-set                           Benchmark set operations on data frames
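The listing above can be narrowed to a single package. The following is a minimal sketch, assuming only base R: `demo()` called with a `package` argument returns an object whose `results` matrix holds the demo names and descriptions. The `graphics` package is used here only because it ships with R; `"dplyr"` could be substituted where it is installed.

```r
# List the demos of one package and keep only their names and descriptions.
# "graphics" is an illustrative choice; it is always available in a standard R install.
demo_index <- demo(package = "graphics")$results

# "Item" is the demo name, "Title" is the description line shown in the listing.
demo_index[, c("Item", "Title"), drop = FALSE]
```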

However, the demo "bench-merge" cannot be run by an ordinary user, at least not by me.
(The other two demos work properly, though.)

I was almost able to make bench-merge run.

I know some Python, so I already had a working pandas module installed.
In R, I first had to install some missing packages, such as microbenchmark. R told me what was needed.
Then I had to create a subdirectory demo/pandas in the package directory, because this directory did not exist.
I also had to issue a setwd("${r_pkg_directory}/dplyr/demo/").
Then I had to clone the git repository to get demo/pandas/bench_merge.py, because this .py file does not get installed by install.packages("dplyr"). Then I copied pandas.py from the cloned repo to ${r_pkg_directory}/dplyr/demo/pandas.
I also installed the development version of dplyr because I hoped that would give me all the missing files.
After that I was able to start the demo, but now R segfaults.
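The manual checks described above could be gathered into one small script. This is only a sketch: `check_bench_merge_prereqs` is a hypothetical helper, not part of dplyr, and the package list is taken from the `library()` calls in the demo output. It only reports what is missing and does not install or create anything.

```r
# Hypothetical helper (not part of dplyr): report which prerequisites of the
# bench-merge demo are missing on this machine.
check_bench_merge_prereqs <- function(
    pkgs = c("dplyr", "data.table", "microbenchmark", "reshape2"),
    demo_dir = system.file("demo", package = "dplyr")) {
  # requireNamespace() returns FALSE instead of failing when a package is absent.
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

  # The demo writes to the relative path "pandas/", so that subdirectory must exist.
  pandas_dir <- file.path(demo_dir, "pandas")

  list(
    missing_packages  = pkgs[!installed],
    demo_dir          = demo_dir,  # "" when dplyr itself is not installed
    pandas_dir_exists = nzchar(demo_dir) && dir.exists(pandas_dir)
  )
}

prereqs <- check_bench_merge_prereqs()
str(prereqs)
```

An empty `missing_packages` and `pandas_dir_exists = TRUE` would only mean the R-side prerequisites are present; the Python files still have to be obtained separately, as described above.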

I think the easiest workaround would be to change the "description" line from

 Benchmark merging between R and python

to something like

 Benchmark merging between R and python (internal demo, for developers only)

Some info about my computing environment.

packageVersion("dplyr")
[1] ‘0.4.3.9000’

R> getwd()
[1] "/home/knb/code/git/dplyr/demo"
R> demo("bench-merge")


    demo(bench-merge)
    ---- ~~~~~~~~~~~

Type  <Return>	 to start : 

R> # Compare base, data table, dplyr and pandas
R> #
R> # To install pandas on OS X:
R> # * brew update && brew install python
R> # * pip install --upgrade setuptools
R> # * pip install --upgrade pip
R> # * pip install pandas
R> 
R> library(dplyr)

R> library(data.table)

R> library(microbenchmark)

R> library(reshape2)

R> set.seed(1014)

R> # Generate sample data ---------------------------------------------------------
R> 
R> random_strings <- function(n, m) {
+   mat <- matrix(sample(letters, m * n, rep = TRUE), ncol = m)
+   apply(mat, 1, paste, collapse = "")
+ }

R> N <- 10000

R> indices  <- random_strings(N, 10)

R> indices2 <- random_strings(N, 10)

R> left <- data.frame(
+   key = rep(indices[1:8000], 10),
+   key2 = rep(indices2[1:8000], 10),
+   value = rnorm(80000)
+ )

R> right <- data.frame(
+   key = indices[2001:10000],
+   key2 = indices2[2001:10000],
+   value2 = rnorm(8000)
+ )

R> write.csv(left, "pandas/left.csv", row.names = FALSE)

R> write.csv(right, "pandas/right.csv", row.names = FALSE)

R> # Equivalent functions for each technique --------------------------------------
R> 
R> base <- list(
+   setup = function(x, y) list(x = x, y = y),
+   
+   left  = function(x, y) base::merge(x, y, all.x = TRUE),
+   right = function(x, y) base::merge(x, y, all.y = TRUE),
+   inner = function(x, y) base::merge(x, y)
+ )

R> data.table <- list(
+   setup = function(x, y) {
+     list(
+       x = data.table(x, key = c("key", "key2")),
+       y = data.table(y, key = c("key", "key2"))
+     )
+   },
+   
+   left  = function(x, y) x[y],
+   right = function(x, y) y[x],
+   inner = function(x, y) merge(x, y, all = FALSE)
+ )

R> dplyr <- list(
+   setup = function(x, y) list(x = x, y = y),
+   
+   left  = function(x, y) left_join(x, y, by = c("key", "key2")),
+   right = function(x, y) NULL,
+   inner = function(x, y) inner_join(x, y, by = c("key", "key2"))
+ )

R> techniques <- list(base = base, data.table = data.table, dplyr = dplyr)

R> # Aggregate results ------------------------------------------------------------
R> 
R> niter <- 10

R> r <- lapply(names(techniques), function(nm) {
+   tech <- techniques[[nm]]
+   df <- tech$setup(left, right)
+   m <- microbenchmark(
+     left = tech$left(df$x, df$y),
+     right = tech$right(df$x, df$y),
+     inner = tech$inner(df$x, df$y),
+     times = niter
+   )
+   
+   means <- tapply(m$time, m$expr, FUN = mean) / 1e9
+   data.frame(type = names(means), mean = means, tech = nm, 
+     row.names = NULL, stringsAsFactors = FALSE)
+ })

 *** caught segfault ***
address 0x2710, cause 'memory not mapped'

Traceback:
 1: .Call("dplyr_left_join_impl", PACKAGE = "dplyr", x, y, by_x,     by_y)
 2: left_join_impl(x, y, by$x, by$y)
 3: left_join.tbl_df(tbl_df(x), y, by = by, copy = copy, ...)
 4: left_join(tbl_df(x), y, by = by, copy = copy, ...)
 5: as.data.frame(left_join(tbl_df(x), y, by = by, copy = copy, ...))
 6: left_join.data.frame(x, y, by = c("key", "key2"))
 7: left_join(x, y, by = c("key", "key2"))
 8: tech$left(df$x, df$y)
 9: microbenchmark(left = tech$left(df$x, df$y), right = tech$right(df$x,     df$y), inner = tech$inner(df$x, df$y), times = niter)
10: FUN(X[[i]], ...)
11: lapply(names(techniques), function(nm) {    tech <- techniques[[nm]]    df <- tech$setup(left, right)    m <- microbenchmark(left = tech$left(df$x, df$y), right = tech$right(df$x,         df$y), inner = tech$inner(df$x, df$y), times = niter)    means <- tapply(m$time, m$expr, FUN = mean)/1e+09    data.frame(type = names(means), mean = means, tech = nm,         row.names = NULL, stringsAsFactors = FALSE)})
12: eval(expr, envir, enclos)
13: eval(ei, envir)
14: withVisible(eval(ei, envir))
15: source(available, echo = echo, max.deparse.length = Inf, keep.source = TRUE,     encoding = encoding)
16: demo("bench-merge")

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection: 2
Warning messages:
1: In inner_join_impl(x, y, by$x, by$y) :
  joining factors with different levels, coercing to character vector
2: In inner_join_impl(x, y, by$x, by$y) :
  joining factors with different levels, coercing to character vector
3: In inner_join_impl(x, y, by$x, by$y) :
  joining factors with different levels, coercing to character vector
4: In inner_join_impl(x, y, by$x, by$y) :
  joining factors with different levels, coercing to character vector
5: In inner_join_impl(x, y, by$x, by$y) :
  joining factors with different levels, coercing to character vector
6: In inner_join_impl(x, y, by$x, by$y) :
  joining factors with different levels, coercing to character vector
7: In left_join_impl(x, y, by$x, by$y) :
  joining factors with different levels, coercing to character vector
8: In left_join_impl(x, y, by$x, by$y) :
  joining factors with different levels, coercing to character vector

R version 3.2.2 (2015-08-14)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 15.10

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] colorout_1.1-1
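The repeated "joining factors with different levels" warnings in the output above suggest that the key columns were factors, which is what `data.frame()` produced for character input by default in R 3.2.2 (`stringsAsFactors` defaulted to `TRUE` before R 4.0.0). The sketch below uses toy data, not the demo's data, to show the condition and one way to avoid the coercion by converting the keys to character first. Whether this has any bearing on the segfault is not established here. `base::merge` is used only to keep the example free of dependencies.

```r
# Toy data (not the demo's data): factor keys with different level sets.
left_small  <- data.frame(key = c("a", "b", "c"), value  = 1:3,
                          stringsAsFactors = TRUE)
right_small <- data.frame(key = c("b", "c", "d"), value2 = 4:6,
                          stringsAsFactors = TRUE)

# The level sets differ, which is the condition the warning describes.
same_levels <- identical(levels(left_small$key), levels(right_small$key))

# Convert the keys to character before joining, so no factor coercion is needed.
left_small$key  <- as.character(left_small$key)
right_small$key <- as.character(right_small$key)

# Inner join on the shared key; only "b" and "c" appear in both frames.
merged <- merge(left_small, right_small, by = "key")
```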

hadley · 2016-03-01T21:49:16Z

Fixed in eb45d04

lock · 2018-09-16T16:26:16Z

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

hadley added bug documentation and removed bug labels Mar 1, 2016

hadley added this to the 0.5 milestone Mar 1, 2016

hadley closed this as completed Mar 1, 2016

krlmlr added documentation and removed documentation labels Mar 20, 2018

lock bot locked and limited conversation to collaborators Sep 16, 2018
