Demos in package ‘dplyr’:
bench-merge Benchmark merging between R and python
bench-rbind Benchmark various flavours of rbind
bench-set Benchmark set operations on data frames
However, demo "bench-merge" cannot be run by an ordinary user, at least not by me.
(The 2 other demos work properly, though).
I was almost able to get bench-merge to run.
I know some python, so I already had a working pandas module installed.
In R, I first had to install a missing package, microbenchmark. R told me what was needed.
Then I had to create a subdirectory demo/pandas in the package directory.
I had to issue setwd("${r_pkg_directory}/dplyr/demo/"), because this directory did not exist.
Then I had to clone the git repository to get demo/pandas/bench_merge.py, because this .py file does not get installed by install.packages("dplyr"). Then I copied pandas.py from the cloned repo to ${r_pkg_directory}/dplyr/demo/pandas.
I also installed the development version of dplyr because I hoped that would give me all missing files.
Then I was able to run the demo, but now R segfaults.
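The manual preparation described above could be scripted roughly as follows. This is only a sketch in base R: the helper names prepare_demo_dir and missing_packages are hypothetical, and the example runs against a scratch directory under tempdir() rather than the real package library.

```r
# Hypothetical helper: create <pkg_dir>/demo/pandas if it is missing and
# return the demo directory that the benchmark script expects as the
# working directory (it writes to "pandas/left.csv" relative to it).
prepare_demo_dir <- function(pkg_dir) {
  pandas_dir <- file.path(pkg_dir, "demo", "pandas")
  if (!dir.exists(pandas_dir)) {
    dir.create(pandas_dir, recursive = TRUE)
  }
  file.path(pkg_dir, "demo")
}

# Hypothetical helper: report which of the packages loaded by the demo
# are not installed.
missing_packages <- function(pkgs = c("data.table", "microbenchmark", "reshape2")) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Try it against a scratch directory instead of the real library.
scratch <- file.path(tempdir(), "dplyr")
demo_dir <- prepare_demo_dir(scratch)
```

On a real installation one would pass the installed package directory instead of the scratch path, call setwd(demo_dir), and install whatever missing_packages() reports. The Python file still has to be copied from a clone of the repository by hand.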
I think the easiest workaround would be to change the "description" line from
Benchmark merging between R and python
to something like
Benchmark merging between R and python (internal demo, for developers only)
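As an illustration of the proposed change, here is a small sketch that rewrites the description of one entry in a demo listing. It assumes the listing is stored as one line per demo, with the name and the description separated by whitespace; the lines below are copied from the listing above, not read from the package.

```r
# Demo listing as shown above (assumed layout: name, whitespace, description).
index_lines <- c(
  "bench-merge  Benchmark merging between R and python",
  "bench-rbind  Benchmark various flavours of rbind",
  "bench-set    Benchmark set operations on data frames"
)

# Split each line into its name and its description.
index <- data.frame(
  name = sub("^(\\S+)\\s+.*$", "\\1", index_lines, perl = TRUE),
  description = sub("^\\S+\\s+(.*)$", "\\1", index_lines, perl = TRUE),
  stringsAsFactors = FALSE
)

# Append the suggested note to the bench-merge entry only.
is_merge <- index$name == "bench-merge"
index$description[is_merge] <- paste(
  index$description[is_merge],
  "(internal demo, for developers only)"
)

# Rebuild the listing with left-aligned names.
updated_lines <- sprintf("%-12s %s", index$name, index$description)
```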
Some info about my computing environment.
R> packageVersion("dplyr")
[1] ‘0.4.3.9000’
R> getwd()
[1] "/home/knb/code/git/dplyr/demo"
R> demo("bench-merge")
demo(bench-merge)
---- ~~~~~~~~~~~
Type <Return> to start :
R> # Compare base, data table, dplyr and pandas
R> #
R> # To install pandas on OS X:
R> # * brew update && brew install python
R> # * pip install --upgrade setuptools
R> # * pip install --upgrade pip
R> # * pip install pandas
R>
R> library(dplyr)
R> library(data.table)
R> library(microbenchmark)
R> library(reshape2)
R> set.seed(1014)
R> # Generate sample data ---------------------------------------------------------
R>
R> random_strings <- function(n, m) {
+ mat <- matrix(sample(letters, m * n, rep = TRUE), ncol = m)
+ apply(mat, 1, paste, collapse = "")
+ }
R> N <- 10000
R> indices <- random_strings(N, 10)
R> indices2 <- random_strings(N, 10)
R> left <- data.frame(
+ key = rep(indices[1:8000], 10),
+ key2 = rep(indices2[1:8000], 10),
+ value = rnorm(80000)
+ )
R> right <- data.frame(
+ key = indices[2001:10000],
+ key2 = indices2[2001:10000],
+ value2 = rnorm(8000)
+ )
R> write.csv(left, "pandas/left.csv", row.names = FALSE)
R> write.csv(right, "pandas/right.csv", row.names = FALSE)
R> # Equivalent functions for each technique --------------------------------------
R>
R> base <- list(
+ setup = function(x, y) list(x = x, y = y),
+
+ left = function(x, y) base::merge(x, y, all.x = TRUE),
+ right = function(x, y) base::merge(x, y, all.y = TRUE),
+ inner = function(x, y) base::merge(x, y)
+ )
R> data.table <- list(
+ setup = function(x, y) {
+ list(
+ x = data.table(x, key = c("key", "key2")),
+ y = data.table(y, key = c("key", "key2"))
+ )
+ },
+
+ left = function(x, y) x[y],
+ right = function(x, y) y[x],
+ inner = function(x, y) merge(x, y, all = FALSE)
+ )
R> dplyr <- list(
+ setup = function(x, y) list(x = x, y = y),
+
+ left = function(x, y) left_join(x, y, by = c("key", "key2")),
+ right = function(x, y) NULL,
+ inner = function(x, y) inner_join(x, y, by = c("key", "key2"))
+ )
R> techniques <- list(base = base, data.table = data.table, dplyr = dplyr)
R> # Aggregate results ------------------------------------------------------------
R>
R> niter <- 10
R> r <- lapply(names(techniques), function(nm) {
+ tech <- techniques[[nm]]
+ df <- tech$setup(left, right)
+ m <- microbenchmark(
+ left = tech$left(df$x, df$y),
+ right = tech$right(df$x, df$y),
+ inner = tech$inner(df$x, df$y),
+ times = niter
+ )
+
+ means <- tapply(m$time, m$expr, FUN = mean) / 1e9
+ data.frame(type = names(means), mean = means, tech = nm,
+ row.names = NULL, stringsAsFactors = FALSE)
+ })
*** caught segfault ***
address 0x2710, cause 'memory not mapped'
Traceback:
1: .Call("dplyr_left_join_impl", PACKAGE = "dplyr", x, y, by_x, by_y)
2: left_join_impl(x, y, by$x, by$y)
3: left_join.tbl_df(tbl_df(x), y, by = by, copy = copy, ...)
4: left_join(tbl_df(x), y, by = by, copy = copy, ...)
5: as.data.frame(left_join(tbl_df(x), y, by = by, copy = copy, ...))
6: left_join.data.frame(x, y, by = c("key", "key2"))
7: left_join(x, y, by = c("key", "key2"))
8: tech$left(df$x, df$y)
9: microbenchmark(left = tech$left(df$x, df$y), right = tech$right(df$x, df$y), inner = tech$inner(df$x, df$y), times = niter)
10: FUN(X[[i]], ...)
11: lapply(names(techniques), function(nm) { tech <- techniques[[nm]] df <- tech$setup(left, right) m <- microbenchmark(left = tech$left(df$x, df$y), right = tech$right(df$x, df$y), inner = tech$inner(df$x, df$y), times = niter) means <- tapply(m$time, m$expr, FUN = mean)/1e+09 data.frame(type = names(means), mean = means, tech = nm, row.names = NULL, stringsAsFactors = FALSE)})
12: eval(expr, envir, enclos)
13: eval(ei, envir)
14: withVisible(eval(ei, envir))
15: source(available, echo = echo, max.deparse.length = Inf, keep.source = TRUE, encoding = encoding)
16: demo("bench-merge")
Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection: 2
Warning messages:
1: In inner_join_impl(x, y, by$x, by$y) :
joining factors with different levels, coercing to character vector
2: In inner_join_impl(x, y, by$x, by$y) :
joining factors with different levels, coercing to character vector
3: In inner_join_impl(x, y, by$x, by$y) :
joining factors with different levels, coercing to character vector
4: In inner_join_impl(x, y, by$x, by$y) :
joining factors with different levels, coercing to character vector
5: In inner_join_impl(x, y, by$x, by$y) :
joining factors with different levels, coercing to character vector
6: In inner_join_impl(x, y, by$x, by$y) :
joining factors with different levels, coercing to character vector
7: In left_join_impl(x, y, by$x, by$y) :
joining factors with different levels, coercing to character vector
8: In left_join_impl(x, y, by$x, by$y) :
joining factors with different levels, coercing to character vector
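The warnings above arise because data.frame() in this R version converts character columns to factors by default, so the key columns of left and right end up as factors with different level sets. A minimal sketch in base R, using made-up toy keys instead of the demo's random strings, shows how stringsAsFactors = FALSE keeps the keys as character vectors. Whether this has any bearing on the segfault is not established here.

```r
set.seed(1014)
keys <- c("abc", "def", "ghi", "jkl")  # toy keys, for illustration only

# Character keys: no factor levels to reconcile when joining.
left <- data.frame(
  key = rep(keys[1:3], 2),
  value = rnorm(6),
  stringsAsFactors = FALSE
)
right <- data.frame(
  key = keys[2:4],
  value2 = rnorm(3),
  stringsAsFactors = FALSE
)

# For comparison: factor keys built from different subsets get different levels.
left_f <- data.frame(key = rep(keys[1:3], 2), stringsAsFactors = TRUE)
right_f <- data.frame(key = keys[2:4], stringsAsFactors = TRUE)
same_levels <- identical(levels(left_f$key), levels(right_f$key))

# Base R left join on the shared "key" column.
merged <- merge(left, right, all.x = TRUE)
```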
R version 3.2.2 (2015-08-14)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 15.10
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] colorout_1.1-1
This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/
lockbot
locked and limited conversation to collaborators
Sep 16, 2018