Mutate each crashes R in newest dplyr #1228

Closed
danielsjf opened this issue Jun 19, 2015 · 10 comments

@danielsjf

Hi,

This piece of code worked before, but now it crashes my R session when the data frame is large:

require(dplyr)
require(lubridate)

size <- 100000
prices <- data.frame(time=seq(now(),now()+size-1,by=1), A=runif(size),B=runif(size),C=runif(size))

BaseloadPrice <- prices %>%
  ungroup() %>%
  mutate_each(funs(ifelse(.>300,300,.)),-time)

Sessioninfo:

> sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252   
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] lubridate_1.3.3 dplyr_0.4.2    

loaded via a namespace (and not attached):
 [1] assertthat_0.1 DBI_0.3.1      digest_0.6.8   magrittr_1.5   memoise_0.2.1  parallel_3.1.2
 [7] plyr_1.8.3     R6_2.0.1       Rcpp_0.11.6    stringi_0.4-1  stringr_1.0.0  tools_3.1.2   

Update
This code works for now:

require(dplyr)
require(lubridate)

size <- 100000
prices <- data.frame(time=seq(now(),now()+size-1,by=1), A=runif(size),B=runif(size),C=runif(size))

BaseloadPrice <- prices %>%
  ungroup() %>%
  group_by(time) %>%
  mutate_each(funs(min(.,300)),-time)
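For comparison, here is a minimal base-R sketch of the same capping step. It assumes the goal is simply to cap every column except `time` at 300. It bypasses `mutate_each()` entirely, and it uses `Sys.time()` instead of lubridate's `now()` to avoid the extra dependency. The names `capped` and `value_cols` are illustrative only.

```r
# Hypothetical base-R alternative (not from the original report):
# cap every column except `time` at 300 without calling mutate_each().
size <- 100000
prices <- data.frame(
  time = seq(Sys.time(), by = 1, length.out = size),  # one row per second
  A = runif(size), B = runif(size), C = runif(size)
)

capped <- prices
value_cols <- setdiff(names(capped), "time")
# pmin() is the vectorised element-wise minimum; for numeric columns it
# gives the same result as ifelse(x > 300, 300, x), including NA handling
capped[value_cols] <- lapply(capped[value_cols], function(x) pmin(x, 300))
```

Because `runif()` only produces values in [0, 1], the cap leaves this example data unchanged, just as in the original reproduction.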
@bamcdougall

I see the same issue: mutate fails when run inside a code chunk of an .Rmd file, while the same code run as a plain R script appears to work fine.

Additional context: I am mutating within the Slidify framework.

I removed dplyr 0.4.2 and installed 0.4.1 with the following command:
devtools::install_version("dplyr", version = "0.4.1", repos = "http://cran.us.r-project.org")

My previous Rmd files had been running fine within Slidify. Note: I ran an Update of my packages from within RStudio, so a whole list of packages was updated. Rolling back dplyr 0.4.2 seems to have gotten me back up and running.

@chrishaid

FWIW: I'm seeing similar problems (segfaults, memory allocation errors) in a project that had been passing its wercker and Travis CI builds until the new version of dplyr (0.4.2) was released.

@hadley
Member

hadley commented Jun 22, 2015

@romainfrancois could you please take a look?

@jennybc
Member

jennybc commented Jun 22, 2015

I am having the same trouble. It's in a function that I didn't write and hadn't really reviewed until it caused this segfault on Travis. The function on our end is really ugly and I'm refactoring it.

Ugly though it may be, the code should run without a segfault. This code has been there for several months and has passed tests both locally and on Travis without any problems.

Here's one of my failed Travis builds:
https://travis-ci.org/jennybc/googlesheets/builds/67873547

FWIW it does seem related to the size of the data.frame. When I was trying to debug interactively, I could get the segfault with the full input but not with head(input).

@jennybc
Member

jennybc commented Jun 22, 2015

I have an unintentional natural experiment where my package builds successfully on Travis in master, which is using the old style of .travis.yml, but I get the segfault above in a branch where I'm experimenting with the newer "R native" .travis.yml. Although dplyr 0.4.2 gets installed in both cases, in my master branch I have managed to mask 0.4.2 with 0.4.1 through my sheer incompetence in writing .travis.yml files.

Sorry I can't be more precise but maybe that gives some information?

@lwjohnst86

I have also experienced problems with mutate_each, though I had thought it was due to scale. See the StackOverflow question I asked.

Here is a MWE that simulates my actual dataset (44 columns, 500 rows):

library(dplyr)
matrix(runif(44*500), ncol = 44) %>%
  as.data.frame() %>%
  mutate_each(funs(as.numeric(scale(.))))

@shrektan

The same problem occurred on my computer with the code below (the same as takje's, except with a larger size):

require(dplyr)
require(lubridate)

size <- 1000000
prices <- data.frame(time=seq(now(),now()+size-1,by=1), A=runif(size),B=runif(size),C=runif(size))

BaseloadPrice <- prices %>%
  ungroup() %>%
  mutate_each(funs(ifelse(.>300,300,.)),-time)

Oddly, it does not happen every time, but it seems to happen more often when the progress bar is shown, so it may be related to that.

> sessionInfo()
R version 3.2.0 (2015-04-16)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=Chinese (Simplified)_People's Republic of China.936 
[2] LC_CTYPE=Chinese (Simplified)_People's Republic of China.936   
[3] LC_MONETARY=Chinese (Simplified)_People's Republic of China.936
[4] LC_NUMERIC=C                                                   
[5] LC_TIME=Chinese (Simplified)_People's Republic of China.936    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] lubridate_1.3.3 dplyr_0.4.2    

loaded via a namespace (and not attached):
 [1] plyr_1.8.3     magrittr_1.5   R6_2.0.1       assertthat_0.1
 [5] parallel_3.2.0 DBI_0.3.1      tools_3.2.0    memoise_0.2.1 
 [9] Rcpp_0.11.6    stringi_0.5-2  digest_0.6.8   stringr_1.0.0 

@Rckrd

Rckrd commented Jul 2, 2015

Same issue here with a 100k-row data frame containing several columns with dates stored as strings.

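library(dplyr)
library(stringr)
library(lubridate)

# Replace "." separators with "-", then parse month-day-year strings as Dates
parse_dirty_string_to_date <- function(x) {
  str_replace_all(x, fixed("."), "-") %>%
    mdy(tz = 'CET') %>% as.Date(tz = 'CET')
}
mc %>% mutate_each(funs(parse_dirty_string_to_date), ends_with("Dt"))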

I tried replacing my mutate_each statements with mutate, but that also crashes R.

mc %>% mutate(ReportDt = parse_dirty_string_to_date(ReportDt),
               TransDt = parse_dirty_string_to_date(TransDt))

This works, though:

mc$TransDt <- parse_dirty_string_to_date(mc$TransDt)
mc$ReportDt <- parse_dirty_string_to_date(mc$ReportDt)

Sessioninfo

R version 3.2.1 (2015-06-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=Swedish_Sweden.1252  LC_CTYPE=Swedish_Sweden.1252    LC_MONETARY=Swedish_Sweden.1252 LC_NUMERIC=C                    LC_TIME=Swedish_Sweden.1252    

attached base packages:
[1] tcltk     stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] sqldf_0.4-10        RSQLite_1.0.0       DBI_0.3.1           gsubfn_0.6-6        proto_0.3-10        ISOcodes_2015.04.04 jsonlite_0.9.16     bit64_0.9-4         bit_1.1-12         
[10] xtable_1.7-4        RODBC_1.3-11        lubridate_1.3.3     magrittr_1.5        data.table_1.9.4    stringr_1.0.0       dplyr_0.4.2         DiagrammeR_0.7      knitr_1.10.5       

loaded via a namespace (and not attached):
 [1] Rcpp_0.11.6      rstudioapi_0.3.1 R6_2.0.1         highr_0.5        plyr_1.8.3       tools_3.2.1      parallel_3.2.1   htmltools_0.2.6  lazyeval_0.1.10  yaml_2.1.13      assertthat_0.1  
[12] digest_0.6.8     reshape2_1.4.1   htmlwidgets_0.5  curl_0.9         memoise_0.2.1    rmarkdown_0.7    stringi_0.5-5    chron_2.3-47    

@hadley
Member

hadley commented Jul 3, 2015

No need to keep adding examples. We know about the problem.

@romainfrancois
Member

I think these are all the same as #1231, so I'm closing this.
Please each try your own examples against the dev version and reopen if the problem is still there.

@lock lock bot locked as resolved and limited conversation to collaborators Jun 9, 2018