join() crash #43

Closed
low-decarie opened this issue May 24, 2011 · 3 comments

@low-decarie
Thank you for the tremendously good work on this essential package.

My current script that causes the crash is too bulky for upload. I am working on an example script that will cause the same crash.

join() crashes my R session with:

*** caught segfault ***
address 0x0, cause 'memory not mapped'

Traceback:
1: .Call("split_indices", index, group, as.integer(n))
2: split_indices(seq_along(keys$y), keys$y, keys$n)
3: join_ids(x, y, by, all = TRUE)
4: join_all(x, y, by, type)
5: join(counts.transplant, counts.clamy, by = "Water.plot")

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace

Within RStudio, this crashes the whole application.

Thank you and have an excellent day,

Etienne

@mndrs
mndrs commented Jul 29, 2011

I have this same exact issue. I've had join in plyr crash R 2.13.1 and 2.12.0 (as well as RStudio).

@hadley
Owner

hadley commented Aug 7, 2011

Here's a reproducible example from @imark:

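library(plyr)

# m1$file becomes a factor (the data.frame default at the time); m2$file stays character
m1 <- data.frame(cl = c(1, 2), file = c("hi", "low"))
m2 <- data.frame(file = c("1776.txt", "About.txt"), actual = c(11.5, 4.5), stringsAsFactors = FALSE)
join(m1, m2, "file")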

@brendano
I've been getting this too. The crash seems to depend on whether the join columns are factors or characters, and on whether they contain NAs.

Factor vs. Character

Works:

d1 = data.frame(x=c('a','b'), y=1:2, stringsAsFactors=F)
d2 = data.frame(x=c('b','d'), z=1:2, stringsAsFactors=F)
join(d1,d2)

Works, even though the factors have different levels:

d1 = data.frame(x=c('a','b'), y=1:2)
d2 = data.frame(x=c('b','d'), z=1:2)
join(d1,d2)

Runs, but returns the wrong answer:

d1 = data.frame(x=c('a','b'), y=1:2, stringsAsFactors=F)
d2 = data.frame(x=c('b','d'), z=1:2)
join(d1,d2)

Crashes:

d1 = data.frame(x=c('a','b'), y=1:2)
d2 = data.frame(x=c('b','d'), z=1:2, stringsAsFactors=F)
join(d1,d2)

Specifically, it's a segfault in split_indices.
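
Since the character/character case above works, one possible workaround until this is fixed is to coerce factor key columns to character before joining. This is only a sketch: `keys_as_character` is a hypothetical helper, not part of plyr, and I'm assuming the mismatch in key types is what triggers the crash.

```r
# Hypothetical helper (not part of plyr): coerce factor key columns to
# character so both data frames carry the same key type before joining.
keys_as_character <- function(df, by) {
  for (col in by) {
    if (is.factor(df[[col]])) {
      df[[col]] <- as.character(df[[col]])
    }
  }
  df
}

# Same shape as the "Crashes" case: factor key on the left, character on the right.
d1 <- data.frame(x = c("a", "b"), y = 1:2, stringsAsFactors = TRUE)
d2 <- data.frame(x = c("b", "d"), z = 1:2, stringsAsFactors = FALSE)

d1 <- keys_as_character(d1, "x")
d2 <- keys_as_character(d2, "x")

# Both key columns are now character, as in the first "Works" case.
# The join itself would then be: plyr::join(d1, d2)
```

The join call is left as a comment so the snippet runs without plyr installed.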

NA's in factors

When the right-hand join column (in a left join) is a factor and contains an NA, join() crashes.

Works:

d1 = data.frame(x=c('a','b'), y=1:2, stringsAsFactors=F)
d2 = data.frame(x=c('b',NA), z=1:2, stringsAsFactors=F)
join(d1,d2)

Works:

d1 = data.frame(x=c(NA,'b'), y=1:2)
d2 = data.frame(x=c('b','c'), z=1:2)
join(d1,d2)

Crashes:

d1 = data.frame(x=c('a','b'), y=1:2)
d2 = data.frame(x=c('b',NA), z=1:2)
join(d1,d2)

Again, the segfault is in split_indices.
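
To see which of these cases a given pair of data frames falls into before calling join(), something like the following could help. It is a rough sketch: `describe_keys` is a hypothetical diagnostic, not a plyr function, and it only reports the class of each key column and whether it contains NAs.

```r
# Hypothetical diagnostic (not part of plyr): summarise the key columns of two
# data frames so type mismatches and NA keys are visible before joining.
describe_keys <- function(x, y, by) {
  do.call(rbind, lapply(by, function(col) {
    data.frame(
      key     = col,
      x_class = class(x[[col]])[1],
      y_class = class(y[[col]])[1],
      x_na    = anyNA(x[[col]]),
      y_na    = anyNA(y[[col]]),
      stringsAsFactors = FALSE
    )
  }))
}

# Same shape as the "Crashes" case: factor keys, with an NA on the right.
d1 <- data.frame(x = c("a", "b"), y = 1:2, stringsAsFactors = TRUE)
d2 <- data.frame(x = c("b", NA), z = 1:2, stringsAsFactors = TRUE)

report <- describe_keys(d1, d2, "x")
print(report)
```

A row with differing classes, or with `y_na` set on a factor key, matches the cases that crash for me above.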

NA's in numerics are fine

These all work. The problem seems restricted to factor vectors.

d1 = data.frame(x=c(10,11), y=1:2)
d2 = data.frame(x=c(11,12), z=1:2)
join(d1,d2)

d1 = data.frame(x=c(10,11), y=1:2)
d2 = data.frame(x=c(11,NA), z=1:2)
join(d1,d2)

d1 = data.frame(x=c(NA,11), y=1:2)
d2 = data.frame(x=c(11,12), z=1:2)
join(d1,d2)

Non-determinism

Sometimes, instead of a segfault, I get a benign error message from split_indices. If I start a fresh R session and use a setup similar to the NA case above, only with larger data frames:

d1 = data.frame(x=letters[1:4], y=1:4)
d2 = data.frame(x=letters[2:5], z=1:4)
d2$x[2] = NA
join(d1,d2)

I only get this error:

Error in split_indices(seq_along(keys$y), keys$y, keys$n) : 
  INTEGER() can only be applied to a 'integer', not a 'character'
Calls: join -> join_all -> join_ids -> split_indices -> .Call

(which, by the way, seems strange since the columns are factors, not characters.)

But if I do it a few more times, I get the segfault, same as all the above crashes:

 *** caught segfault ***
address 0x202, cause 'memory not mapped'

Traceback:
 1: .Call("split_indices", index, group, as.integer(n))
 2: split_indices(seq_along(keys$y), keys$y, keys$n)
 3: join_ids(x, y, by, all = TRUE)
 4: join_all(x, y, by, type)
 5: join(d1, d2)

@hadley hadley closed this as completed in e493ef4 Oct 30, 2011