segfault when tabulating data incorrectly guessed as type "logical" #385
Comments
Can you provide "City Survey MASTER Data 1996-2017.xlsx"?
Sure, it was linked on the issue. You can download it from http://sfgov.org/citysurvey/sites/default/files/City%20Survey%20MASTER%20Data%201996-2017.xlsx
Oh sorry, I didn't notice the blue text 😬
I can reproduce it. It's very weird, because the data frame appears normal in most respects. Thanks for this specimen. Two things I noticed in a quick examination:
library(readxl)
library(tidyverse)
df <- read_xlsx("~/Downloads/City Survey MASTER Data 1996-2017.xlsx", sheet = 2)
df %>%
count(parkvis)
#> # A tibble: 7 x 2
#> parkvis n
#> <lgl> <int>
#> 1 TRUE 2027
#> 2 TRUE 2863
#> 3 TRUE 4931
#> 4 TRUE 6400
#> 5 TRUE 9325
#> 6 TRUE 828
#> 7 NA 11598
df2 <- read_xlsx("~/Downloads/City Survey MASTER Data 1996-2017.xlsx", sheet = 2, guess_max = 40000)
table(df2$parkvis)
#>
#> 1 2 3 4 5 6
#> 2027 2863 4931 6400 9325 828
Yeah, I can get to the right parsing of the data a few ways, and the version that segfaults isn't useful, so this doesn't block the analysis I was trying to do. But I figured you'd want to know about a segfault :) Odd that
* Transform int to 0/1 before coercion to logical

  Fixes #385

  Weirdly, the test passes even without this fix when run with `testthat::test()`. To get the test to fail, checkout the previous commit, install, and do

  ```r
  library(readxl)
  library(testthat)
  source("./tests/testthat/helper.R")
  source("./tests/testthat/test-col-types.R")
  ```

  from within the project directory. It will segfault.

* Add bugfix of #385 to NEWS
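
For illustration only, here is a base R sketch of what "transform int to 0/1 before coercion to logical" means at the R level. The actual fix is in readxl's compiled code, which is not shown in this thread. The helper name `to_logical01` and the sample vector are hypothetical. The `count(parkvis)` output earlier in the thread, with six separate TRUE rows, is consistent with a logical column that still carried the raw integer codes instead of 0/1.

```r
# Hypothetical helper, not part of readxl: collapse any non-zero integer to 1
# before coercing to logical, so the result holds only TRUE, FALSE, or NA.
to_logical01 <- function(x) {
  x <- as.integer(x)
  # x != 0L is NA for missing input, and that NA survives both coercions.
  as.logical(as.integer(x != 0L))
}

# Survey-style codes plus a zero and a missing value (made-up sample data).
codes <- c(1L, 2L, 3L, 6L, 0L, NA)
flags <- to_logical01(codes)

# A well-formed logical vector has at most three distinct values, so tabulating
# it cannot produce several separate TRUE rows.
print(flags)
print(table(flags, useNA = "ifany"))
```

Base R's own `as.logical()` already treats every non-zero number as TRUE, so the explicit 0/1 step only matters when values are written into a logical vector's storage directly, as compiled code can do.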
After reading the data in this file, I get a segmentation fault whenever I try to aggregate most columns in the resulting `tbl_df`:

The problem is probably related to the fact that, because there are so many missing values at the top of the dataset, this column (and a majority of the others in the dataset) was guessed as type "logical":

It turns out that if you specify the `col_types` correctly as "numeric", the data are perfectly readable without segfaulting:
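
A minimal sketch of that workaround, assuming the workbook sits at the same download path used elsewhere in this thread. It relies on readxl recycling a single `col_types` value across all columns, which is only appropriate if every column on the sheet should be numeric. Any text columns would come back as NA with warnings, in which case a vector with one type per column is the safer choice.

```r
# Path taken from the reproduction in this thread; adjust to your own download.
path <- "~/Downloads/City Survey MASTER Data 1996-2017.xlsx"

if (requireNamespace("readxl", quietly = TRUE) && file.exists(path)) {
  # Declaring the type up front means nothing is guessed as "logical" from the
  # long run of missing values at the top of the sheet.
  df_num <- readxl::read_xlsx(path, sheet = 2, col_types = "numeric")
  print(table(df_num$parkvis, useNA = "ifany"))
} else {
  message("readxl or the workbook is unavailable; skipping the example.")
}
```

Raising `guess_max` so that the guesser sees past the leading missing values is the other route shown in this thread.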