Unexpected behaviour when reading empty columns #1551

Open
lowk opened this issue Aug 19, 2024 · 0 comments
I am having issues getting readr::read_csv to treat empty columns as strings rather than as vectors of logical NAs. Setting na = "NA" does not have the expected effect (it still reads the empty columns as vectors of logical NAs) and also produces an unexpected warning message. It seems possible that this might be a bug.

An example of what I mean:

# Make a tibble with a column of empty strings
df_write <- tibble::tibble(a = 1:3, b = "")

# Check the class of column b
class(df_write$b)
# [1] "character"

# Write the tibble to a temporary file
filepath <- tempfile()
readr::write_csv(df_write, file = filepath)

# Read it back with read_csv, treating empty strings as characters rather than missing
df_read <- readr::read_csv(file = filepath, col_types = readr::cols(), na = "NA")

#Warning message:                                                                                                   
#One or more parsing issues, call `problems()` on your data frame for details,
#e.g.:
#  dat <- vroom(...)
#   problems(dat) 

#Check the class of column b
class(df_read$b)
#[1] "logical"

# Check the problems
readr::problems(df_read)
# A tibble: 3 × 5
#    row   col expected           actual file                                    
#  <int> <int> <chr>              <chr>  <chr>                                   
#1     2     2 1/0/T/F/TRUE/FALSE ""     /private/var/folders/65/zc1jdwvx0m5gw8t…
#2     3     2 1/0/T/F/TRUE/FALSE ""     /private/var/folders/65/zc1jdwvx0m5gw8t…
#3     4     2 1/0/T/F/TRUE/FALSE ""     /private/var/folders/65/zc1jdwvx0m5gw8t…

I suspect this is a bug because, with na = "NA", I would expect columns of empty strings to be read as character vectors of empty strings rather than as vectors of logical NAs.

Digging a bit deeper, the issue appears to come from the type guessing behind parse_guess: guess_parser guesses a vector of empty strings as logical even if na = "NA":

#this gives logical, as expected:
readr::guess_parser("")
#[1] "logical"

#this also gives logical, whereas I would expect it to default to the more general "character":
readr::guess_parser("", na = "NA")
#[1] "logical"

If this isn't a bug, what is the correct way to get read_csv to read empty columns as strings in general? Obviously, in the case above I can set cols(b = col_character()), but what happens if I don't know ahead of time which columns will be full of empty strings?
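
One possible workaround, offered as a sketch rather than a confirmed answer from the maintainers: read every column as character first, work out which columns contain only empty or missing values, and run readr::type_convert on the remaining columns only. The names raw, is_empty_col, typed and converted are made up for illustration, and the sketch assumes that type guessing is the only step that turns the empty column into logical.

```r
# Recreate the example file from the report
df_write <- tibble::tibble(a = 1:3, b = "")
filepath <- tempfile()
readr::write_csv(df_write, file = filepath)

# Step 1: read every column as character, so no type guessing happens yet
raw <- readr::read_csv(
  file = filepath,
  col_types = readr::cols(.default = readr::col_character()),
  na = "NA"
)

# Step 2: flag columns whose values are all empty strings or NA
is_empty_col <- vapply(
  raw,
  function(x) all(is.na(x) | x == ""),
  logical(1)
)

# Step 3: guess types only for the columns that hold some data,
# leaving the all-empty columns as character
converted <- raw
if (any(!is_empty_col)) {
  typed <- readr::type_convert(raw[!is_empty_col], na = "NA")
  for (nm in names(typed)) {
    converted[[nm]] <- typed[[nm]]
  }
}

class(converted$a)
class(converted$b)
```

This costs a second pass over the data in memory, but it does not require knowing in advance which columns are empty.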
