fread fill=true and sep= provided could still read as 1-column #2666

Closed
mrdwab opened this issue Mar 9, 2018 · 0 comments

mrdwab commented Mar 9, 2018

Here's a minimal example of some small files that I was trying to read with fread:

library(data.table)

V1 <- c("A;B;C", "D", "E;F")
V2 <- c("A;B;C", "D", "E")

fread(paste(V1, collapse = "\n"), sep = ";", header = FALSE, fill = TRUE)
#    V1 V2 V3
# 1:  A  B  C
# 2:  D      
# 3:  E  F   

fread(paste(V2, collapse = "\n"), sep = ";", header = FALSE, fill = TRUE)
#       V1
# 1: A;B;C
# 2:     D
# 3:     E

Notice that the second call returns only 1 column, while I was expecting 3. fread seems to ignore the sep provided and to guess the layout from the remaining rows. Here are two more "files": the first also fails, and the second works:

V3 <- c("A;B;C", ";D", "E")
V4 <- c("A;B;C", "D", ";E")

fread(paste(V3, collapse = "\n"), sep = ";", header = FALSE, fill = TRUE)
#       V1
# 1: A;B;C
# 2:    ;D
# 3:     E

fread(paste(V4, collapse = "\n"), sep = ";", header = FALSE, fill = TRUE)
#    V1 V2 V3
# 1:  A  B  C
# 2:  D      
# 3:     E   
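
For reference, the column count I expect can be checked independently of fread. This is only a sketch using base R's count.fields on the V2 example; the names con, n_fields and expected_cols are made up for illustration, and it says nothing about how fread itself decides the number of columns.

```r
# Count the ';'-separated fields on each line of the V2 example,
# without involving fread at all.
V2 <- c("A;B;C", "D", "E")

con <- textConnection(V2)               # treat the character vector as a file
n_fields <- count.fields(con, sep = ";")
close(con)

# The widest line determines how many columns a filled read should have.
expected_cols <- max(n_fields)
expected_cols
```

If expected_cols is larger than ncol() of the fread result, the separator was not applied to every row.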

I tried specifying colClasses:

fread(paste(V2, collapse = "\n"), sep = ";", header = FALSE, fill = TRUE, colClasses = list(character = 1:3))
# Error in fread(paste(V2, collapse = "\n"), sep = ";", header = FALSE,  : 
#   Column number 2 (colClasses[[1]][2]) is out of range [1,ncol=1]

And then tried setting skip = 0 (which works):

fread(paste(V2, collapse = "\n"), sep = ";", header = FALSE, fill = TRUE, skip = 0)
#    V1 V2 V3
# 1:  A  B  C
# 2:  D      
# 3:  E      

However, I don't want to set skip = 0, because then the split fails when the sep value does not appear in the first row:

V5 <- c("D", "E", "A;B;C")

fread(paste(V5, collapse = "\n"), sep = ";", header = FALSE, fill = TRUE, skip = 0)
#       V1
# 1:     D
# 2:     E
# 3: A;B;C

fread(paste(V5, collapse = "\n"), sep = ";", header = FALSE, fill = TRUE)
#    V1 V2 V3
# 1:  D      
# 2:  E      
# 3:  A  B  C
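
As a possible stopgap that does not depend on where the widest row sits, the lines can be split and padded by hand in base R. This is a rough sketch, not what fread does internally; read_ragged is a hypothetical helper name, and it assumes a non-empty input, a fixed single-character separator and no quoted fields.

```r
# Hypothetical helper: split each line on `sep` and pad short rows with "".
read_ragged <- function(lines, sep = ";") {
  parts <- strsplit(lines, sep, fixed = TRUE)
  width <- max(lengths(parts))          # widest row sets the column count
  padded <- lapply(parts, function(p) c(p, rep("", width - length(p))))
  out <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
  names(out) <- paste0("V", seq_len(width))
  out
}

V2 <- c("A;B;C", "D", "E")
V5 <- c("D", "E", "A;B;C")

res2 <- read_ragged(V2)   # widest row first
res5 <- read_ragged(V5)   # widest row last
```

Both results should have three character columns regardless of row order, and data.table::as.data.table() can convert them if a data.table is needed.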

Two questions:

  1. Should fread be ignoring a manually specified sep value?
  2. The documentation says that skip defaults to 0, but formals(fread)$skip returns [1] "__auto__". Should the documentation be updated to explain what "__auto__" represents?

sessionInfo()
# R version 3.4.2 (2017-09-28)
# Platform: x86_64-pc-linux-gnu (64-bit)
# Running under: Ubuntu 17.10
# 
# Matrix products: default
# BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
# LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
# 
# locale:
#  [1] LC_CTYPE=en_IN.UTF-8       LC_NUMERIC=C               LC_TIME=en_IN.UTF-8       
#  [4] LC_COLLATE=en_IN.UTF-8     LC_MONETARY=en_IN.UTF-8    LC_MESSAGES=en_IN.UTF-8   
#  [7] LC_PAPER=en_IN.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
# [10] LC_TELEPHONE=C             LC_MEASUREMENT=en_IN.UTF-8 LC_IDENTIFICATION=C       
# 
# attached base packages:
# [1] stats     graphics  grDevices utils     datasets  methods   base     
# 
# other attached packages:
# [1] data.table_1.10.5
# 
# loaded via a namespace (and not attached):
# [1] compiler_3.4.2 tools_3.4.2    yaml_2.1.17 

mattdowle added this to the v1.10.6 milestone on Apr 14, 2018
mattdowle changed the title from "fread ignores explicitly set sep argument" to "fread fill=true and sep= provided could still read as 1-column" on Apr 17, 2018