Fread should exclude UTF-8 BOM from column names #1087

nigmastar · 2015-03-19T15:20:18Z

Currently, fread seems to keep the UTF-8 BOM at the start of the file as part of the first column name.

library(data.table)

# Create file

f <- "test_utf-8.csv"
cat(intToUtf8(c(239, 187, 191)), 'a,b,c\n1,2,3\n', file = f, sep = '')

# Import file

dt <- fread(f)
names(dt)
## [1] "ï»¿a" "b"    "c"


jangorecki · 2015-04-04T01:50:52Z

Is there an easy workaround for that?
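
One possible workaround, until fread handles this itself, is to strip the BOM bytes from the column names after reading. This is a minimal sketch in base R. `strip_utf8_bom` is a hypothetical helper name (not part of data.table), and it assumes the BOM arrived as the raw bytes ef bb bf at the start of a name.

```r
# Hypothetical helper: drop a leading UTF-8 BOM (ef bb bf) from each string.
strip_utf8_bom <- function(x) {
  bom <- as.raw(c(0xef, 0xbb, 0xbf))
  vapply(x, function(s) {
    r <- charToRaw(s)
    # Only touch strings whose first three bytes are exactly the BOM
    if (length(r) >= 3L && identical(r[1:3], bom)) r <- r[-(1:3)]
    out <- rawToChar(r)
    Encoding(out) <- Encoding(s)  # keep the original encoding mark
    out
  }, character(1), USE.NAMES = FALSE)
}

# Simulated column names as fread returned them in this issue:
# the first name carries the three BOM bytes before "a".
nms <- c(rawToChar(as.raw(c(0xef, 0xbb, 0xbf, 0x61))), "b", "c")
clean <- strip_utf8_bom(nms)
print(charToRaw(clean[1L]))
```

With a data.table, the cleaned names could then be applied by reference with something like `setnames(dt, strip_utf8_bom(names(dt)))`.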

httassadar · 2015-11-12T13:37:20Z

Not sure the issue title makes sense. Why file names?

But the post does describe the issue. I really hope it's fixed.

arunsrinivasan · 2016-03-07T01:02:39Z

The file you provided is incorrectly encoded.

> readBin("test_utf-8.csv", raw(), file.info("test_utf-8.csv")$size)
#  [1] c3 af c2 bb c2 bf 61 2c 62 2c 63 0a 31 2c 32 2c 33 0a

whereas it should be:

#  [1] ef bb bf 61 2c 62 2c 63 0a 31 2c 32 2c 33 0a

With a correctly encoded file, fread("test_utf-8.csv") seems to work fine, although the BOM is still included in the first column name:

ans = fread("test_utf-8.csv")
#    a b c
# 1: 1 2 3
charToRaw(names(ans)[1L])
# [1] ef bb bf 61
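
For anyone trying to reproduce the correctly encoded file, here is a sketch that writes the BOM as raw bytes instead of going through `cat`. The likely reason for the difference in the byte dumps above is that `intToUtf8(c(239, 187, 191))` builds three separate characters (U+00EF, U+00BB, U+00BF), and what `cat` writes for them depends on the native encoding, so treat that explanation as an assumption about the reporter's setup. The temporary file path is only for illustration.

```r
# The three code points from the original example, encoded as UTF-8:
# each one becomes two bytes, which matches the "incorrect" dump.
wrong <- charToRaw(intToUtf8(c(239, 187, 191)))
print(wrong)

# Write the BOM and the CSV body as raw bytes, independent of locale.
f <- tempfile(fileext = ".csv")
con <- file(f, open = "wb")
writeBin(as.raw(c(0xef, 0xbb, 0xbf)), con)        # UTF-8 BOM
writeBin(charToRaw("a,b,c\n1,2,3\n"), con)        # header and one row
close(con)

# Check the bytes on disk, as in the comment above.
bytes <- readBin(f, raw(), file.info(f)$size)
print(bytes)
```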

arunsrinivasan · 2016-03-07T01:13:29Z

With this fix, now I get:

charToRaw(names(ans)[1L])
# [1] 61
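
To check whether a given file starts with a UTF-8 BOM before reading it, something like the following can be used. This is only a sketch of the general idea (compare the first three bytes against ef bb bf), not the code from the fix. `has_utf8_bom` is a hypothetical name.

```r
# Hypothetical helper: TRUE if the file begins with the UTF-8 BOM bytes.
has_utf8_bom <- function(path) {
  head3 <- readBin(path, what = "raw", n = 3L)
  identical(head3, as.raw(c(0xef, 0xbb, 0xbf)))
}

# One file with a BOM and one without, written as raw bytes.
with_bom <- tempfile(fileext = ".csv")
writeBin(c(as.raw(c(0xef, 0xbb, 0xbf)), charToRaw("a,b,c\n1,2,3\n")), with_bom)

without_bom <- tempfile(fileext = ".csv")
writeBin(charToRaw("a,b,c\n1,2,3\n"), without_bom)

res_with <- has_utf8_bom(with_bom)
res_without <- has_utf8_bom(without_bom)
print(c(res_with, res_without))
```

In base R, `read.csv(path, fileEncoding = "UTF-8-BOM")` is another option for reading such a file, since that encoding name is documented to drop a leading BOM.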

arunsrinivasan added the fread label Sep 4, 2015

nigmastar changed the title ~~Fread should exclude UTF-8 BOM from file names~~ Fread should exclude UTF-8 BOM from column names Nov 12, 2015

arunsrinivasan closed this as completed in 9fa61a9 Mar 7, 2016

arunsrinivasan added this to the v1.9.8 milestone Mar 7, 2016

arunsrinivasan self-assigned this Mar 7, 2016

mattdowle mentioned this issue Mar 28, 2017

Improvements to BOM detection: #2084

Merged
