UTF8 BOM encoded as part of the first field #138

Closed
dsimunic opened this issue Apr 25, 2017 · 6 comments


@dsimunic

dsimunic commented Apr 25, 2017

When a CSV file with a header row starts with the UTF-8 BOM (0xEF 0xBB 0xBF), the BOM bytes become part of the first field name.

For example:


$ hexdump -C file-with-utf-bom.csv
00000000  ef bb bf 74 69 74 6c 65  2c 66 69 72 73 74 2c 6c  |...title,first,l|
00000010  61 73 74 2c 65 6d 61 69  6c 2c 70 6f 73 69 74 69  |ast,email,positi|
00000020  6f 6e 2c 6f 72 67 61 6e  69 7a 61 74 69 6f 6e     |on,organization |

$ mlr --icsv --ojson file-with-utf-bom.csv | hexdump -C | head
00000000  7b 20 22 ef bb bf 74 69  74 6c 65 22 3a 20 22 22  |{ "...title": ""|
00000010  2c 20 22 66 69 72 73 74  22 3a 20 22 54 61 6b 65  |, "first": "Take|

$ mlr --version
Miller 5.1.0

OS is macOS Sierra 10.12.4
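To illustrate the symptom outside Miller, here is a minimal Python sketch. It is only an illustration and says nothing about how Miller itself parses CSV. It uses the header bytes from the hexdump above. Decoding as plain UTF-8 keeps the BOM as the character U+FEFF glued to the first field name, while Python's standard `utf-8-sig` codec drops a leading BOM.

```python
import csv
import io

# Header row taken from the hexdump in this report, preceded by the UTF-8 BOM.
raw = b"\xef\xbb\xbftitle,first,last,email,position,organization\n"

# Plain UTF-8 decoding keeps the BOM as U+FEFF, so it sticks to "title".
naive = csv.DictReader(io.StringIO(raw.decode("utf-8")))
print(naive.fieldnames[0] == "\ufefftitle")  # True

# The "utf-8-sig" codec removes a leading BOM (and is a no-op without one).
fixed = csv.DictReader(io.StringIO(raw.decode("utf-8-sig")))
print(fixed.fieldnames[0] == "title")  # True
```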

@johnkerl
Owner

Easy to strip off on input. Do you want to have the option to produce the BOM on output as well?
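Stripping the BOM on input, and optionally emitting it on output, comes down to checking for and prepending three bytes. The following is a hypothetical Python sketch of that idea; `strip_bom`, `with_bom`, and the `emit_bom` parameter are made-up names for illustration and do not correspond to Miller's code or command-line flags.

```python
import codecs
from typing import Tuple

BOM = codecs.BOM_UTF8  # b"\xef\xbb\xbf"


def strip_bom(data: bytes) -> Tuple[bytes, bool]:
    """Hypothetical helper: drop a leading UTF-8 BOM and report whether one was there."""
    if data.startswith(BOM):
        return data[len(BOM):], True
    return data, False


def with_bom(data: bytes, emit_bom: bool) -> bytes:
    """Hypothetical helper: optionally prepend a UTF-8 BOM to CSV output bytes."""
    return BOM + data if emit_bom else data


body, had_bom = strip_bom(b"\xef\xbb\xbftitle,first\n")
print(body, had_bom)  # b'title,first\n' True
print(with_bom(body, emit_bom=True).startswith(BOM))  # True
```

A reader would apply the check once at the start of each input file, since only a leading BOM is special; a U+FEFF appearing later in the data is ordinary content.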

@dsimunic
Author

dsimunic commented Apr 26, 2017 via email

@johnkerl
Owner

ok

@johnkerl
Owner

I mean for CSV output.

@dsimunic
Author

dsimunic commented Apr 26, 2017 via email

@johnkerl
Owner

johnkerl commented May 1, 2017

This is fixed in 2baf206, which is in HEAD and will go out in the next release.

@dsimunic please re-open if I missed anything -- thank you for the request!!

@johnkerl johnkerl closed this as completed May 1, 2017