UTF-8 Error #2

Closed
garak92 opened this issue Sep 17, 2020 · 2 comments
@garak92

garak92 commented Sep 17, 2020

Hello, I think I've found a bug and would like to report it to you.
Reported behavior: opening a CSV file crashes the application with a UTF-8 error.

Tested on this dataset: https://catalog.data.gov/dataset/traffic-data/resource/e46a5cc5-ed4d-4b8d-b750-18e6c9ec570e
Exact command:
csview monroe-county-crash-data2003-to-2015.csv

Backtrace:

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Error(Utf8 { pos: Some(Position { byte: 512345, line: 4326, record: 4326 }), err: Utf8Error { field: 9, valid_up_to: 18 } })', src/core.rs:12:37
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
myuser@pop-os:~/Downloads$ RUST_BACKTRACE=1 csview monroe-county-crash-data2003-to-2015.csv
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Error(Utf8 { pos: Some(Position { byte: 512345, line: 4326, record: 4326 }), err: Utf8Error { field: 9, valid_up_to: 18 } })', src/core.rs:12:37
stack backtrace:
   0: backtrace::backtrace::libunwind::trace
             at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.46/src/backtrace/libunwind.rs:86
   1: backtrace::backtrace::trace_unsynchronized
             at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.46/src/backtrace/mod.rs:66
   2: std::sys_common::backtrace::_print_fmt
             at src/libstd/sys_common/backtrace.rs:78
   3: <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt
             at src/libstd/sys_common/backtrace.rs:59
   4: core::fmt::write
             at src/libcore/fmt/mod.rs:1076
   5: std::io::Write::write_fmt
             at src/libstd/io/mod.rs:1537
   6: std::sys_common::backtrace::_print
             at src/libstd/sys_common/backtrace.rs:62
   7: std::sys_common::backtrace::print
             at src/libstd/sys_common/backtrace.rs:49
   8: std::panicking::default_hook::{{closure}}
             at src/libstd/panicking.rs:198
   9: std::panicking::default_hook
             at src/libstd/panicking.rs:217
  10: std::panicking::rust_panic_with_hook
             at src/libstd/panicking.rs:526
  11: rust_begin_unwind
             at src/libstd/panicking.rs:437
  12: core::panicking::panic_fmt
             at src/libcore/panicking.rs:85
  13: core::option::expect_none_failed
             at src/libcore/option.rs:1269
  14: <core::iter::adapters::Map<I,F> as core::iter::traits::iterator::Iterator>::next
  15: csview::main
  16: std::rt::lang_start::{{closure}}
  17: main
  18: __libc_start_main
  19: _start
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

Version of csview: 0.3.2-rc.0
OS: Pop!_OS 20.04 with Xanmod Kernel

Thank you very much in advance.

@wfxr
Owner

wfxr commented Sep 17, 2020

@garak92 Thanks for your feedback.

It may be caused by the file encoding. You can check it using the file command: file -i xxx.csv

To reduce the complexity of processing CJK characters and emoji, csview only supports UTF-8 encoding. It should be easy to convert from other encodings to UTF-8 using tools like iconv or vim.

The error message is not friendly, however. I will improve it later.

EDIT:
I downloaded the file from the link and confirmed the file encoding is non-utf8:

$ file -i a.csv
a.csv: application/csv; charset=iso-8859-1

csview works fine after converting to utf8:

$ iconv -f iso-8859-1 -t UTF8//TRANSLIT a.csv | head -10 | csview
+----------------------+------+-------+-----+----------+------+----------------+--------------------+------------------------------------------+-------------------------+-------------+--------------+
| Master Record Number | Year | Month | Day | Weekend? | Hour | Collision Type | Injury Type        | Primary Factor                           | Reported_Location       | Latitude    | Longitude    |
+----------------------+------+-------+-----+----------+------+----------------+--------------------+------------------------------------------+-------------------------+-------------+--------------+
| 902363382            | 2015 | 1     | 5   | Weekday  | 0    | 2-Car          | No injury/unknown  | OTHER (DRIVER) - EXPLAIN IN NARRATIVE    | 1ST & FESS              | 39.15920668 | -86.52587356 |
| 902364268            | 2015 | 1     | 6   | Weekday  | 1500 | 2-Car          | No injury/unknown  | FOLLOWING TOO CLOSELY                    | 2ND & COLLEGE           | 39.16144    | -86.534848   |
| 902364412            | 2015 | 1     | 6   | Weekend  | 2300 | 2-Car          | Non-incapacitating | DISREGARD SIGNAL/REG SIGN                | BASSWOOD & BLOOMFIELD   | 39.14978027 | -86.56889006 |
| 902364551            | 2015 | 1     | 7   | Weekend  | 900  | 2-Car          | Non-incapacitating | FAILURE TO YIELD RIGHT OF WAY            | GATES & JACOBS          | 39.165655   | -86.57595635 |
| 902364615            | 2015 | 1     | 7   | Weekend  | 1100 | 2-Car          | No injury/unknown  | FAILURE TO YIELD RIGHT OF WAY            | W 3RD                   | 39.164848   | -86.57962482 |
| 902364664            | 2015 | 1     | 6   | Weekday  | 1800 | 2-Car          | No injury/unknown  | FAILURE TO YIELD RIGHT OF WAY            | BURKS & WALNUT          | 39.12666969 | -86.53136998 |
| 902364682            | 2015 | 1     | 6   | Weekday  | 1200 | 2-Car          | No injury/unknown  | DRIVER DISTRACTED - EXPLAIN IN NARRATIVE | SOUTH CURRY PIKE LOT 71 | 39.150825   | -86.584899   |
| 902364683            | 2015 | 1     | 6   | Weekday  | 1400 | 1-Car          | Incapacitating     | ENGINE FAILURE OR DEFECTIVE              | NORTH LOUDEN RD         | 39.19927216 | -86.63702393 |
| 902364714            | 2015 | 1     | 7   | Weekend  | 1400 | 2-Car          | No injury/unknown  | FOLLOWING TOO CLOSELY                    | LIBERTY & W 3RD         | 39.16461021 | -86.57913007 |
+----------------------+------+-------+-----+----------+------+----------------+--------------------+------------------------------------------+-------------------------+-------------+--------------+
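For readers who want to see what the iconv step above does to the bytes, here is a minimal sketch in Rust using only the standard library. It relies on the fact that every ISO-8859-1 byte value equals the Unicode code point of the character it encodes. The function name latin1_to_utf8 and the sample bytes are made up for illustration; this is not part of csview.

```rust
/// Hypothetical helper (not part of csview): convert ISO-8859-1 bytes to a
/// UTF-8 `String`. Each Latin-1 byte 0x00..=0xFF maps to the Unicode code
/// point with the same value, so a byte-to-char cast is sufficient.
fn latin1_to_utf8(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| b as char).collect()
}

fn main() {
    // "Café" encoded as ISO-8859-1: the final byte 0xE9 is 'é'.
    let raw: [u8; 4] = [0x43, 0x61, 0x66, 0xE9];

    // The same bytes are not valid UTF-8, which is the kind of input that
    // triggers the error reported in this issue.
    assert!(std::str::from_utf8(&raw).is_err());

    // After conversion the text is valid UTF-8 and can be printed safely.
    let text = latin1_to_utf8(&raw);
    println!("{}", text);
}
```

This only covers ISO-8859-1. Other source encodings need a real transcoding tool such as iconv, as suggested above.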

@alexanderkjall

I wrote a fuzzer for this and discovered the same error. The problem seems to be the unwrap() call. I wrote a small fix so that I could continue fuzzing.
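To illustrate the general idea of replacing unwrap() with explicit error handling, here is a minimal sketch using only the Rust standard library. It is not the fix referenced in this thread, whose contents are not shown here, and it does not use the csv crate. The helper decode_field and the sample fields are hypothetical.

```rust
use std::str;

/// Hypothetical helper (not csview's actual code): decode one field as UTF-8
/// and return a readable message instead of panicking on invalid bytes.
fn decode_field(raw: &[u8]) -> Result<&str, String> {
    str::from_utf8(raw)
        .map_err(|e| format!("invalid UTF-8 after {} valid byte(s)", e.valid_up_to()))
}

fn main() {
    // The second field contains 0xE9, which is 'é' in ISO-8859-1 but an
    // incomplete multi-byte sequence when read as UTF-8.
    let fields: [&[u8]; 2] = [
        b"BURKS & WALNUT",
        &[0x43, 0x61, 0x66, 0xE9, 0x20, 0x53, 0x54],
    ];

    for (i, &raw) in fields.iter().enumerate() {
        match decode_field(raw) {
            Ok(text) => println!("field {}: {}", i, text),
            Err(msg) => {
                // Report the problem and fall back to a lossy rendering
                // (invalid bytes become U+FFFD) rather than crashing.
                eprintln!("field {}: {}", i, msg);
                println!("field {}: {}", i, String::from_utf8_lossy(raw));
            }
        }
    }
}
```

Whether to fall back to a lossy rendering or to exit with a clear message is a design choice. Either one avoids the panic and backtrace shown in the original report.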

wfxr added a commit that referenced this issue Sep 18, 2020
@wfxr wfxr added the stale label Sep 28, 2020
@stale stale bot closed this as completed Oct 28, 2020