-
Notifications
You must be signed in to change notification settings - Fork 43
Add support for parsing multiline csv records #425
Conversation
The csv parser previously was unable to properly handle multiline values. This change adds support. Newlines within csv records are preserved.
Codecov Report
@@ Coverage Diff @@
## main #425 +/- ##
=======================================
- Coverage 75.2% 75.2% -0.1%
=======================================
Files 83 83
Lines 4025 4033 +8
=======================================
+ Hits 3030 3035 +5
- Misses 693 696 +3
Partials 302 302
|
cc: @atoulme |
"\n\na\na\n\naa,\n\nbb\nbb\n\n,cc\ncc\n\n,\ndddd\n,eeee\n\n", | ||
map[string]interface{}{ | ||
"A": "a\na\naa", | ||
"B": "\nbb\nbb\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't this value end with two \n
?
"\n\na\na\n\naa,\n\nbb\nbb\n\n,cc\ncc\n\n,\ndddd\n,eeee\n\n", | ||
map[string]interface{}{ | ||
"A": "a\na\naa", | ||
"B": "\nbb\nbb\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't this value end with two \n
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, not in this case, but I added a case where the empty line is preserved. This is actually covered in RFC 4180 Fields containing line breaks (CRLF), double quotes, and commas should be enclosed in double-quotes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense
"B": "\nbb\nbb\n", | ||
"C": "cc\ncc\n", | ||
"D": "\ndddd\n", | ||
"E": "eeee", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same, should have two \n
at the end of the value
"B": "\nbb\nbb\n", | ||
"C": "cc\ncc\n", | ||
"D": "\ndddd\n", | ||
"E": "eeee", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same, should have two \n
at the end of the value
}{ | ||
{ | ||
"first_field", | ||
"aa\naa,bbbb,cccc,dddd,eeee", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we test with no newlines as well, and several lines? I would like to make sure existing parsing still works
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a basic case to this set.
}{ | ||
{ | ||
"first_field", | ||
"aa\naa,bbbb,cccc,dddd,eeee", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we test with no newlines as well, and several lines? I would like to make sure existing parsing still works
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we test with escaped double quotes? Say
"foo","bar","""foobar
is the new foo-bar""","another entry"```
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added some more test cases mixing quotes and returns.
* Add support for parsing multiline csv records The csv parser previously was unable to properly handle multiline values. This change adds support. Newlines within csv records are preserved.
The csv parser previously was unable to properly handle multiline values.
This change adds support. Newlines within csv records are preserved.
Resolves #312