This repository has been archived by the owner on May 25, 2022. It is now read-only.

Add support for parsing multiline csv records #425

Merged
5 commits merged on Mar 10, 2022

djaglowski
Member

Previously, the csv parser could not properly handle multiline values.
This change adds that support; newlines within csv records are preserved.

Resolves #312
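
For readers unfamiliar with multiline csv values, here is a minimal sketch of the general idea using Go's standard `encoding/csv` package. It is not the code from this PR; `parseRecord` is a hypothetical helper, and the sketch only shows that a quoted field may span lines and keep its newline.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// parseRecord is a hypothetical helper (not from this PR). It reads the
// first record from input using the standard library reader.
func parseRecord(input string) ([]string, error) {
	r := csv.NewReader(strings.NewReader(input))
	return r.Read()
}

func main() {
	// The second field is quoted and contains a newline.
	rec, err := parseRecord("aa,\"bb\nbb\",cc")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("%q\n", rec) // ["aa" "bb\nbb" "cc"]
}
```

The newline inside the quoted field stays part of the value instead of ending the record.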


codecov bot commented Mar 9, 2022

Codecov Report

Merging #425 (4d8cb17) into main (f421469) will decrease coverage by 0.0%.
The diff coverage is 100.0%.

@@           Coverage Diff           @@
##            main    #425     +/-   ##
=======================================
- Coverage   75.2%   75.2%   -0.1%     
=======================================
  Files         83      83             
  Lines       4025    4033      +8     
=======================================
+ Hits        3030    3035      +5     
- Misses       693     696      +3     
  Partials     302     302             
Impacted Files Coverage Δ
operator/parser/csv/csv.go 93.1% <100.0%> (+0.8%) ⬆️
operator/transformer/recombine/recombine.go 75.6% <0.0%> (-2.1%) ⬇️

@djaglowski djaglowski marked this pull request as ready for review March 9, 2022 16:07
@djaglowski djaglowski requested review from a team and jsirianni March 9, 2022 16:07
@djaglowski
Member Author

cc: @atoulme

"\n\na\na\n\naa,\n\nbb\nbb\n\n,cc\ncc\n\n,\ndddd\n,eeee\n\n",
map[string]interface{}{
"A": "a\na\naa",
"B": "\nbb\nbb\n",

shouldn't this value end with two \n?

Member Author

Actually, not in this case, but I added a case where the empty line is preserved. This is covered in RFC 4180: "Fields containing line breaks (CRLF), double quotes, and commas should be enclosed in double-quotes."
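
To illustrate the quoting rule, here is a rough sketch using Go's standard `encoding/csv` reader. It assumes the stdlib reader's behavior is representative, which may differ from this parser's own handling; `readAll` is a hypothetical helper, not code from this PR. Outside quotes a blank line is ignored, while inside a quoted field it is kept.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// readAll is a hypothetical helper (not from this PR) that returns every
// record the standard library reader finds in input.
func readAll(input string) ([][]string, error) {
	r := csv.NewReader(strings.NewReader(input))
	r.FieldsPerRecord = -1 // allow records with differing field counts
	return r.ReadAll()
}

func main() {
	// Unquoted: the blank line between the two records is ignored.
	unquoted, err := readAll("a,b\n\nc,d\n")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("%q\n", unquoted) // [["a" "b"] ["c" "d"]]

	// Quoted: the blank line is part of the field and is preserved.
	quoted, err := readAll("a,\"b\n\nb\",c\n")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("%q\n", quoted) // [["a" "b\n\nb" "c"]]
}
```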


makes sense

"B": "\nbb\nbb\n",
"C": "cc\ncc\n",
"D": "\ndddd\n",
"E": "eeee",

same, should have two \n at the end of the value

}{
{
"first_field",
"aa\naa,bbbb,cccc,dddd,eeee",

Can we test with no newlines as well, and several lines? I would like to make sure existing parsing still works

Member Author

I added a basic case to this set.

@atoulme atoulme Mar 9, 2022

Can we test with escaped double quotes? Say:

```
"foo","bar","""foobar
is the new foo-bar""","another entry"
```
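
As a sketch of what such a case exercises, the snippet below feeds that sample to Go's standard `encoding/csv` reader. This is not the test added in the PR; `parseSample` and `sample` are hypothetical names used only for illustration. Each doubled quote inside a quoted field decodes to one literal quote, and the embedded newline is kept.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// sample mirrors the record suggested in the comment above.
const sample = `"foo","bar","""foobar
is the new foo-bar""","another entry"`

// parseSample is a hypothetical helper (not from this PR) that reads one
// record with the standard library reader.
func parseSample(input string) ([]string, error) {
	r := csv.NewReader(strings.NewReader(input))
	return r.Read()
}

func main() {
	rec, err := parseSample(sample)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	// The third field keeps its quotes and its newline.
	fmt.Printf("%q\n", rec[2]) // "\"foobar\nis the new foo-bar\""
	fmt.Println(len(rec))      // 4
}
```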

Member Author

I've added some more test cases mixing quotes and returns.

@djaglowski djaglowski merged commit a523052 into open-telemetry:main Mar 10, 2022
@djaglowski djaglowski deleted the csv-multiline branch March 10, 2022 14:28
jsirianni pushed a commit to jsirianni/opentelemetry-log-collection that referenced this pull request Mar 28, 2022
* Add support for parsing multiline csv records
Successfully merging this pull request may close these issues.

CSV parser does not handle multiline entries