This repository has been archived by the owner on May 25, 2022. It is now read-only.
Below is a workaround I have applied for the time being. Ideally, I'd like the csv_parser to handle this without the need to remove the header line, and to handle multiline elements implicitly.
filelog:
  include: [ /output/*.csv ]
  start_at: beginning
  multiline:
    line_start_pattern: "^\"[^\"]"
  operators:
    # remove the header line
    - id: remove_header
      type: filter
      expr: '$$body matches "^AuthorID,Author,Date,Content,Attachments,Reactions$"'
      output: csv
    # parse each line as a record
    - id: csv
      type: csv_parser
      header: AuthorID,Author,Date,Content,Attachments,Reactions
      timestamp:
        parse_from: Date
        layout_type: epoch
        layout: s
        preserve: true
Ideally, I would like the csv_parser to accept a multiline marker for the beginning of an entry.
The pattern I use right now, "^\"[^\"]", treats a line as the start of a new entry when it begins with a double quote that is not immediately followed by another double quote. CSV escapes double quotes by doubling them, so this entry is parsed correctly:
Header1,Header2
"foo","
""bar"""
"bob","alice"
This is not perfect. If the first field of an entry itself begins with a double quote, the line starts with three double quotes and is not picked up as the start of a new entry:
Header1,Header2
"""foo""","bar"
However, in my case the first item on each line is always a number, so I avoid this condition.
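The effect of the line start pattern can be reproduced outside the collector. The following is a minimal Python sketch, not the receiver's actual implementation: `group_entries` is a hypothetical helper that starts a new entry whenever a line matches the pattern and appends every other line to the current entry. It assumes the header line has already been removed, and that a non-matching first line simply opens the first entry.

```python
import re

# Same regular expression as line_start_pattern in the config above
# (written here without the YAML escaping).
LINE_START = re.compile(r'^"[^"]')


def group_entries(lines):
    """Group physical lines into logical entries (hypothetical helper)."""
    entries = []
    for line in lines:
        if LINE_START.match(line) or not entries:
            # The line looks like the start of a new entry.
            entries.append(line)
        else:
            # Continuation line: glue it onto the current entry.
            entries[-1] += "\n" + line
    return entries


# First example: the continuation line starts with a doubled quote,
# so it is attached to the previous entry as intended.
good = ['"foo","', '""bar"""', '"bob","alice"']
print(group_entries(good))

# Second example: the second entry starts with an escaped quote, so it
# is mistaken for a continuation of the first entry.
bad = ['"1","a"', '"""foo""","bar"']
print(group_entries(bad))
```

In the first case two entries come out, in the second the two records are merged into one, which is the limitation described above.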
The CSV parser is unable to handle CSV entries that span multiple lines, which happens when a field of the CSV contains newline characters.
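To illustrate the result I would expect, here is a minimal sketch using Python's standard csv module. It is only a comparison point and has nothing to do with how the collector is implemented. A reader that follows the usual CSV quoting rules keeps a newline inside a quoted field as part of the value:

```python
import csv
import io

# Header plus two records; the first record has a field containing
# a newline and escaped (doubled) double quotes.
data = 'Header1,Header2\n"foo","\n""bar"""\n"bob","alice"\n'

# newline="" stops the stream from translating line endings, as the
# csv module documentation recommends.
rows = list(csv.reader(io.StringIO(data, newline="")))
for row in rows:
    print(row)
```

The multiline record comes back as a single row with two fields, without any preprocessing of the header or a line start pattern.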