CSV writer #1237

Closed
philrz opened this issue Sep 10, 2020 · 4 comments · Fixed by #1267 or #1300

@philrz
Contributor

philrz commented Sep 10, 2020

A community user asked:

With some non-Zeek NDJSON data sources we turn into ZNG and store, we also may want to write to a SQL db (JSON works but prefer CSV) to populate visualizations of the data in say like an Apache Superset (which I actually have plans to integrate for Zeek data, probably back out in Zeek JSON from ZNG).

@mccanne mccanne self-assigned this Sep 13, 2020
@alfred-landrum
Contributor

I think the CSV writer should handle heterogeneous records, though I assume that'd require writing to a temporary file to build the union of all the record types. That may be more than what the community user asked for, if their target was a SQL db import.
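
For illustration only, here is a minimal Python sketch of the temporary-file idea described above. It is not how zq implements (or will implement) this; the function name `write_union_csv` is hypothetical, records are assumed to be flat dicts, and missing fields are assumed to be written as empty strings. The first pass spools records to a temporary file while collecting the union of field names; the second pass replays the spool under one header.

```python
import csv
import io
import json
import tempfile


def write_union_csv(records, out):
    """Write dict records with differing keys as one CSV table.

    Pass 1 spools each record to a temporary file while collecting the
    union of field names. Pass 2 replays the spool under a single header.
    Returns the list of column names that was written as the header.
    """
    columns = []  # union of field names, in first-seen order
    seen = set()
    with tempfile.TemporaryFile(mode="w+", encoding="utf-8") as spool:
        for rec in records:
            for name in rec:
                if name not in seen:
                    seen.add(name)
                    columns.append(name)
            # One JSON document per line so the record can be replayed later.
            spool.write(json.dumps(rec) + "\n")

        spool.seek(0)
        # restval fills in fields that a given record does not have.
        writer = csv.DictWriter(
            out, fieldnames=columns, restval="", lineterminator="\n"
        )
        writer.writeheader()
        for line in spool:
            writer.writerow(json.loads(line))
    return columns
```

Calling `write_union_csv` with one `stats`-like record and one `weird`-like record and an `io.StringIO()` target would produce a single header covering both sets of fields, with blanks where a record lacks a field. The cost is that no output can be emitted until all input has been read, which is the trade-off hinted at above.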

@mccanne
Collaborator

mccanne commented Sep 14, 2020

@alfred-landrum, agreed. I will write up a new issue for this. There are a couple ways to do it.

@mccanne
Collaborator

mccanne commented Sep 14, 2020

We also need to change the search endpoint to support CSV output (as part of the larger refactoring there) so the front-end doesn't need its own CSV writer and record unflattener in JavaScript.

brim-bot pushed a commit to brimdata/zui that referenced this issue Sep 14, 2020
This is an auto-generated commit with a zq dependency update. The zq PR
brimdata/zed#1267, authored by @mccanne,
has been merged.

add csv writer

Fixes brimdata/zed#1237
@philrz philrz linked a pull request Sep 17, 2020 that will close this issue
@philrz
Contributor Author

philrz commented Sep 17, 2020

Verified in zq commit 4bce00d.

Output as CSV currently works for any data that can be represented by a single "descriptor" (i.e. they all have the same record type). For instance, the contents of a single Zeek log:

$ zq -f csv stats.log.gz 
_path,ts,peer,mem,pkts_proc,bytes_recv,pkts_dropped,pkts_link,pkt_lag,events_proc,events_queued,active_tcp_conns,active_udp_conns,active_icmp_conns,tcp_conns,udp_conns,icmp_conns,timers,active_timers,files,active_files,dns_requests,active_dns_requests,reassem_tcp_size,reassem_file_size,reassem_frag_size,reassem_unknown_size
stats,2018-03-24T17:15:20.600725Z,zeek,74,26,29375,-,-,-,404,11,1,0,0,1,0,0,36,32,0,0,0,0,1528,0,0,0
stats,2018-03-24T17:20:20.60102Z,zeek,281,6435390,3052165142,-,-,-,2575313,2575317,5818,187,1070,338620,5126,9490,1301330,33086,48528,27,61,0,77448,53168,0,0
stats,2018-03-24T17:25:20.601054Z,zeek,282,4556854,1920350471,-,-,-,1958516,1958507,4401,110,1010,211123,4600,2354,1006374,27351,41930,10,5,0,90752,0,1672,0
stats,2018-03-24T17:30:20.601101Z,zeek,281,8112284,3883978461,-,-,-,1586683,1586685,2795,137,648,207754,4666,31554,928834,19429,29547,26,3,0,126712,395584,1504,0
stats,2018-03-24T17:35:20.601137Z,zeek,282,5467567,3398705931,-,-,-,1535999,1535998,4239,146,305,193639,4731,2510,879701,25895,35230,88,6,0,455128,0,0,0

It's then trivial to put this output into my paste buffer and, say, enter it into a Google Sheet and select Data > Split Text To Columns.
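
For the SQL import use case that prompted this issue, the same output can also be loaded programmatically. The following is a rough Python sketch, not part of zq: the helper `load_csv_into_sqlite` and the table name are hypothetical, SQLite stands in for whatever database is actually targeted, and treating `-` as NULL is an assumption based on how unset fields appear in the sample output above.

```python
import csv
import io
import sqlite3


def load_csv_into_sqlite(conn, table, csv_text, unset="-"):
    """Create `table` from the CSV header and insert every row.

    Columns are left untyped, so values are stored as text. Values equal
    to `unset` are stored as NULL. Table and column names are interpolated
    directly, so only use this with trusted input.
    Returns the number of columns found in the header.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    quoted = [f'"{name}"' for name in header]  # quote names such as _path
    conn.execute(f'CREATE TABLE "{table}" ({", ".join(quoted)})')
    placeholders = ", ".join("?" for _ in header)
    rows = ([None if v == unset else v for v in row] for row in reader)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    return len(header)
```

With `conn = sqlite3.connect(":memory:")` and the text printed by `zq -f csv stats.log.gz` as `csv_text`, a query such as `SELECT ts, mem FROM stats` would then return the rows as text values.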

If the data being output requires multiple descriptors, the user receives an error message and the output stops when the first record of a different type is encountered. For example, if you try to output two Zeek logs at once as CSV:

$ zq -f csv stats.log.gz weird.log.gz 
_path,ts,peer,mem,pkts_proc,bytes_recv,pkts_dropped,pkts_link,pkt_lag,events_proc,events_queued,active_tcp_conns,active_udp_conns,active_icmp_conns,tcp_conns,udp_conns,icmp_conns,timers,active_timers,files,active_files,dns_requests,active_dns_requests,reassem_tcp_size,reassem_file_size,reassem_frag_size,reassem_unknown_size
stats,2018-03-24T17:15:20.600725Z,zeek,74,26,29375,-,-,-,404,11,1,0,0,1,0,0,36,32,0,0,0,0,1528,0,0,0
csv output requires uniform records but different types encountered

However, we do have a plan to address this case as well, and that's tracked in #1271.
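
To make the current behavior concrete, here is a small Python model of a writer that takes its header from the first record and stops at the first record that does not match. This is only a sketch: the class name `UniformCSVWriter` is hypothetical, records are assumed to be flat dicts, and it compares field names only, whereas a zq descriptor also covers field types. The error text is copied from the output shown above.

```python
import csv
import io


class UniformCSVWriter:
    """Write dict records as CSV, refusing records whose fields differ."""

    def __init__(self, out):
        self._writer = csv.writer(out, lineterminator="\n")
        self._columns = None  # set by the first record

    def write(self, record):
        columns = tuple(record)
        if self._columns is None:
            # The first record fixes the header for the whole output.
            self._columns = columns
            self._writer.writerow(columns)
        elif columns != self._columns:
            # Rows already written stay in the output; nothing more is added.
            raise ValueError(
                "csv output requires uniform records but different types encountered"
            )
        self._writer.writerow([record[name] for name in columns])
```

Writing two `stats`-shaped records followed by a `weird`-shaped one to an `io.StringIO()` target leaves the header and the two `stats` rows in the buffer and raises `ValueError` on the third call, which mirrors the partial output plus error message seen in the example.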

Finally, I noticed that csv is not yet listed in the -f output formats in the zq help text, so #1300 tracks getting that added.

Thanks @mccanne!

brim-bot pushed a commit to brimdata/zui that referenced this issue Sep 17, 2020
…by philrz

This is an auto-generated commit with a zq dependency update. The zq PR
brimdata/zed#1300, authored by @philrz,
has been merged.

Output format changes: Add "csv", remove "types"

While verifying brimdata/zed#1237, I noticed that CSV is not yet listed among the output formats. I wondered if maybe we were intentionally holding off on revealing it until we address brimdata/zed#1271, but it seems useful enough in its present form that I'm proposing here that we reveal it now.

I'd also recalled seeing @mccanne mention recently that `types` was removed as an output format. Indeed, as of `zq` commit `4bce00d`:

```
$ zq -version
Version: v0.21.0-27-g4bce00d
```

Therefore I'm also taking that out while I'm at it.