Polish the zq command doc #5170

Merged
merged 5 commits into from
Jul 8, 2024

@philrz philrz commented Jul 4, 2024

In a recent community Slack thread, a new user began learning zq by reading the docs and asked:

is there a list of output formats? similar to this table of input formats?
https://zed.brimdata.io/docs/commands/zq/#input-formats

Indeed, I saw that no such table existed. I then read through the whole doc through the eyes of a new user and found several other things that could also use some polish, including:

  • A few more hyperlinks
  • Improved formatting
  • Coverage of some recently-added features

etc.

I'll include in-line comments in spots where I feel it would help to explain my motivation.

@philrz philrz requested a review from a team July 4, 2024 22:17
@philrz philrz self-assigned this Jul 4, 2024
Comment on lines 95 to 107
 | Option | Auto | Specification |
 |-----------|------|------------------------------------------|
 | `arrows` | yes | [Arrow IPC Stream Format](https://arrow.apache.org/docs/format/Columnar.html#ipc-streaming-format) |
-| `json` | yes | [JSON RFC 8259](https://www.rfc-editor.org/rfc/rfc8259.html) |
 | `csv` | yes | [CSV RFC 4180](https://www.rfc-editor.org/rfc/rfc4180.html) |
+| `json` | yes | [JSON RFC 8259](https://www.rfc-editor.org/rfc/rfc8259.html) |
 | `line` | no | One string value per input line |
 | `parquet` | yes | [Apache Parquet](https://github.com/apache/parquet-format) |
 | `tsv` | yes | [TSV - Tab-Separated Values](https://en.wikipedia.org/wiki/Tab-separated_values) |
 | `vng` | yes | [VNG - Binary Columnar Format](../formats/vng.md) |
-| `zson` | yes | [ZSON - Human-readable Format](../formats/zson.md) |
-| `zng` | yes | [ZNG - Binary Row Format](../formats/zson.md) |
-| `zjson` | yes | [ZJSON - Zed over JSON](../formats/zjson.md) |
 | `zeek` | yes | [Zeek Logs](https://docs.zeek.org/en/master/logs/index.html) |
+| `zjson` | yes | [ZJSON - Zed over JSON](../formats/zjson.md) |
+| `zng` | yes | [ZNG - Binary Row Format](../formats/zson.md) |
+| `zson` | yes | [ZSON - Human-readable Format](../formats/zson.md) |

I went ahead and alphabetized the order in this table (and in the new table in the "output" section too). Looking at the history of this table, I know that at one time it was intentionally ordered by some sense of which formats would be most relevant to users. However, we now support many more formats and diverse use cases, and over time some of the "adds" here have often been alphabetical (e.g., arrows ending up first), so it had already become kind of "hybrid". We place enough emphasis in the doc on the biggies (e.g., ZNG/ZSON and how they relate to JSON) that it seems safe to just go with the alpha ordering we use for most things.

-The input format is typically detected automatically and the formats for which
-`Auto` is `yes` in the table above support _auto-detection_.
+The input format is typically [detected automatically](#auto-detection) and the formats for which
+"Auto" is "yes" in the table above support _auto-detection_.

This is just a style thing, but I tend to reserve the fixed width font for things a user types, formal language constructs, names of shell tools, etc.

Comment on lines 179 to 192
| Option | Specification |
|-----------|------------------------------------------|
| `arrows` | [Arrow IPC Stream Format](https://arrow.apache.org/docs/format/Columnar.html#ipc-streaming-format) |
| `csv` | [CSV RFC 4180](https://www.rfc-editor.org/rfc/rfc4180.html) |
| `json` | [JSON RFC 8259](https://www.rfc-editor.org/rfc/rfc8259.html) |
| `parquet` | [Apache Parquet](https://github.com/apache/parquet-format) |
| `table` | (described [below](#simplified-text-outputs)) |
| `text` | (described [below](#simplified-text-outputs)) |
| `tsv` | [TSV - Tab-Separated Values](https://en.wikipedia.org/wiki/Tab-separated_values) |
| `vng` | [VNG - Binary Columnar Format](../formats/vng.md) |
| `zeek` | [Zeek Logs](https://docs.zeek.org/en/master/logs/index.html) |
| `zjson` | [ZJSON - Zed over JSON](../formats/zjson.md) |
| `zng` | [ZNG - Binary Row Format](../formats/zson.md) |
| `zson` | [ZSON - Human-readable Format](../formats/zson.md) |

I imagine some folks may not love the repetition between this and the "inputs" table. There are a few reasons I bit the bullet and added it.

  1. The community user that kicked off this effort expressed a sense that they expected to see the table. The text that was there before said "The supported output formats include all of the input formats along with text and table formats", so I'd have hoped that was enough coverage. But I know that users can often be in a rush when consulting docs.

  2. This is sort of a variation on the previous point, but given the section header hyperlinks for "Input Formats" and "Output Formats" in the right-hand side of the screen when viewing the rendered docs site, it is indeed a handy convenience to see the quick splash table with one click rather than having to do a "delta" exercise after clicking to another section.

  3. Strictly speaking, we can't claim that the supported output formats include all of the input formats, since `line` is not an output format (incidentally, that's been requested by at least one user in issue #5042, "Export data in line").
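
To make the "delta" exercise from points 2 and 3 concrete, here is a minimal sketch in Python. It is an illustration only and not part of the PR or of zq. The format names are copied from the two tables quoted above, and `format_delta` is a hypothetical helper name.

```python
# Format names copied from the "input" and "output" tables quoted in this PR.
INPUT_FORMATS = {
    "arrows", "csv", "json", "line", "parquet", "tsv",
    "vng", "zeek", "zjson", "zng", "zson",
}
OUTPUT_FORMATS = {
    "arrows", "csv", "json", "parquet", "table", "text", "tsv",
    "vng", "zeek", "zjson", "zng", "zson",
}


def format_delta(inputs, outputs):
    """Return (input-only, output-only) format names, each sorted.

    Hypothetical helper for illustration; zq itself has no such function.
    """
    return sorted(inputs - outputs), sorted(outputs - inputs)


input_only, output_only = format_delta(INPUT_FORMATS, OUTPUT_FORMATS)
print("input only:", input_only)    # ['line']
print("output only:", output_only)  # ['table', 'text']
```

With both tables in the doc, a reader gets this answer at a glance instead of having to compute it.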

@@ -345,6 +363,72 @@ produces the original data
While the `-split` option is most useful for schema-rigid formats, it can
be used with any output format.

### Simplified Text Outputs

This whole section is brand new and I kind of cooked it up out of thin air, so I'm definitely open to collaborative edits.

I've always been a little bothered that text and table are available, users fiddle with them at times (e.g., the community user that spawned this effort found issue #5154) but they have plenty of limitations/gotchas and aren't really documented anywhere that I can see. As I tried to touch on what I think of as the important points, I can tell that there's probably no way to be exhaustive short of telling users to go look at the code. Therefore I tried to cover what I think of as the biggies, wrap it in a note of mild caution, and assume interested users will hack and be tolerant.

I respect it's just long enough that it might be seen as interrupting the flow, so if there's an interest in moving it to another spot (maybe even in "Formats", since strictly speaking they're our own creation) I'm open to that.

philrz and others added 2 commits July 8, 2024 11:15
Co-authored-by: Noah Treuhaft <noah.treuhaft@gmail.com>
Co-authored-by: Noah Treuhaft <noah.treuhaft@gmail.com>
@philrz philrz merged commit 31893e1 into main Jul 8, 2024
4 checks passed
@philrz philrz deleted the zq-command-doc-polish branch July 8, 2024 18:29
chrismo commented Jul 8, 2024

Cool! Thx for this. :)

@philrz philrz linked an issue Aug 19, 2024 that may be closed by this pull request
Successfully merging this pull request may close these issues.

Revisit text mode output
3 participants