Allow CSV output when columns are consistent even if types aren't #4781

Closed
philrz opened this issue Sep 25, 2023 · 1 comment · Fixed by #4889

philrz commented Sep 25, 2023

Repro is with Zed commit 80dbcda.

Consider this community user's CSV output example, which motivated the changes in #4773:

$ zq -version
Version: v1.9.0-24-g80dbcda5

$ echo '{a:1,b:null}{a:1,b:2}' | zq -f csv -
a,b
1,
CSV output requires uniform records but multiple types encountered (consider 'fuse')

Or here's another for a non-null case:

$ echo '{"a":1} {"a":"hi"}' | zq -f csv -
a
1
CSV output requires uniform records but multiple types encountered (consider 'fuse')

As the error messages indicate, adding fuse works around the problem in both cases: it takes a first pass through the data to construct a single merged record type and coerces all the input values to it. However, since CSV effectively lacks real data typing, what truly matters in the record type is that the field names are consistent, because once the header row has been output there's no way to deal with additional fields encountered later. In the first example, a null value is output the same way in CSV (i.e., "nothing" between the comma delimiters) regardless of which type it had in Zed. Similarly, if we run fuse on the second example, a union type is indeed established to indicate that the field could hold an integer or a string:

$ echo '{"a":1} {"a":"hi"}' | zq -z 'fuse | count() by typeof(this)' -
{typeof:<{a:(int64,string)}>,count:2(uint64)}

but when those values are output on the second pass, they're still printed in the same CSV column, as either a number or a string:

$ echo '{"a":1} {"a":"hi"}' | zq -f csv 'fuse' -
a
1
hi

i.e., the brief existence of the union type only satisfied a current constraint of the CSV writer and didn't improve the output in any way.

In a discussion, @nwt agreed that we could probably relax this constraint of the CSV writer. This would reduce how often new users encounter the (consider 'fuse') message and have to take a detour to learn why it's needed and how to use it.

mattnibs added a commit that referenced this issue Nov 17, 2023
Adjust the csvio.Writer so that it can handle records with the same
field names but different types.

Closes #4781
mattnibs added a commit that referenced this issue Nov 22, 2023
Adjust the csvio.Writer so that it can handle records with the same
field names but different types.

Closes #4781

philrz commented Nov 22, 2023

Verified in Zed commit 79aa231.

Repeating the two examples above, we can now output CSV without needing fuse.

$ zq -version
Version: v1.11.1-7-g79aa231a

$ echo '{a:1,b:null}{a:1,b:2}' | zq -f csv -
a,b
1,
1,2

$ echo '{"a":1} {"a":"hi"}' | zq -f csv -
a
1
hi

Thanks @mattnibs!
