"CREATE TABLE" does not append newline to header row in new CSV file. #22

derekmahar · 2019-12-14T01:33:04Z

In the CSV file that it creates, csvq SQL statement CREATE TABLE does not append a newline character to the header that it generates in the first row:

$ csvq --repository /tmp "CREATE TABLE test(a, b, c)"
file "/tmp/test" is created.
Commit: file "/tmp/test" is created.
$ cat /tmp/test
a,b,c$ rm /tmp/test

Note that the $ prompt immediately follows the last column name in the header instead of appearing on the next line because CREATE TABLE does not append a newline character to the header row.

As a consequence of this issue, should a program other than csvq append a data row to the new table without first appending a newline to the header (or prepending a newline to the data row), the data row will immediately follow the header on the same row:

$ csvq --repository /tmp "CREATE TABLE test(a, b, c)"
file "/tmp/test" is created.
Commit: file "/tmp/test" is created.
$ printf "1,2,3\n" >> /tmp/test
$ cat /tmp/test
a,b,c1,2,3
$ rm /tmp/test

A simple workaround to this issue is to append a newline to the new CSV file after CREATE TABLE creates it, but before appending any data rows:

$ csvq --repository /tmp "CREATE TABLE test(a, b, c)"
file "/tmp/test" is created.
Commit: file "/tmp/test" is created.
$ echo >> /tmp/test
$ printf "1,2,3\n" >> /tmp/test
$ cat /tmp/test
a,b,c
1,2,3
$ rm /tmp/test

Another solution would be to prepend each new data row with a newline, but omit the trailing newline:

$ csvq --repository /tmp "CREATE TABLE test(a, b, c)"
file "/tmp/test" is created.
Commit: file "/tmp/test" is created.
$ printf "\n1,2,3" >> /tmp/test
$ printf "\n4,5,6" >> /tmp/test
$ cat /tmp/test
a,b,c
1,2,3
4,5,6$ rm /tmp/test

Consistent with CREATE TABLE, SQL statement INSERT INTO also follows this convention of prepending each data row with newline:

$ csvq --repository /tmp "CREATE TABLE test(a, b, c)"
file "/tmp/test" is created.
Commit: file "/tmp/test" is created.
$ csvq --repository /tmp "INSERT INTO test VALUES (1,2,3)"
1 record inserted on "/tmp/test".
Commit: file "/tmp/test" is updated.
$ csvq --repository /tmp "INSERT INTO test VALUES (4,5,6)"
1 record inserted on "/tmp/test".
Commit: file "/tmp/test" is updated.
$ cat /tmp/test
a,b,c
1,2,3
4,5,6$ rm /tmp/test

To avoid this issue altogether, instead of using CREATE TABLE to create the CSV file and write the header row, we could "manually" create the file and write the header row with a trailing newline:

$ printf "a,b,c\n" > /tmp/test
$ printf "1,2,3\n" >> /tmp/test
$ cat /tmp/test
a,b,c
1,2,3
$ rm /tmp/test

There are several disadvantages to this manual approach:

Redirection overwrites an existing CSV file
Redirection cannot be applied to another SQL database like SQLite
Does not benefit from transaction management
Requires a code change to change the raw CSV header structure

The newline convention that csvq SQL statements CREATE TABLE and INSERT INTO follows, while internally consistent, is not consistent with external shell commands.

The text was updated successfully, but these errors were encountered:

mithrandie · 2019-12-14T06:50:26Z

First, the current behavior of csvq is as follows.

SELECT query appends a line break at the end of the output except for some exceptions.
Update queries, including CREATE TABLE query, don’t append line breaks.

In the csvq, SELECT query is positioned as a statement that outputs the result and passes it to the outside. On the other hand, other queries do not do such a special processing because these queries complete operations within a transaction of csvq and do not link directly with the outside.

If we strictly follow the posix standard, update queries should also append line breaks at the end of the files. However, line breaks are now often used as line separators rather than line endings, and I can't decide which behavior is better. At least, I think that adding a line break at the end of a file if needed is easier than removing a line break at the end of a file when not need.

By the way, I don’t understand exactly what you want to do. Transaction management of csvq affects only operations with in the csvq processing, and the external commands should not be relevant.

derekmahar · 2019-12-14T07:20:01Z

If we strictly follow the posix standard, update queries should also append line breaks at the end of the files.

Does the POSIX standard specify this as a convention? RFC 4180 states that a line break (CRLF) is optional on the last line of a CSV file, but doesn't specify a line break is optional for the header row. csvq follows the former convention, but it's unclear whether csvq violates the latter.

However, line breaks are now often used as line separators rather than line endings, and I can't decide which behavior is better.

What is the difference between line separator and line ending in this context? What newline convention do other CVS processing tools follow? Is csvq consistent with these tools?

At least, I think that adding a line break at the end of a file if needed is easier than removing a line break at the end of a file when not need.

Yes, I agree.

By the way, I don’t understand exactly what you want to do. Transaction management of csvq affects only operations with in the csvq processing, and the external commands should not be relevant.

I guess my point about transaction management is not relevant here because my examples mix csvq and external shell operations which are not subject to csvq transactions. However, I think the other disadvantages that I list are relevant.

derekmahar · 2019-12-14T08:06:03Z

What newline convention do other CVS processing tools follow? Is csvq consistent with these tools?

Miller doesn't complain about empty CSV files that csvq creates:

$ csvq --repository /tmp "CREATE TABLE test(a, b, c)"
file "/tmp/test" is created.
Commit: file "/tmp/test" is created.
$ mlr --csv cat /tmp/test
$ rm /tmp/test

Miller can also process non-empty CSV files that csvq creates, though unlike csvq, Miller appends a trailing line break to the last data row:

$ csvq --repository /tmp "CREATE TABLE test(a, b, c)"
file "/tmp/test" is created.
Commit: file "/tmp/test" is created.
$ csvq --repository /tmp "INSERT INTO test VALUES (1,2,3)"
1 record inserted on "/tmp/test".
Commit: file "/tmp/test" is updated.
$ mlr --csv cat /tmp/test
a,b,c
1,2,3
$ rm /tmp/test

Unlike csvq SQL command CREATE TABLE, it seems that Miller has no mechanism for creating a new CSV file.

mithrandie · 2019-12-14T08:09:07Z

The POSIX standard defines about a line as follows.

https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_206

3.206 Line

A sequence of zero or more non- characters plus a terminating character.

The RFC 4180 defines about csv format as follows.

https://tools.ietf.org/html/rfc4180

The last record in the file may or may not have an ending line break.

It’s header Line is as normal lines.

There maybe an optional header line appearing as the first line of the file with the same format as normal record lines.

Csv is used by various tools including GUI as well as shell commands. It is not uncommon for users to write related processes themselves.
It can’t be fitted all tools, and I think the important thing is internal consistency.

derekmahar · 2019-12-14T08:28:53Z

The POSIX standard defines about a line as follows.

https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_206

3.206 Line
A sequence of zero or more non- characters plus a terminating character.

Okay, then it seems as you pointed out earlier, csvq does not follow this convention.

It’s header Line is as normal lines.

There maybe an optional header line appearing as the first line of the file with the same format as normal record lines.

I must have missed this point in RFC 4180. csvq also treats the header line as a "normal" line so it apparently does follow the line break conventions of RFC 4180.

Csv is used by various tools including GUI as well as shell commands. It is not uncommon for users to write related processes themselves.

Users like me, for example. :)

It can’t be fitted all tools, and I think the important thing is internal consistency.

Yes, I agree, but I think a CSV tool should at least try to follow the conventions that most tools follow. While csvq (and RFC 4180) violate the POSIX line definition, csvq fortunately does follow the line break conventions of RFC 4180 which is arguably more important.

derekmahar · 2019-12-14T08:42:23Z

csvkit also doesn't complain about CSV files that csvq creates:

$ csvq --repository /tmp "CREATE TABLE test(a, b, c)"
$ cat /tmp/test | csvclean
No errors.
$ cat /tmp/test | csvlook
| a | b | c |
| - | - | - |
$ csvq --repository /tmp "INSERT INTO test VALUES (1,2,3)"
1 record inserted on "/tmp/test".
Commit: file "/tmp/test" is updated.
$ cat /tmp/test | csvclean
No errors.
$ cat /tmp/test | csvlook
|    a | b | c |
| ---- | - | - |
| True | 2 | 3 |
$ rm /tmp/test

By default, csvkit command csvformat appends a line break to header and data rows in a CSV file:

$ csvq --repository /tmp "CREATE TABLE test(a, b, c)"
file "/tmp/test" is created.
Commit: file "/tmp/test" is created.
$ cat /tmp/test | csvformat
a,b,c
$ csvq --repository /tmp "INSERT INTO test VALUES (1,2,3)"
1 record inserted on "/tmp/test".
Commit: file "/tmp/test" is updated.
$ cat /tmp/test | csvformat
a,b,c
1,2,3
$ rm /tmp/test

Like Miller, csvkit doesn't have a command to create a CSV file with a given a header.

derekmahar · 2019-12-14T09:06:47Z

csvtool also doesn't complain about CSV files that csvq generates, though like Miller and csvkit, it appends a line break to header and data rows:

$ csvq --repository /tmp "CREATE TABLE test(a, b, c)"
file "/tmp/test" is created.
Commit: file "/tmp/test" is created.
$ csvtool cat /tmp/test
a,b,c
$ csvq --repository /tmp "INSERT INTO test VALUES (1,2,3)"
1 record inserted on "/tmp/test".
Commit: file "/tmp/test" is updated.
$ csvtool cat /tmp/test
$ rm /tmp/test

derekmahar · 2019-12-14T09:15:48Z

While csvq follows a different line break convention than other popular CSV processing tools, these tools can process files that csvq creates.
While csvq doesn't follow the line definition in the POSIX standard, it does follow the CSV structure that RFC 4180 specifies.
I will adapt my shell scripts to the line break convention that csvq follows.

mithrandie · 2019-12-16T00:18:01Z

I changed my mind. The following changes will be implemented in the next release.

Update queries append a line break at the end of a file by default.
Add a command option to specify whether to add a line break or not. It is not yet determined whether this option affects the output to standard output.

derekmahar · 2019-12-16T00:58:44Z

Thank you. This idea is a sensible compromise.

#22]) With this change, the command option "--strip-ending-line-break" has been added.

- Append a newline at the end of created and updated files. ([Github #22](#22)) - Add the command option "--strip-ending-line-break".

derekmahar · 2020-01-13T17:32:15Z

Thank you for this fix!

derekmahar closed this as completed Dec 14, 2019

mithrandie reopened this Dec 16, 2019

mithrandie added a commit that referenced this issue Jan 12, 2020

Append a newline at the end of created and updated files. ((Github #22)[

b74a54c

#22]) With this change, the command option "--strip-ending-line-break" has been added.

mithrandie added a commit that referenced this issue Jan 12, 2020

Release v1.12.0

af2a0a8

- Append a newline at the end of created and updated files. ([Github #22](#22)) - Add the command option "--strip-ending-line-break".

mithrandie closed this as completed Jan 12, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

"CREATE TABLE" does not append newline to header row in new CSV file. #22

"CREATE TABLE" does not append newline to header row in new CSV file. #22

derekmahar commented Dec 14, 2019 •

edited

Loading

mithrandie commented Dec 14, 2019 •

edited

Loading

derekmahar commented Dec 14, 2019 •

edited

Loading

derekmahar commented Dec 14, 2019

mithrandie commented Dec 14, 2019

derekmahar commented Dec 14, 2019

derekmahar commented Dec 14, 2019 •

edited

Loading

derekmahar commented Dec 14, 2019

derekmahar commented Dec 14, 2019 •

edited

Loading

mithrandie commented Dec 16, 2019

derekmahar commented Dec 16, 2019 •

edited

Loading

derekmahar commented Jan 13, 2020

"CREATE TABLE" does not append newline to header row in new CSV file. #22

"CREATE TABLE" does not append newline to header row in new CSV file. #22

Comments

derekmahar commented Dec 14, 2019 • edited Loading

mithrandie commented Dec 14, 2019 • edited Loading

derekmahar commented Dec 14, 2019 • edited Loading

derekmahar commented Dec 14, 2019

mithrandie commented Dec 14, 2019

derekmahar commented Dec 14, 2019

derekmahar commented Dec 14, 2019 • edited Loading

derekmahar commented Dec 14, 2019

derekmahar commented Dec 14, 2019 • edited Loading

mithrandie commented Dec 16, 2019

derekmahar commented Dec 16, 2019 • edited Loading

derekmahar commented Jan 13, 2020

derekmahar commented Dec 14, 2019 •

edited

Loading

mithrandie commented Dec 14, 2019 •

edited

Loading

derekmahar commented Dec 14, 2019 •

edited

Loading

derekmahar commented Dec 14, 2019 •

edited

Loading

derekmahar commented Dec 14, 2019 •

edited

Loading

derekmahar commented Dec 16, 2019 •

edited

Loading