jq-1.6 - raw output doesn't work when -a is specified #1788

Open
gburiola opened this issue Jan 11, 2019 · 11 comments · May be fixed by #1993
gburiola commented Jan 11, 2019

Describe the bug
On jq 1.6, the -a (--ascii-output) and -r (--raw-output) options conflict when used together.
This problem doesn't happen on jq 1.5.

To Reproduce

1.5

$ jq --version
jq-1.5

$ echo '{"key1":"value1"}' | jq -r -a .key1
value1

1.6

$ jq --version
jq-1.6

$ echo '{"key1":"value1"}' | jq -r -a .key1
"value1"

Notice how the output still contains quotes despite the -r flag.

Removing the -a flag produces the correct behaviour:

$ echo '{"key1":"value1"}' | jq -r .key1
value1

Environment (please complete the following information):

This problem is reproducible with both the jq-linux64 and jq-osx-amd64 builds.

The Linux build even reports a segmentation fault:

$ echo '{"key1":"value1"}' | ./jq.1-6 -r -a .key1
"value1"
Segmentation fault

$ echo '{"key1":"value1"}' | ./jq.1-6 -r .key1
value1
gburiola (Author) commented Jan 11, 2019

I think this problem was introduced in #1587 (bf88c73), which was trying to fix exactly the opposite problem ("Can't use both the --raw-output and --ascii-output switches").

FYI @rain-1 / @nicowilliams

gburiola changed the title from "jq-1.6 - Segmentation fault when combining options -a and -r" to "jq-1.6 - raw output doesn't work when -a is specified" Jan 11, 2019
gburiola (Author) commented

Using the same examples from #1587:

$ echo "{\"key1\":\"\\ud83d\\ude00\"}" | ./jq-1.5 --raw-output '.key1'
😀

$ echo "{\"key1\":\"\\ud83d\\ude00\"}" | ./jq-1.5 --raw-output -a '.key1'
😀

$ echo "{\"key1\":\"\\ud83d\\ude00\"}" | ./jq-1.6 --raw-output '.key1'
😀

$ echo "{\"key1\":\"\\ud83d\\ude00\"}" | ./jq-1.6 --raw-output -a '.key1'
"\ud83d\ude00"

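For readers unfamiliar with the escape in the last output: `\ud83d\ude00` is the UTF-16 surrogate pair for U+1F600 (😀), which is how JSON escapes code points above U+FFFF. A minimal Python sketch of that encoding follows; `json_unicode_escape` is a hypothetical helper written for illustration and is not part of jq.

```python
def json_unicode_escape(ch: str) -> str:
    """Return the JSON \\uXXXX escape(s) for a single character (illustrative helper)."""
    cp = ord(ch)
    if cp < 0x10000:
        # Code points in the Basic Multilingual Plane need one escape.
        return "\\u%04x" % cp
    # Higher code points are split into a high and a low surrogate.
    cp -= 0x10000
    high = 0xD800 + (cp >> 10)
    low = 0xDC00 + (cp & 0x3FF)
    return "\\u%04x\\u%04x" % (high, low)


print(json_unicode_escape("\U0001F600"))  # \ud83d\ude00
```

So the escaped form itself is what `-a` is expected to produce; the only unexpected part of the jq 1.6 output is the surrounding quotes.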
rain-1 (Contributor) commented Jan 11, 2019

Sorry if I broke something! I could not reproduce the segfault you mentioned.

I'm not 100% sure what the fix for this would be. Let's discuss this. Currently we have:

$ echo "{\"key1\":\"\\ud83d\\ude00\"}" | ./jq --raw-output -a '.key1'
"\ud83d\ude00"

I understand that we want

$ echo "{\"key1\":\"\\ud83d\\ude00\"}" | ./jq --raw-output -a '.key1'
\ud83d\ude00

But what if a string is inside something else? Currently we have:

$ echo "{\"key1\":\"\\ud83d\\ude00\"}" | ./jq --raw-output --ascii-output '.' 
{
  "key1": "\ud83d\ude00"
}

Do we want

$ echo "{\"key1\":\"\\ud83d\\ude00\"}" | ./jq --raw-output --ascii-output '.' 
{
  key1: \ud83d\ude00
}

or are we making a special case when the output is just a single string?

gburiola (Author) commented Jan 13, 2019

Hey @rain-1. No problem at all. Thanks for having a look at this.

In our particular use case we use jq to process a JSON file that contains some binary values.

For example, something similar to this:

$ random="$(dd if=/dev/urandom bs=10 count=1)"

$ echo {\"key1\":\"$random\"} | jq  .
{
  "key1": "r���P M ��"
}

Using only -a works as expected:

$ echo {\"key1\":\"$random\"} | jq -a .
{
  "key1": "r\ufffd\ufffd\ufffdP M \ufffd\ufffd"
}

The three examples below (no flags, only -r, only -a) also work as expected:

$ echo {\"key1\":\"$random\"} | jq  .key1
"r���P M ��"

$ echo {\"key1\":\"$random\"} | jq -r .key1
r���P M ��

$ echo {\"key1\":\"$random\"} | jq -a .key1
"r\ufffd\ufffd\ufffdP M \ufffd\ufffd"

The problem comes when you try to combine -a and -r:

$ echo {\"key1\":\"$random\"} | jq -a -r .key1
"r\ufffd\ufffd\ufffdP M \ufffd\ufffd"

I would have expected the same value as in the -a example above, but without the quotes.
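The `\ufffd` escapes above are presumably U+FFFD REPLACEMENT CHARACTER, substituted for bytes from /dev/urandom that are not valid UTF-8. A small Python sketch reproduces the same shape of output. The byte values are made up for illustration, and Python's decoder may not split invalid sequences exactly the way jq does.

```python
import json

# Hypothetical stand-in for ten random bytes; several are not valid UTF-8.
raw = b"r\xff\xfe\xfdP M \xfa\xfb"

# Invalid bytes are replaced by U+FFFD instead of raising an error.
text = raw.decode("utf-8", errors="replace")

# ensure_ascii=True (the default) escapes every non-ASCII character,
# which is roughly what jq's -a flag does for quoted output.
ascii_json = json.dumps(text)

print(ascii_json)  # "r\ufffd\ufffd\ufffdP M \ufffd\ufffd"
```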

I normally use OSX. The problem above happens on both OSX and Linux, but only Linux shows the segfault.
To reproduce the segmentation fault above, I downloaded the jq-linux64 binary from the releases page (https://github.com/stedolan/jq/releases/download/jq-1.6/jq-linux64) and ran the command below on an Ubuntu 16 machine:

$ echo '{"key1":"value1"}' | ./jq.1-6 -r -a .key1
"value1"
Segmentation fault

gburiola (Author) commented

Answering your second question from above: my view is that --raw-output only makes sense when your output is a leaf value.

So in the first example below -r does nothing but in the second example it displays value1 instead of "value1".

$ echo '{"key1":"value1"}' | jq -r .
{
  "key1": "value1"
}

$ echo '{"key1":"value1"}' | jq -r .key1
value1
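The rule proposed here can be modelled in a few lines of Python. This is only a sketch of the expected behaviour, assuming `-r` affects a result only when that result is itself a string; `render` is a hypothetical name and not jq's implementation.

```python
import json


def render(value, raw: bool = False) -> str:
    """Model of the proposed rule: raw mode only changes top-level strings."""
    if raw and isinstance(value, str):
        # A lone string is written as-is, without quotes.
        return value
    # Objects, arrays, numbers and so on are printed as normal JSON,
    # so strings nested inside them keep their quotes.
    return json.dumps(value, indent=2, ensure_ascii=False)


doc = {"key1": "value1"}
print(render(doc, raw=True))          # unchanged JSON object
print(render(doc["key1"], raw=True))  # value1
print(render(doc["key1"]))            # "value1"
```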

rain-1 (Contributor) commented Jan 14, 2019

OK. I agree.

rain-1 (Contributor) commented Jan 14, 2019

With this patch, a lone string is printed without quotes in raw mode. Complex expressions are still printed with quotes (non-raw) even in raw mode. The tests from the previous thread still pass too.

From 5b3cecbc589278a6a1ca91d866cba987f7364c97 Mon Sep 17 00:00:00 2001
From: rain <rain1@airmail.cc>
Date: Mon, 14 Jan 2019 12:41:16 +0000
Subject: [PATCH] Provide a jvp_dump_raw_string function to implement the
 combination of flags -r and -a (raw and ascii)

---
 src/jq.h       | 2 ++
 src/jv_print.c | 8 ++++++--
 src/main.c     | 2 +-
 3 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/src/jq.h b/src/jq.h
index 5269de3..273d529 100644
--- a/src/jq.h
+++ b/src/jq.h
@@ -68,4 +68,6 @@ jv jq_util_input_get_current_line(jq_state*);
 
 int jq_set_colors(const char *);
 
+void jvp_dump_string_raw(jv str, int ascii_only, FILE* F, jv* S, int T);
+
 #endif /* !JQ_H */
diff --git a/src/jv_print.c b/src/jv_print.c
index 5ebc01e..2d44f86 100644
--- a/src/jv_print.c
+++ b/src/jv_print.c
@@ -114,13 +114,18 @@ static void put_indent(int n, int flags, FILE* fout, jv* strout, int T) {
 }
 
 static void jvp_dump_string(jv str, int ascii_only, FILE* F, jv* S, int T) {
+  put_char('"', F, S, T);
+  jvp_dump_string_raw(str, ascii_only, F, S, T);
+  put_char('"', F, S, T);
+}
+
+void jvp_dump_string_raw(jv str, int ascii_only, FILE* F, jv* S, int T) {
   assert(jv_get_kind(str) == JV_KIND_STRING);
   const char* i = jv_string_value(str);
   const char* end = i + jv_string_length_bytes(jv_copy(str));
   const char* cstart;
   int c = 0;
   char buf[32];
-  put_char('"', F, S, T);
   while ((i = jvp_utf8_next((cstart = i), end, &c))) {
     assert(c != -1);
     int unicode_escape = 0;
@@ -177,7 +182,6 @@ static void jvp_dump_string(jv str, int ascii_only, FILE* F, jv* S, int T) {
     }
   }
   assert(c != -1);
-  put_char('"', F, S, T);
 }
 
 static void put_refcnt(struct dtoa_context* C, int refcnt, FILE *F, jv* S, int T){
diff --git a/src/main.c b/src/main.c
index ebfddf9..96263d5 100644
--- a/src/main.c
+++ b/src/main.c
@@ -179,7 +179,7 @@ static int process(jq_state *jq, jv value, int flags, int dumpopts) {
   while (jv_is_valid(result = jq_next(jq))) {
     if ((options & RAW_OUTPUT) && jv_get_kind(result) == JV_KIND_STRING) {
       if (options & ASCII_OUTPUT) {
-        jv_dumpf(jv_copy(result), stdout, JV_PRINT_ASCII);
+        jvp_dump_string_raw(jv_copy(result), options & ASCII_OUTPUT, stdout, NULL, flags & JV_PRINT_ISATTY);
       } else {
         fwrite(jv_string_value(result), 1, jv_string_length_bytes(jv_copy(result)), stdout);
       }
-- 
2.18.1

tested with:

#!/usr/bin/env bash
echo '{"key1":"value1"}' | ./jq -r -a .key1
echo should not have quotes
echo
echo should have quotes
echo '{"key1":"value1"}' | ./jq -r .
echo '{"key1":"value1"}' | ./jq -r -a .
echo
echo "{\"key1\":\"\\ud83d\\ude00\"}" | ./jq --raw-output '.key1'
echo "{\"key1\":\"\\ud83d\\ude00\"}" | ./jq --ascii-output '.key1'
echo "{\"key1\":\"\\ud83d\\ude00\"}" | ./jq --raw-output --ascii-output '.key1'
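As a reading aid, the structure of the patch can be modelled in Python: the quoting wrapper is split from the escaping body, and the raw-plus-ASCII branch calls the body directly. The function names below only loosely mirror the C ones, and `json.dumps` stands in for jq's escaping loop, so treat this as an assumption-laden sketch rather than a port.

```python
import json


def dump_string_raw(s: str, ascii_only: bool) -> str:
    """JSON-escaped body of a string, without the surrounding quotes."""
    return json.dumps(s, ensure_ascii=ascii_only)[1:-1]


def dump_string(s: str, ascii_only: bool) -> str:
    """Quoted form: the wrapper only adds the two quote characters."""
    return '"' + dump_string_raw(s, ascii_only) + '"'


def emit_1_6(result, raw: bool, ascii_only: bool) -> str:
    """Model of the dispatch before the patch."""
    if raw and isinstance(result, str):
        if ascii_only:
            return dump_string(result, True)  # quotes leak into raw output
        return result
    return json.dumps(result, ensure_ascii=ascii_only, indent=2)


def emit_patched(result, raw: bool, ascii_only: bool) -> str:
    """Model of the dispatch after the patch."""
    if raw and isinstance(result, str):
        if ascii_only:
            return dump_string_raw(result, True)  # escaped, but unquoted
        return result
    return json.dumps(result, ensure_ascii=ascii_only, indent=2)


print(emit_1_6("value1", raw=True, ascii_only=True))      # "value1"
print(emit_patched("value1", raw=True, ascii_only=True))  # value1
```

Because the body still goes through the full JSON escaping, characters such as `"` or a newline inside the string would stay escaped in this model even though the outer quotes are gone.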

Please also consider opening a second issue about the segfault. Maybe try building from source first, though, as I could not reproduce the crash.

rain-1 (Contributor) commented Jan 14, 2019

#1789

rswheeldon commented

I'm still running into the same segfault with the latest 1.6 binary downloaded from the website this morning, and with the same version built from source. Tried on two boxes with similar results:

Using the latest download:

richard@sophia:~/cc/tagger$ ~/jq-linux64 -ar '.foo' <<< '{ "foo": "bar" }'
"bar"
Segmentation fault
richard@sophia:~/cc/tagger$ ~/jq-linux64 -r '.foo' <<< '{ "foo": "bar" }'
bar
richard@sophia:~/cc/tagger$ echo $LANG
en_US.UTF-8
richard@sophia:~/cc/tagger$ ~/jq-linux64 -ar '.foo' <<< '{ "foo": "bar" }'
"bar"
Segmentation fault
richard@sophia:~/cc/tagger$ 

Using a built-from-source version:

richard@sophia:~/cc/tagger$ /usr/local/bin/jq -ar '.foo' <<< '{ "foo": "bar" }'
"bar"
Segmentation fault
richard@sophia:~/cc/tagger$ /usr/local/bin/jq -r '.foo' <<< '{ "foo": "bar" }'
bar
richard@sophia:~/cc/tagger$ /usr/local/bin/jq -a '.foo' <<< '{ "foo": "bar" }'
"bar"
richard@sophia:~/cc/tagger$ uname -a
Linux sophia 4.4.14 #2 SMP Fri Jun 24 13:38:27 CDT 2016 x86_64 Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz GenuineIntel GNU/Linux
richard@sophia:~/cc/tagger$ cat /etc/slackware-version 
Slackware 14.2
richard@sophia:~/cc/tagger$ 

On a colleague's Fedora box:

[devuser@localhost jqhack]$ ./jq-linux64 -ar '.foo' <<< '{ "foo": "bar" }'
"bar"
Segmentation fault (core dumped)
[devuser@localhost jqhack]$ ./jq-linux64 -r '.foo' <<< '{ "foo": "bar" }'
bar
[devuser@localhost jqhack]$ ./jq-linux64 -a '.foo' <<< '{ "foo": "bar" }'
"bar"
[devuser@localhost jqhack]$ ./jq-linux64 --version
jq-1.6
[devuser@localhost jqhack]$ uname -a
Linux localhost.localdomain 4.15.13-300.fc27.x86_64 #1 SMP Mon Mar 26 19:06:57 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
[devuser@localhost jqhack]$ cat /etc/fedora-release
Fedora release 27 (Twenty Seven)
[devuser@localhost jqhack]$

Note that in these cases, there are no non-ascii chars to deal with.

rain-1 (Contributor) commented Mar 18, 2019

Hello, building from source with the patch I made (#1789) produces this result:

$ ./out/bin/jq -ar '.foo' <<< '{ "foo": "bar" }'
bar
$ ./out/bin/jq -r '.foo' <<< '{ "foo": "bar" }'
bar
$ ./out/bin/jq -ar '.foo' <<< '{ "foo": "bar" }'
bar

Is this the correct/desired behavior?

I have not been able to reproduce the segfault. Has anybody else? I may try setting up a Fedora VM to test it.

rswheeldon commented

Good point. I was building from the main jq master, not your fork. Building from rain-1/jq fixes the problem:

richard@sophia:~/jq/jq$ /usr/local/bin/jq -ar '.foo' <<< '{ "foo": "bar" }'
"bar"
Segmentation fault
richard@sophia:~/jq/jq$ ./jq -ar '.foo' <<< '{ "foo": "bar" }'
bar
richard@sophia:~/jq/jq$ git log | head -5
commit a3b18f8010ab43122bf55b7148d580cbb6323f55
Author: rain <rain1@airmail.cc>
Date:   Mon Jan 14 13:44:01 2019 +0000

    move function prototype to vj.h instead of jq.h
richard@sophia:~/jq/jq$ 

nicowilliams self-assigned this Mar 25, 2019
bb010g added a commit to bb010g/jq that referenced this issue Oct 22, 2019
Escapes are still printed whenever characters
outside the ASCII plane are encountered. To avoid
ambiguity, backslash is the only ASCII character
escaped (as `\\`).

Fixes jqlang#1788, properly this time. Closes jqlang#1789.
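The ambiguity that the commit message refers to can be illustrated with a short Python sketch, assuming the rule is exactly as described: non-ASCII characters become `\uXXXX`, backslash becomes `\\`, and every other ASCII character passes through. `ascii_raw` is a hypothetical name; this is not the code from the referenced commit.

```python
def ascii_raw(s: str) -> str:
    """Model of raw ASCII output: escape non-ASCII and backslash only."""
    out = []
    for ch in s:
        if ch == "\\":
            # Escaping backslash keeps literal text distinct from escapes.
            out.append("\\\\")
        elif ord(ch) < 0x80:
            # All other ASCII characters, including quotes, pass through.
            out.append(ch)
        else:
            # Emit one \uXXXX per UTF-16 code unit (two for astral characters).
            units = ch.encode("utf-16-be")
            for i in range(0, len(units), 2):
                out.append("\\u%02x%02x" % (units[i], units[i + 1]))
    return "".join(out)


print(ascii_raw("\u00e9"))    # \u00e9   (the character e-acute)
print(ascii_raw("\\u00e9"))   # \\u00e9  (the literal six-character text)
```

If backslash were left alone, both inputs would print as `\u00e9` and a consumer could not tell which string it had been given.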
bb010g linked a pull request Oct 22, 2019 that will close this issue
bb010g added a commit to bb010g/jq that referenced this issue Oct 26, 2019
bb010g added a commit to bb010g/jq that referenced this issue Sep 20, 2020