Allow, and strip, comments (#, // or /*) #2548

liquidaty · 2023-03-02T19:04:10Z

No description provided.

liquidaty · 2023-03-02T19:17:42Z

Possibly this failure has nothing to do with the actual code changes. Here is what the log says:

C:\msys64\usr\bin\bash -lc "pacman-key --verify msys2-keyring-r21.b39fb11-1-any.pkg.tar.xz.sig"
==> Checking msys2-keyring-r21.b39fb11-1-any.pkg.tar.xz.sig...
gpg: no valid OpenPGP data found.
gpg: the signature could not be verified.
Please remember that the signature file (.sig or .asc)
should be the first file given on the command line.
==> ERROR: The signature identified by msys2-keyring-r21.b[39](https://ci.appveyor.com/project/stedolan/jq/builds/46391082/job/ngd77yg97ie5ls1f#L39)fb11-1-any.pkg.tar.xz.sig could not be verified.
Command exited with code 1
cat config.log
cat: config.log: No such file or directory

itchyny · 2023-06-03T03:08:36Z

Ref: #1785

nicowilliams · 2023-07-05T22:07:32Z

Oof, I'm not sure that C-style comments in jq code are a good idea, but I understand that if we were to accept them in JSON inputs then maybe we should accept them in jq.

The other thing is that we really do need an option for JSON input strictness.

I find JSON comments to be useless. We couldn't preserve them in jq because... what do we attach them to? I guess maybe we could have has_comment/0/get_comment/0, but there's problems with that: to what values do comments attach? do we want the bloat hit on jv? do we want to output comments attached to values?

So what I do when I need to write comments in JSON texts is this:

{
   "comment0": "This string is a comment",
   "a":"whatever",
   "comment1": "This is another comment, and the only reason I change the comment name is in case there's any processors out there that reject duplicate keys"
}

which does not allow me to have comments inside arrays, but hey, it works.

wader · 2023-07-05T22:39:52Z

@nicowilliams there is a proposal by @pkoppstein about a strictness option #2643

itchyny · 2023-07-05T22:59:16Z

I'm very worried about the conflict against the alternative operator.

nicowilliams · 2023-07-05T23:01:43Z

I'm very worried about the conflict against the alternative operator.

Yes, we really can't have // comment, that much is clear, but I don't think I want even /* comment */ in the jq language, though I'd be ok with stripping those from input JSON texts.

liquidaty · 2023-07-05T23:18:10Z

@nicowilliams @wader @itchyny I think it's worth further dividing the target use cases / proposals into sub-scopes, given that the arguments for and against may be different for each.

Stripping (as opposed to parsing) only:

1a. Strip from JSON input. All we are doing is allowing JSON input to have comments, and ignoring those comments. Everything downstream from there is the same as if the input JSON never had comments to begin with. Example: echo '{ "hi": "there" /* this is a comment */ }' | jq '.' -c yields {"hi":"there"}
- Note 1: the concern re alternative operator conflict is N/A here
- Note 2: the original intended scope of this issue Allow, and strip, comments (#, // or /*) #2548 is limited to this sub-scope, and was not intended to extend any further
1b. Strip from comments from jq filters. Same as above but for filters rather than JSON input. Example: echo '{ "hi": "there" }' | jq '. /* this is a comment */ ' -c yields {"hi":"there"}

Parsing comments, which then become part of the parse tree and are available to be included in the output

2a. Parse from JSON input. Example (with extraneous spaces in the JSON input intentionally added): echo '{ "hi": "there" /* this is a comment */ }' | jq '.' -c yields {"hi":"there" /* this is a comment */ }
- Raises several issues, including where the comment attaches to and whether the comment output style should reflect the original style
2b. Parse from jq filter.
- Raises several issues, including where the comment attaches to, whether the comment output style should reflect the original style, and C++-style comments potentially conflicting with the alternative operator

Can we all agree that the limited scope of simply stripping comments from JSON input (1a above) is cut & dry? The comments can just be ignored and we are done. It seems to me that all the various concerns that have been raised are only relevant in the other 3 scopes (1b, 2a, 2b)

nicowilliams · 2023-07-05T23:52:09Z

Yes, ignoring JSON "comments" on input is fine, but I would like a command-line option for strictness, and we'll probably want strict and non-strict versions of fromjson so that jq code can also benefit from strictness. The next question is: which must come first, options for strictness of comment stripping?

I understand that some users have an urgent need to strip out JSON comments. I also imagine that some users like to validate JSON with jq and don't want jq to become more permissive than it is right now. That's the tension here for me.

All the other concerns have to do with the jq language and/or with how to represent comments internally. We can certainly punt on those and not attempt to add new types of comments to the jq language, and we can also not attempt to preserve comments in JSON parsing. (I suspect we'll never want to preserve comments in JSON parses for the simple reason that it's way too bloating for jv.)

itchyny · 2023-07-06T03:25:50Z

Instead of extending the current lenient parser, and instead of just skipping the JavaScript-like comments, I would like to make the parser conforming to well-defined specifications. My preference is to make the default parser to follow RFC 8259, and support --format=json5 following https://spec.json5.org/ (if we really want to support JSON with comments), and maybe --format=yaml for https://yaml.org/spec/, and --format=jq-json for transition from the current lenient parser (if someone really want).

liquidaty · 2023-07-06T17:08:32Z

@itchyny I think it is a great idea to target well-defined specifications. Re json5, the immediate impact with of that, with respect to this particular issue, is that # would no longer be a comment delimiter, and both C- and C++-style comments would remain in scope.

Beyond that, json5 would also support single- or multi-line strings strings that are unquoted, or that are enveloped in single quotes, and while I agree that those would be useful features, I would also propose treating them as a separate issue / PR for practical purposes. In particular, looking at jv_parse.c it seems to me that the changes necessary to accommodate these string parsing features will be significantly more invasive, and even more so if parsing yaml (so much so that if I were to take on those tasks-- which I probably cannot do, alas-- I might find it easier to swap out the JSON parser to use a parser generator rather than a handwritten one-- in which case that parser change might itself be a standalone issue/PR).

nicowilliams · 2023-07-06T19:11:49Z

Note that JSON5 is neither an ECMA standard nor an IETF standard. Not that jq only implements standards, mind you, just that JSON5 isn't one. That shouldn't stop us adopting optional support for JSON5, but we probably don't want it to be the default. Indeed, I would like to have JSON5 support if for no other reason that object keys that are identifiers can be left unquoted, that and the trailing comma feature -- these going to be nice to have.

BTW, one thing that @stedolan told me long ago is that all these command-line options are kinda ugly, that we should aim to make it possible to handle all of them in jq code. So for example, -s is the same as -n + starting the jq program with [inputs] | , and not using -s is the same as using -n and starting the program with inputs | . Now, to really make that possible for most / all jq command-line options will take a lot of work, perhaps including finishing the co-routines features.

Now, for jq code itself one of the things we'll need is a form of fromjson/tojson for each supported format. And I agree that it'd be very nice to have not just spec conformance, but also YAML support, and maybe even XML. So maybe we need fromjson5/tojson5.

nicowilliams · 2023-07-06T19:12:56Z

Also, I think we need to separate all the parsers for all flavors of JSON. I'd rather have lots of code duplication than to have lots of branches complicating and slowing down parsing. Ditto printing. The price to pay in bloat might be annoying, so we might need ./configure options for disabling various features.

liquidaty · 2023-07-07T13:46:48Z

OK this PR has been updated and now covers the following:

C- and C++-style comments (but not # comments) are stripped
this behavior is only enabled when a --strip-comments option is enabled
added a couple tests in tests/commentstest

I understand there may not yet be consensus around the use of a --strip-comments option, but figured I'd include it for now and you can decide when you merge if you don't want it, or want to change it (or lmk and I can update the PR).

src/lexer.l

src/main.c

nicowilliams · 2023-07-09T15:10:43Z

@liquidaty ah, do please get rid of the merge commit. Neither @stedolan nor I like them -- we prefer linear history upstream. If you don't know how to get rid of it I could walk you through it, or I could force push to this PR.

itchyny · 2023-07-09T15:22:49Z

Will this PR be going to 1.7? We need more discussions on this topic. As a user, it does not mention the formal definition about the comment syntax, so it's not clear which form of comments are stripped; e.x. multiline comments can be nested or not. Dealing with comments in JSON will soon raise a feature request to re-format a JSON file with keeping comments, but I'm afraid such request is hard to implement within the current parser.

nicowilliams · 2023-07-09T15:28:04Z

Will this PR be going to 1.7? We need more discussions on this topic. As a user, it does not mention the formal definition about the comment syntax, so it's not clear which form of comments are stripped; e.x. multiline comments can be nested or not. Dealing with comments in JSON will soon raise a feature request to re-format a JSON file with keeping comments, but I'm afraid such request is hard to implement within the current parser.

Great question! I think yeah, it's probably best to wait till after 1.7 to integrate this.

Because the jv representation of JSON values really can't represent comments without a lot of bloat, I think we'll never agree to preserve comments.

liquidaty · 2023-07-10T01:16:25Z

@liquidaty ah, do please get rid of the merge commit [...] If you don't know how to get rid of it I could walk you through it, or I could force push to this PR

@nicowilliams I unfortunately do not know (for sure) how to get rid of it and would be happy to try, but I think at least as likely to make it worse than better. Please feel free to walk me thru it or force push yourself, whatever is more convenient for you

liquidaty · 2023-07-10T02:01:19Z

Will this PR be going to 1.7? We need more discussions on this topic

probably best to wait till after 1.7 to integrate this

@itchyny @nicowilliams: regarding the limited scope of just stripping C- and C++-style behavior is (i.e. 1a from #2548 (comment)), then given:

how uncontroversial that limited scope is
how many users want this (this has got to be a top-10 most asked-for feature, maybe top 3, maybe top 1: see e.g. JSON comments-- again #1571, Comments in Json #402, support comments in json #695, Comments #1398), and
easy it would be to include in 1.7 (since we already have here a fully working PR that passes all checks, with the only remaining difference of opinion being over what description is used in help and/or man pages-- and I'm fairly confident that the majority of users looking for this solution won't mind whatever outcome you decide on that)

then, I hope you will consider, for your own sakes as well as that of others waiting on this feature: just accept it in 1.7, thereby putting the whole bunch of users looking for this at ease, and buying yourself plenty of time to later decide on all the other issues in the broader comment-related scope

nicowilliams · 2023-07-10T02:37:28Z

@liquidaty here's what I did:

git fetch origin pull/2548/head:pr2548
git checkout pr2548
git rebase -i origin/master
- fix conflicts
the merge commit was dropped automatically by the rebase
reviewed and noticed that one commit was mostly the reversal of the other
git rebase -i origin/master
- mark 669cac5 for edit, save, let the rebase run
- git checkout -f HEAD^ -- src/main.c
- git commit --amend
- git checkout 669cac5 -- src/main.c
- git commit --fixup HEAD^
- git rebase --continue
git rebase -i origin/master again
- move the update --strip-comments description; remove lexer changes commit to right after add c-style comment stripping from jq program input, then save and let the rebase run
- notice that now there are no diffs between allow, and strip, comments (#, // or /*) and update --strip-comments description; remove lexer changes
git rebase -i origin/master again and mark those to commits to be droped, let it run
notice that the last commit should be part of the previous commit, so git rebase -i origin/master once again and squash the last commit into the previous (I could just have done git checkout HEAD^ && git commit --amend instead), but also mark the first commit for reword and capitalize the first letter of its synopsis, then do the same for the now last commit
git remote add liquidaty https://github.com/liquidaty/jq
git remote set-url --push liquidaty git@github.com:liquidaty/jq
git push -f liquidaty HEAD:strip-comments

nicowilliams · 2023-07-10T02:39:26Z

@liquidaty you'll want to review the new PR HEAD, and you'll want to fetch it as well and compare to your local branch so you can use git diff to see the differences. The only differences should be: a) all the things in jqlang/jq's master that you didn't already have, b) the lexer changes that got undone.

liquidaty · 2023-07-10T03:16:09Z

@nicowilliams awesome! super helpful and definitely not something I would have done correctly, so thank you.

I do see some minor changes that were dropped, that I probably should add back in-- they are minor in that they will not show up in the current checks where input is piped directly into jq, but without these changes added back in, they would show up if, like in jq_test.c, the input was first parsed using jv_parse(), which relies on the return code being 0 and not OK.

Just to confirm, if I now just make those changes (touching a few lines to jv_parse.c to return 0 instead of OK) to origin/strip-comments and push them as a new commit, it will retain the history the way you prefer, correct?

nicowilliams · 2023-07-10T03:21:38Z

Just to confirm, if I now just make those changes (touching a few lines to jv_parse.c to return 0 instead of OK) to origin/strip-comments and push them as a new commit, it will retain the history the way you prefer, correct?

You'll want to apply those changes on top of the commit I pushed. Something like:

git fetch origin pull/2548/head:pr2548
git checkout pr2458
git add ...
git commit -m ...
git push $your_repos_name pr2548:strip-comments

and then whatever your old branch was in your local clone, you can now abandon it, delete it, or make its HEAD be the same as pr2548's (git branch -f $that_local_branch pr2548).

liquidaty · 2023-07-10T05:37:51Z

@nicowilliams good to go now, thanks for your help!

nicowilliams · 2023-07-10T16:00:12Z

One reason to wait till after 1.7 for this is that I've been fuzzing the current state of the parser, so I'd rather not touch the parser until after 1.7 :)

liquidaty · 2023-07-10T17:11:54Z

@nicowilliams I see. Totally off-topic: out of curiosity, what fuzzing tools are you using? am new to the art of fuzzing and looking to learn more about how it's being done in practice. Also have been considering suggesting some simd-related improvements to jq's parser, so might come in handy to understand the fuzzing context in which such suggestions might be considered

nicowilliams · 2023-07-10T17:18:24Z

@nicowilliams I see. Totally off-topic: out of curiosity, what fuzzing tools are you using? am new to the art of fuzzing and looking to learn more about how it's being done in practice. Also have been considering suggesting some simd-related improvements to jq's parser, so might come in handy to understand the fuzzing context in which such suggestions might be considered

I'm using AFL, because I know how to use it and I've used it before. But see #2255 and #2688.

Some fuzzers like LibFuzzer and HongFuzz require that you provide a function that takes a pointer to data and a length, and then you link with their object files to produce an executable that fuzzes that function, and when you run it you have to provide an initial test corpus. AFL requires (mostly) that you build with a version of gcc or clang that instruments the code, and then you you run the executable under afl-fuzz with options that specify an initial test corpus.

liquidaty · 2023-07-12T20:07:30Z

@nicowilliams thank you, much appreciated

nicowilliams · 2024-01-16T22:00:17Z

@liquidaty can you rebase this?

wader · 2024-02-17T08:52:37Z

If it's ok i can rebase this but not sure if i can push to the PR branch?

God-damnit-all · 2024-02-24T16:03:54Z

If it's ok i can rebase this but not sure if i can push to the PR branch?

What do you mean? Just rebase it on a fork and submit it as a PR, it'll create a "branch" automatically. You don't need to ask for permission. Maybe I'm misunderstanding something, though.

wader · 2024-02-24T16:20:13Z

@ImportTaste Sure could do that. But it would nice to push a rebase to this PR (2548). How to is described here https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/committing-changes-to-a-pull-request-branch-created-from-a-fork and i think i have persimmon to do it but it's probably good manners to ask first

God-damnit-all · 2024-02-24T16:46:19Z

@ImportTaste Sure could do that. But it would nice to push a rebase to this PR (2548). How to is described here https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/committing-changes-to-a-pull-request-branch-created-from-a-fork and i think i have persimmon to do it build it's probably good manners to ask first

Well, given the lack of response, you may want to just go ahead and do it, I'm guessing the maintainers aren't looking too closely at a stale PR where the submitter has been MIA for a while now.

wader · 2024-02-24T17:45:41Z

Rebased and pushed. Hope i got things right. There were two commits that had conflicting or related changes:

71e7bcd Revert "lexer: temporarily revert #\ patch; keep CR in comment bug fix"
3a1ba0c Only care about RS when parsing with --seq

nicowilliams · 2024-02-26T02:34:39Z

src/jv_parse.c

@@ -703,11 +702,7 @@ static pfunc scan_json(struct jv_parser* p, char ch, jv* out) {
  presult answer = 0;
  p->last_ch_was_ws = 0;
  if (p->st == JV_PARSER_NORMAL) {
-    if(ch == '#') {


So line comments are gone? Can you squash this with the other commit?

Yeap can do that. When i rebased i tried to keep the existing commits as intact as possible as a first step.

I can't find any discussion why it was removed. Do we think ppl will expect this to skip #-line comments also? json5 seems to have them but the various "jsonc" variants seem not.

Note to self: Remember to update PR title and commits to reflect if # is support or not

I don't actually see that to begin with in the original PR code at https://github.com/liquidaty/jq/blob/strip-comments/src/jv_parse.c, so I'm not sure how it got there to begin with before it was removed-- but maybe I'm confusing myself and I put it in some earlier or later version than I'm now looking at.

Looking back and trying to refresh my memory on this, I think I removed the # line comments only because I was trying to make the PR as narrow as possible to get consensus on having the PR accepted.

I don't believe there was any technical reason to remove it (other than having the same flaw in the return value that all of the similar code had and that was subsequently fixed)-- so to reinstate, it should just be:

static pfunc scan_json(struct jv_parser* p, char ch, jv* out) { ... if(ch == '/' && (p->flags & JV_PARSE_STRIP_COMMENTS)) { p->scan = scan_slash_comment; return answer; } + if(ch == '#' && (p->flags & JV_PARSE_STRIP_COMMENTS)) { + p->scan = scan_line_comment; + return answer; + }

PS I have no opinion on whether it should be reinstated or not

As it seems i was confused about json5 supporting #-comments (it does not) my opinion is that we can skip support them for now

nicowilliams · 2024-02-26T02:35:47Z

src/jv_parse.c

@@ -649,7 +649,7 @@ static int stream_is_top_num(struct jv_parser* p) {
 static pfunc scan_line_comment(struct jv_parser* p, char ch, jv* out) {
  if(ch == '\n')
    p->scan = scan_json;
-  return OK;
+  return 0;


Can we have some commentary in the commit message about why this change?

Trying to understand the 0 -> OK change. OK is static const presult OK = "output produced"; so i guess something else must have changed before the change? only reference i can find is in #2548 (comment). Will digg more later.

Yeah, I see that, but I'm not sure why that's needed, and anyways, there's places where we look for OK that need to be changed correspondingly.

iirc the reason for this is fairly straightforward, which is that in the below block of jv_parse.c:

jv jv_parser_next(struct jv_parser* p) { ... while (!msg && p->curr_buf_pos < p->curr_buf_length) { ... msg = scan(p, ch, &value); }

the desired behavior, when a comment is encountered, is to simply ignore it and keep on going. Therefore the while loop must keep on going, which requires that !msg is true. Any other result will break the while loop, which would be incorrect.

That said, given that the return type of scan() is a pointer, it might be better for the scan_xxx() functions to all return NULL where they currently are returning of 0.

@liquidaty thanks for the explanation. think i tried to return OK instead when played around a bit with the code and pretty sure it also worked for some reason, was confusing

Maybe it would be useful to define IGNORE_AND_CONTINUE as a presult equal to NULL (or alternatively, not null, with other corresponding changes e.g. !msg becomes msg == IGNORE_AND_CONTINUE)

liquidaty mentioned this pull request Mar 2, 2023

JSON comments-- again #1571

Open

liquidaty mentioned this pull request Mar 7, 2023

organizing to release JQ 1.6.2 and JQ 1.7: Your suggestions/comments would be appreciated #2550

Closed

itchyny added the feature request label Jun 3, 2023

nicowilliams reviewed Jul 7, 2023

View reviewed changes

src/lexer.l Outdated Show resolved Hide resolved

nicowilliams reviewed Jul 7, 2023

View reviewed changes

src/main.c Outdated Show resolved Hide resolved

nicowilliams force-pushed the strip-comments branch from a4fb51e to 80e9bea Compare July 10, 2023 02:47

liquidaty mentioned this pull request Jul 10, 2023

simdjson #1892

Open

heurtematte mentioned this pull request Aug 2, 2023

feat: add comment in cbi config file eclipse-cbi/ci-admin#20

Closed

Allow, and strip, JSON comments (#, // or /*)

e659a5e

wader force-pushed the strip-comments branch 2 times, most recently from b274bda to b7a41bb Compare February 24, 2024 17:41

wader changed the title ~~allow, and strip, comments (#, // or /*)~~ Allow, and strip, comments (#, // or /*) Feb 24, 2024

liquidaty added 2 commits February 24, 2024 18:58

Add --strip-comments option

8b479ca

update scanner to, while scanning comments, return 0 instead of OK

334d15c

wader force-pushed the strip-comments branch from b7a41bb to 334d15c Compare February 24, 2024 17:59

nicowilliams reviewed Feb 26, 2024

View reviewed changes

lishaduck mentioned this pull request Jul 7, 2024

Use jq instead of deno eval in CI [take 2] lowlighter/libs#69

Merged

Allow, and strip, comments (#, // or /*) #2548

Are you sure you want to change the base?

Allow, and strip, comments (#, // or /*) #2548

Conversation

liquidaty commented Mar 2, 2023

liquidaty commented Mar 2, 2023

itchyny commented Jun 3, 2023

nicowilliams commented Jul 5, 2023

wader commented Jul 5, 2023

itchyny commented Jul 5, 2023

nicowilliams commented Jul 5, 2023

liquidaty commented Jul 5, 2023 • edited Loading

nicowilliams commented Jul 5, 2023

itchyny commented Jul 6, 2023

liquidaty commented Jul 6, 2023

nicowilliams commented Jul 6, 2023

nicowilliams commented Jul 6, 2023

liquidaty commented Jul 7, 2023

nicowilliams commented Jul 9, 2023

itchyny commented Jul 9, 2023

nicowilliams commented Jul 9, 2023

liquidaty commented Jul 10, 2023

liquidaty commented Jul 10, 2023

nicowilliams commented Jul 10, 2023 • edited Loading

nicowilliams commented Jul 10, 2023

liquidaty commented Jul 10, 2023

nicowilliams commented Jul 10, 2023

liquidaty commented Jul 10, 2023

nicowilliams commented Jul 10, 2023

liquidaty commented Jul 10, 2023

nicowilliams commented Jul 10, 2023

liquidaty commented Jul 12, 2023

nicowilliams commented Jan 16, 2024

wader commented Feb 17, 2024

God-damnit-all commented Feb 24, 2024 • edited Loading

wader commented Feb 24, 2024 • edited Loading

God-damnit-all commented Feb 24, 2024

wader commented Feb 24, 2024 • edited Loading

Choose a reason for hiding this comment

wader Feb 26, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

liquidaty Feb 29, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

wader Feb 26, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

liquidaty commented Jul 5, 2023 •

edited

Loading

nicowilliams commented Jul 10, 2023 •

edited

Loading

God-damnit-all commented Feb 24, 2024 •

edited

Loading

wader commented Feb 24, 2024 •

edited

Loading

wader commented Feb 24, 2024 •

edited

Loading

wader Feb 26, 2024 •

edited

Loading

liquidaty Feb 29, 2024 •

edited

Loading

wader Feb 26, 2024 •

edited

Loading