-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow, and strip, comments (#, // or /*) #2548
base: master
Are you sure you want to change the base?
Conversation
Possibly this failure has nothing to do with the actual code changes. Here is what the log says:
|
Ref: #1785 |
Oof, I'm not sure that C-style comments in jq code are a good idea, but I understand that if we were to accept them in JSON inputs then maybe we should accept them in jq. The other thing is that we really do need an option for JSON input strictness. I find JSON comments to be useless. We couldn't preserve them in jq because... what do we attach them to? I guess maybe we could have So what I do when I need to write comments in JSON texts is this: {
"comment0": "This string is a comment",
"a":"whatever",
"comment1": "This is another comment, and the only reason I change the comment name is in case there's any processors out there that reject duplicate keys"
} which does not allow me to have comments inside arrays, but hey, it works. |
@nicowilliams there is a proposal by @pkoppstein about a strictness option #2643 |
I'm very worried about the conflict against the alternative operator. |
Yes, we really can't have |
@nicowilliams @wader @itchyny I think it's worth further dividing the target use cases / proposals into sub-scopes, given that the arguments for and against may be different for each.
Can we all agree that the limited scope of simply stripping comments from JSON input (1a above) is cut & dry? The comments can just be ignored and we are done. It seems to me that all the various concerns that have been raised are only relevant in the other 3 scopes (1b, 2a, 2b) |
Yes, ignoring JSON "comments" on input is fine, but I would like a command-line option for strictness, and we'll probably want strict and non-strict versions of I understand that some users have an urgent need to strip out JSON comments. I also imagine that some users like to validate JSON with jq and don't want jq to become more permissive than it is right now. That's the tension here for me. All the other concerns have to do with the jq language and/or with how to represent comments internally. We can certainly punt on those and not attempt to add new types of comments to the jq language, and we can also not attempt to preserve comments in JSON parsing. (I suspect we'll never want to preserve comments in JSON parses for the simple reason that it's way too bloating for |
Instead of extending the current lenient parser, and instead of just skipping the JavaScript-like comments, I would like to make the parser conforming to well-defined specifications. My preference is to make the default parser to follow RFC 8259, and support |
@itchyny I think it is a great idea to target well-defined specifications. Re json5, the immediate impact with of that, with respect to this particular issue, is that Beyond that, json5 would also support single- or multi-line strings strings that are unquoted, or that are enveloped in single quotes, and while I agree that those would be useful features, I would also propose treating them as a separate issue / PR for practical purposes. In particular, looking at |
Note that JSON5 is neither an ECMA standard nor an IETF standard. Not that jq only implements standards, mind you, just that JSON5 isn't one. That shouldn't stop us adopting optional support for JSON5, but we probably don't want it to be the default. Indeed, I would like to have JSON5 support if for no other reason that object keys that are identifiers can be left unquoted, that and the trailing comma feature -- these going to be nice to have. BTW, one thing that @stedolan told me long ago is that all these command-line options are kinda ugly, that we should aim to make it possible to handle all of them in jq code. So for example, Now, for jq code itself one of the things we'll need is a form of |
Also, I think we need to separate all the parsers for all flavors of JSON. I'd rather have lots of code duplication than to have lots of branches complicating and slowing down parsing. Ditto printing. The price to pay in bloat might be annoying, so we might need |
OK this PR has been updated and now covers the following:
I understand there may not yet be consensus around the use of a |
@liquidaty ah, do please get rid of the merge commit. Neither @stedolan nor I like them -- we prefer linear history upstream. If you don't know how to get rid of it I could walk you through it, or I could force push to this PR. |
Will this PR be going to 1.7? We need more discussions on this topic. As a user, it does not mention the formal definition about the comment syntax, so it's not clear which form of comments are stripped; e.x. multiline comments can be nested or not. Dealing with comments in JSON will soon raise a feature request to re-format a JSON file with keeping comments, but I'm afraid such request is hard to implement within the current parser. |
Great question! I think yeah, it's probably best to wait till after 1.7 to integrate this. Because the |
@nicowilliams I unfortunately do not know (for sure) how to get rid of it and would be happy to try, but I think at least as likely to make it worse than better. Please feel free to walk me thru it or force push yourself, whatever is more convenient for you |
@itchyny @nicowilliams: regarding the limited scope of just stripping C- and C++-style behavior is (i.e. 1a from #2548 (comment)), then given:
then, I hope you will consider, for your own sakes as well as that of others waiting on this feature: just accept it in 1.7, thereby putting the whole bunch of users looking for this at ease, and buying yourself plenty of time to later decide on all the other issues in the broader comment-related scope |
@liquidaty here's what I did:
|
@liquidaty you'll want to review the new PR HEAD, and you'll want to fetch it as well and compare to your local branch so you can use |
a4fb51e
to
80e9bea
Compare
@nicowilliams awesome! super helpful and definitely not something I would have done correctly, so thank you. I do see some minor changes that were dropped, that I probably should add back in-- they are minor in that they will not show up in the current checks where input is piped directly into Just to confirm, if I now just make those changes (touching a few lines to jv_parse.c to return 0 instead of OK) to origin/strip-comments and push them as a new commit, it will retain the history the way you prefer, correct? |
You'll want to apply those changes on top of the commit I pushed. Something like:
and then whatever your old branch was in your local clone, you can now abandon it, delete it, or make its |
@nicowilliams good to go now, thanks for your help! |
One reason to wait till after 1.7 for this is that I've been fuzzing the current state of the parser, so I'd rather not touch the parser until after 1.7 :) |
@nicowilliams I see. Totally off-topic: out of curiosity, what fuzzing tools are you using? am new to the art of fuzzing and looking to learn more about how it's being done in practice. Also have been considering suggesting some simd-related improvements to |
I'm using AFL, because I know how to use it and I've used it before. But see #2255 and #2688. Some fuzzers like LibFuzzer and HongFuzz require that you provide a function that takes a pointer to data and a length, and then you link with their object files to produce an executable that fuzzes that function, and when you run it you have to provide an initial test corpus. AFL requires (mostly) that you build with a version of |
@nicowilliams thank you, much appreciated |
@liquidaty can you rebase this? |
If it's ok i can rebase this but not sure if i can push to the PR branch? |
What do you mean? Just rebase it on a fork and submit it as a PR, it'll create a "branch" automatically. You don't need to ask for permission. Maybe I'm misunderstanding something, though. |
@ImportTaste Sure could do that. But it would nice to push a rebase to this PR (2548). How to is described here https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/committing-changes-to-a-pull-request-branch-created-from-a-fork and i think i have persimmon to do it but it's probably good manners to ask first |
Well, given the lack of response, you may want to just go ahead and do it, I'm guessing the maintainers aren't looking too closely at a stale PR where the submitter has been MIA for a while now. |
b274bda
to
b7a41bb
Compare
b7a41bb
to
334d15c
Compare
src/jv_parse.c
Outdated
@@ -703,11 +702,7 @@ static pfunc scan_json(struct jv_parser* p, char ch, jv* out) { | |||
presult answer = 0; | |||
p->last_ch_was_ws = 0; | |||
if (p->st == JV_PARSER_NORMAL) { | |||
if(ch == '#') { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So line comments are gone? Can you squash this with the other commit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeap can do that. When i rebased i tried to keep the existing commits as intact as possible as a first step.
I can't find any discussion why it was removed. Do we think ppl will expect this to skip #
-line comments also? json5 seems to have them but the various "jsonc" variants seem not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to self: Remember to update PR title and commits to reflect if # is support or not
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't actually see that to begin with in the original PR code at https://github.com/liquidaty/jq/blob/strip-comments/src/jv_parse.c, so I'm not sure how it got there to begin with before it was removed-- but maybe I'm confusing myself and I put it in some earlier or later version than I'm now looking at.
Looking back and trying to refresh my memory on this, I think I removed the #
line comments only because I was trying to make the PR as narrow as possible to get consensus on having the PR accepted.
I don't believe there was any technical reason to remove it (other than having the same flaw in the return value that all of the similar code had and that was subsequently fixed)-- so to reinstate, it should just be:
static pfunc scan_json(struct jv_parser* p, char ch, jv* out) {
...
if(ch == '/' && (p->flags & JV_PARSE_STRIP_COMMENTS)) {
p->scan = scan_slash_comment;
return answer;
}
+ if(ch == '#' && (p->flags & JV_PARSE_STRIP_COMMENTS)) {
+ p->scan = scan_line_comment;
+ return answer;
+ }
PS I have no opinion on whether it should be reinstated or not
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As it seems i was confused about json5 supporting #
-comments (it does not) my opinion is that we can skip support them for now
@@ -649,7 +649,7 @@ static int stream_is_top_num(struct jv_parser* p) { | |||
static pfunc scan_line_comment(struct jv_parser* p, char ch, jv* out) { | |||
if(ch == '\n') | |||
p->scan = scan_json; | |||
return OK; | |||
return 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we have some commentary in the commit message about why this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Trying to understand the 0
-> OK
change. OK
is static const presult OK = "output produced";
so i guess something else must have changed before the change? only reference i can find is in #2548 (comment). Will digg more later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I see that, but I'm not sure why that's needed, and anyways, there's places where we look for OK
that need to be changed correspondingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
iirc the reason for this is fairly straightforward, which is that in the below block of jv_parse.c:
jv jv_parser_next(struct jv_parser* p) {
...
while (!msg && p->curr_buf_pos < p->curr_buf_length) {
...
msg = scan(p, ch, &value);
}
the desired behavior, when a comment is encountered, is to simply ignore it and keep on going. Therefore the while
loop must keep on going, which requires that !msg
is true. Any other result will break the while loop, which would be incorrect.
That said, given that the return type of scan()
is a pointer, it might be better for the scan_xxx()
functions to all return NULL where they currently are returning of 0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@liquidaty thanks for the explanation. think i tried to return OK
instead when played around a bit with the code and pretty sure it also worked for some reason, was confusing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it would be useful to define IGNORE_AND_CONTINUE
as a presult
equal to NULL (or alternatively, not null, with other corresponding changes e.g. !msg
becomes msg == IGNORE_AND_CONTINUE
)
No description provided.