Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JSON with Comments (jsonc) support #197

Closed
denosaurtrain opened this issue Jul 17, 2024 · 12 comments
Closed

JSON with Comments (jsonc) support #197

denosaurtrain opened this issue Jul 17, 2024 · 12 comments

Comments

@denosaurtrain
Copy link

denosaurtrain commented Jul 17, 2024

Feature/Workaround Request

Request: an option to make jaq ignore comments in input like this:

{
    /*
       block comment
    */
    "a": true // line comment
}

Today, jaq . file.json fails with this error:

Error: failed to parse: string expected

If this is considered a "Won't Fix", any suggestions for workarounds?

What is jsonc, and who cares?

@@ 👇 clarification based on discussion in comments @@

- Some tools allow comments in JSON files, sometimes called "JSON with Comments" or "jsonc"
+ The format of comments described above is "JSON with Comments" and abbreviated as "jsonc".
+ "jsonc" is not JSON, but rather a superset of JSON that adds support for comments.

This is common for config files, such as:

  • all VS Code config files
  • deno.json
  • appsettings.json (ASP.NET)

How VS Code describes it:

When in the JSON with Comments mode, you can use single line (//) as well as block comments (/* */) as used in JavaScript. The mode also accepts trailing commas, but they are discouraged and the editor will display a warning.

IMO, trailing commas are out of scope for this issue, but I don't feel strongly about that.

Concrete use case

Say I want to enforce in CI/pre-commit that certain config parameters aren't being used. I can't use jaq for this today because my settings files almost always have comments, including ones for this use case:

{
    // ...
    "rust-analyzer.cargo.features": "all",
    "rust-analyzer.check.command": "clippy",
    // ...
    "prettier.enable": true // the only prettier setting allowed
    /*
        Reminder: code formatting rules do not belong in vscode settings files,
        because they don't play nicely with other editors and CI jobs.

        In other words, don't add rules like:

        "prettier.useTabs": false

        Instead, use .prettierrc.yaml to change formatting rules.
    */
    // ...
}

If jaq were able to parse this file (ignoring the comments), I could use a CI/pre-commit script like this:

jaq --exit-status '
  .
    | del(."prettier.enable")
    | with_entries(select(.key | startswith("prettier")))
    | length == 0
' ".vscode/settings.json"

Dead Ends

  • I checked for previous issues/discussions about this and didn't find anything.
  • I considered using rg/sed to strip comments before passing to jaq, but this is difficult to get right.
  • I found some tools that purport to strip comments, but nothing stood out as easy/popular/well-tested/safe. Arguably I should look more deeply into those options. I'm open to suggestions!
  • I started to implement this jaq feature myself. I added some failing tests but didn't get far before realizing I should open an issue to discuss it. Plus, while editing hifijson, I noticed that the jaq branch, faster-lexer, is being actively developed as of July 2024, so I worried that adding to hifijson could be a mistake. I see that branch now has a PR (Rewrite lexer and parser #196) -- perhaps that's only for parsing input expressions, not JSON input?
@01mf02
Copy link
Owner

01mf02 commented Jul 17, 2024

Hi @denosaurtrain! Thanks for taking your time to explain me your use case in such a detailed way!
Your suspicion in your last "dead end" is dead on --- the faster-lexer branch is for parsing/lexing jq programs, not JSON input. To allow jaq to parse JSON comments, modifying hifijson is the way to go. If you make a PR to hifijson, I am very much open to incorporate such a change.

@01mf02
Copy link
Owner

01mf02 commented Jul 17, 2024

I have read a bit about comments in JSON since my last message, and I have gained a different opinion in the meantime. In particular, I'm now more skeptical about allowing comments in JSON. In discussions like on Stack Overflow, there is a relatively strong consensus that comments should not be present in JSON, and if one wants to put in comments nonetheless, they can be removed for further processing (that's what you called your dead end no. 3).

I tried jsmin by the JSON author (Douglas Crockford), and it seems pretty simple to me: Go to https://github.com/douglascrockford/JSMin, download jsmin.c, run gcc jsmin.c -o jsmin, then ./jsmin <input.json | jaq . and you're good to go. Wouldn't that satisfy your needs of a tool that is easy/popular/well-tested/safe?

@denosaurtrain
Copy link
Author

jsmin looks promising, thanks! and only ~300 lines too. I'll try that out.

I respect your choice to not support comments. I'll close this issue. Hopefully the workaround will also help other folks who need this in the future.

Thanks for the quick responses and thoughtful feedback!

@pkoppstein
Copy link

@denosaurtrain - Another possibility for pre-processing is https://hjson.github.io/

@01mf02 - Since there are tools that strip comments from JSON, the horse is already out of the barn, so on balance I think it would probably make the world a better place if tools like jq/jaq had an option that would allow comments or at least one form of comments. Since jq and jaq do allow "#" comments in JSON that is presented as (part of) a jq/jaq program, my hope is you will get the ball rolling and add an option that would at least allow them. Not the highest priority by any means, of course, but maybe worth. re-opening this issue.

@denosaurtrain
Copy link
Author

what it means to support jsonc

01mf02
there is a relatively strong consensus that comments should not be present in JSON

I've read a few of these threads, and I agree that they're correct — JSON, per the spec, does not allow comments — but also not the whole story. The way I see it, once you add comments, it's no longer valid JSON but rather "JSON with Comments" (jsonc) a superset of JSON that adds support for comments.

the decision

I think either of these are reasonable decisions to make here:

  1. jaq supports only JSON and "JSON Lines" (status quo)
  2. jaq supports only JSON, "JSON Lines", and "JSON with Comments" (proposal)

While I personally want option 2, the argument for option 1 is strong.

why add jsonc (option 2)

Really it comes down to this:

pkoppstein
on balance I think it would probably make the world a better place if tools like jq/jaq had an option that would allow comments

From reading threads about this, it's clear that lots and LOTS of people stub their toe on this and need to find a workaround.

The feature should be opt-in, so there's no impact to users who want strict JSON only.

If it's easy to ignore comments during parsing, why not?

why not add jsonc (option 1)

Many workarounds are available, at least for the use case where comments can be ignored.

"JSON with Comments" doesn't have a spec, and although // and /* */ are bog-standard, there remains the annoying fact that VS Code's interpretation of "JSON with Comments" also allows trailing commas. I suspect they realize that was a mistake, which is why they deprecated it. From my standpoint, that can be ignored, but it's an awkward wrinkle to be sure.

Considering jaq's focus on faithful representation of the original data, a mode where it blithely ignores comments feels like an awkward fit. Sure, it would be sick if it could intelligently preserve comments during updates, but that would be tough to get right.

mikefarah's yq might be a better place to add jsonc support, since it already handles multiple formats with --input-format/--output-format.

❯ yq --help
...
  -p, --input-format string           [auto|a|yaml|y|json|j|props|p|csv|c|tsv|t|xml|x|base64|uri|toml|lua|l] parse format for input. (default "auto")
  -o, --output-format string          [auto|a|yaml|y|json|j|props|p|csv|c|tsv|t|xml|x|base64|uri|toml|shell|s|lua|l] output format type. (default "auto")

The most obvious argument against adding jsonc support is (I think) also the weakest argument: it's tempting to say, "jaq doesn't support Hjson or json5 or xml so why would it support jsonc?" IMO, because jsonc is so close to plain JSON that it's relatively easy for tools that operate on JSON to also operate on "JSON with Comments" (at least insofar as the comments can be ignored).

what about other forms of comments

pkoppstein
Since jq and jaq do allow "#" comments in JSON that is presented as (part of) a jq/jaq program

News to me! If that could be hacked to handle //, that might work for me. For my use cases, I could replace block comments with line comments.

pkoppstein
or at least one form of comments

Although other characters like # might work for some use cases, that wouldn't help me, sadly. I need to interface with config files for existing tools, which only support jsonc: // and /* */.

conclusion

Ideally, for me, jaq could be made to ignore // and /* */ comments, thereby having minimal support for jsonc input.

However, 01mf02's decision is also reasonable, and I'm happy to have a clear decision one way or the other. Now people can read this in the future and not go down quite so deep a rabbit hole as I have.

Plus, I have at least one workaround (which mostly works for me), so it's not a blocker.

@01mf02
Copy link
Owner

01mf02 commented Jul 19, 2024

I'm happy that jsmin works for you. I think that there is value in sticking to the most widely recognised definition of what JSON is, and that is the spec. Everything else opens a pandora's box, IMO.

Just for completeness: In the future, I think that the road forward will be for jaq to support YAML. That would allow not only for comments, but also for other syntax that JSON does not offer. jaq 1.5 has removed the roadblocks to support different input formats in jaq, it now "only" remains to integrate a YAML parser with jaq, define the operations on YAML values, and you got YAML support (and with that basically JSON with comments for free).

@01mf02
Copy link
Owner

01mf02 commented Jul 19, 2024

Just to be clear, when I say that jaq supports YAML, that means that it should provide a mode where it reads input data as JSON and another where it reads input data as YAML.

@01mf02
Copy link
Owner

01mf02 commented Jul 19, 2024

I have come across JSON5, which is a fairly popular JSON extension (including comments) with a proper specification.
Furthermore, it comes with a tool to convert JSON5 to JSON, which would also allow you to write comments in JSON:
https://github.com/json5/json5

@denosaurtrain
Copy link
Author

what is JSON? what is jsonc?

I think that there is value in sticking to the most widely recognised definition of what JSON is, and that is the spec. Everything else opens a pandora's box, IMO.

I agree 💯 percent that JSON is JSON as described by the spec.

jsonc is not JSON because it does not conform to the JSON spec, just as YAML is not JSON because it does not conform to the JSON spec. JSON, YAML, and jsonc are three separate formats. This issue is a request to support the format jsonc, not a request to treat JSON itself as something other than JSON.

Another example: TypeScript is a superset of JavaScript, and the two are separate languages. In the same way, jsonc is a superset of JSON, and the two are separate languages.

does YAML support help me?

the road forward will be for jaq to support YAML.

Neat! Maybe someday I can replace yq and jq with jaq.

[YAML] would allow not only for comments, but also for other syntax that JSON does not offer.

Like I said above, unfortunately other formats aren't an option for me. VS Code's configuration files are all jsonc (not JSON), and VS Code doesn't support other file formats like JSON5 or YAML. The story is the same for Deno (which only supports deno.json or deno.jsonc) and ASP.NET's appsettings.json (which only supports jsonc, AFAIK).

Supporting YAML might help some folks who can switch, but not those who are stuck with jsonc (not JSON) and therefore need tools that work with jsonc (not JSON).

how adding YAML relates to adding jsonc

when I say that jaq supports YAML, that means that it should provide a mode where it reads input data as JSON and another where it reads input data as YAML

Cool! That's like how I mentioned yq works. Assuming you use the same flag, then I envision it like this:

--input-format=json (the default)
--input-format=yaml
--input-format=jsonc <- what this issue is about

The proposed option is jsonc (not JSON).

Unlike adding YAML support, which requires a separate code path and/or library, jsonc is so close to JSON that, as an implementation detail, the simplest implementation is a JSON parser with an opt-in flag to ignore comments. I.e., it becomes a parser that supports two formats: either JSON or jsonc.

disclaimer

I hope I don't come across the wrong way here... I'm deeply grateful for your work on jaq, and I 100% respect your decision not to support jsonc as a file format. For some reason I care a lot about clarifying that I do not want to bend the rules of JSON itself: JSON is JSON is JSON. This issue is a request to support jsonc (not JSON), because I need to interface with at least two tools that only support that specific format.

@01mf02
Copy link
Owner

01mf02 commented Jul 31, 2024

Unlike adding YAML support, which requires a separate code path and/or library, jsonc is so close to JSON that, as an implementation detail, the simplest implementation is a JSON parser with an opt-in flag to ignore comments. I.e., it becomes a parser that supports two formats: either JSON or jsonc.

I also thought that jsonc is so close to JSON that it would be simple to have a flag somewhere that regulates whether comments are ignored or errors. Then I looked a bit closer at the implementation of the library that underlies JSON parsing in jaq, hifijson: This line would need to be modified. Unfortunately, due to the architecture of hifijson, in this function, it is not easy to "just" introduce a flag (in a nutshell, because this function is in a trait, and a trait does not have associated data, so we cannot easily store something here). This can be solved, but (a) it would require quite a bit of new code, and (b) it might have a negative performance impact on JSON parsing without comments. That's why I'm not so keen on integrating optional JSONC support into hifijson, especially when this use case can be addressed either with tools like jsmin or once jaq supports YAML. I hope you understand.

@denosaurtrain
Copy link
Author

I hope you understand.

Absolutely! You've been extraordinarily patient with my long, pedantic posts.

...especially when this use case can be addressed either with tools like jsmin or once jaq supports YAML.

Indeed. YAML won't help me because I'm forced to use jsonc, but with jsmin I can do things like this:

# list all recommended VS Code extensions that are not currently installed
./jsmin < ~/code/default.code-workspace \
    | jaq -r '.extensions.recommendations[]' \
    | rg -v $(code --list-extensions \
        | tail --lines=+2 \
        | xargs -I {} echo -en "({})|" \
        | head --bytes -1)

@01mf02
Copy link
Owner

01mf02 commented Aug 2, 2024

Absolutely! You've been extraordinarily patient with my long, pedantic posts.

It's my pleasure. I think it is important to explain the reasoning behind decisions to gain acceptance for them.

Indeed. YAML won't help me because I'm forced to use jsonc, but with jsmin I can do things like this:

Ah, I was under the misconception that YAML supported block comments --- in that case, you could just have interpreted your JSONC file as YAML and everything would have worked. Too bad that this is not the case. :(

Well, at least I'm happy that your jsmin workflow now works. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants