
Fix json indent #2546

Merged
merged 17 commits into zarr-developers:main
Jan 8, 2025
Conversation

will-moore
Contributor

This fixes the default json_indent added in #1952.
It was getting ignored in the V3JsonEncoder because kwargs already contains "indent": None, so kwargs.pop("indent", config.get("json_indent")) always returns None instead of falling back to the configured value.
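The failure mode can be reproduced with a plain dict (a standalone sketch, not the actual V3JsonEncoder code; the literal default stands in for config.get("json_indent")):

```python
# dict.pop only falls back to its default when the key is *absent*;
# an explicit None value wins, which is the bug described above.
kwargs = {"indent": None}  # json.dumps always passes indent, defaulting to None

configured_indent = 2  # stand-in for config.get("json_indent")
indent = kwargs.pop("indent", configured_indent)
print(indent)  # None: the configured indent is silently discarded

# One possible fix: treat an explicit None the same as a missing key.
kwargs = {"indent": None}
indent = kwargs.pop("indent", None)
if indent is None:
    indent = configured_indent
print(indent)  # 2
```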

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/tutorial.rst
  • Changes documented in docs/release.rst
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

self.indent = 2

# expected has an extra ' ' on each line compared with json.dumps(indent=2)
expected = json.dumps(json.loads(d), cls=TestIndentEncoder).encode()
Contributor

json.dumps takes an indent keyword argument, so I don't think we need the custom encoder class for this test

Contributor Author

Yeah, I tried that first!
But json.dumps(data, indent=2) gives a different output than a class derived from json.JSONEncoder: the encoder's output has an extra space at the end of each line (see my comment on line 315 above).
I don't know why this is (it would be nice to know)!

Contributor

weird, i would also like to know why this is the case. is the extra whitespace more or less correct?

Contributor Author

Just confirmed this by doing a test write of some zarr data. All the zarr.json files have whitespace at the end of rows that end with a comma.

Contributor Author

weird, i would also like to know why this is the case. is the extra whitespace more or less correct?

I would say it's not really correct to have the extra whitespace at the end of lines.
Here, there shouldn't be whitespace after "foo",

>>> json.dumps({"test": "foo", "bar": 2}, cls=TestIndentEncoder)
'{\n  "test": "foo", \n  "bar": 2\n}'
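A plausible explanation, inferred from how CPython's json module initializes separators (an assumption, not something confirmed in this thread): passing indent to json.dumps or to JSONEncoder.__init__ switches the item separator from the default ', ' to ',', but setting self.indent directly as an attribute skips that adjustment, so the trailing space from ', ' survives:

```python
import json

class TrailingSpaceEncoder(json.JSONEncoder):
    # Mirrors the test encoder above: indent is set as an attribute,
    # bypassing the __init__ logic, so item_separator stays ', '.
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.indent = 2

data = {"test": "foo", "bar": 2}

print(repr(json.dumps(data, cls=TrailingSpaceEncoder)))
# '{\n  "test": "foo", \n  "bar": 2\n}'  <- space before the newline

print(repr(json.dumps(data, indent=2)))
# '{\n  "test": "foo",\n  "bar": 2\n}'   <- separator adjusted to ','
```

Passing indent=2 to super().__init__() instead of assigning the attribute would also trigger the separator adjustment and remove the trailing space.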

@will-moore
Contributor Author

There are 4 test failures just now:

FAILED tests/test_array.py::test_nbytes_stored - assert 502 == 366
FAILED tests/test_array.py::test_nbytes_stored_async - assert 502 == 366
FAILED tests/test_array.py::TestInfo::test_info_complete - AssertionError: assert Type
FAILED tests/test_array.py::TestInfo::test_info_complete_async - AssertionError: assert Type

I wouldn't expect those to be caused by this PR? Are they passing elsewhere?

@jhamman
Member

jhamman commented Dec 12, 2024

Looking at the failing tests, I do think these are related to this PR:

    async def test_nbytes_stored_async() -> None:
        arr = await zarr.api.asynchronous.create(shape=(100,), chunks=(10,), dtype="i4")
        result = await arr.nbytes_stored()
>       assert result == 366  # the size of the metadata document. This is a fragile test.
E       assert 502 == 366

So it seems the change to the JSON encoding is impacting the size of the metadata document. I agree with the comment in the test "this is a fragile test". I think we should be open to changing this test or set the json encoding for these specific tests.

@d-v-b
Contributor

d-v-b commented Dec 12, 2024

I think the info test should not be hard-coding the metadata size but rather comparing the reported metadata size to the actual metadata size, that would make the test much less brittle.

@d-v-b
Contributor

d-v-b commented Dec 12, 2024

similarly, test_nbytes_stored should test that the actual size of the metadata document (measured directly) matches what comes out of nbytes_stored(), instead of hard-coding things. I can clean these up in a PR unless someone beats me to it.

@will-moore
Contributor Author

@d-v-b I wasn't sure how to implement your suggestion from #2546 (comment) - Maybe you could give some more pointers?
But for now I've updated the byte counts so that the tests are green (but still fragile as before).

@d-v-b
Contributor

d-v-b commented Dec 17, 2024

@d-v-b I wasn't sure how to implement your suggestion from #2546 (comment) - Maybe you could give some more pointers? But for now I've updated the byte counts so that the tests are green (but still fragile as before).

I was thinking that instead of comparing the reported size of the metadata document to a hard-coded value, we could just get the size of the metadata document directly, and ensure that the reported size matches the observed size. We would get the actual size of the metadata document by invoking store.get("zarr.json") (for v3) and measuring the length of the bytes coming out. A similar process would give you the size of the metadata document for v2. But this can be done in a separate PR.
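The shape of the suggested test might look something like this (a sketch only: the plain dict, write_metadata, and nbytes_stored names are illustrative stand-ins, not the zarr-python API):

```python
import json

# Hypothetical in-memory store mapping keys to stored bytes.
store: dict = {}

def write_metadata(doc: dict, indent) -> None:
    # Stand-in for the library writing zarr.json with a configured indent.
    store["zarr.json"] = json.dumps(doc, indent=indent).encode()

def nbytes_stored() -> int:
    # Stand-in for what the library reports: total size of stored objects.
    return sum(len(v) for v in store.values())

write_metadata({"zarr_format": 3, "shape": [100], "chunks": [10]}, indent=2)

# Compare the reported size to the directly observed size of zarr.json,
# so the test keeps passing if the JSON encoding (e.g. indent) changes.
assert nbytes_stored() == len(store["zarr.json"])
```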

@will-moore
Contributor Author

@d-v-b Anything else I need to do here?

@dstansby dstansby enabled auto-merge (squash) January 7, 2025 09:40
@dstansby
Contributor

dstansby commented Jan 8, 2025

I fixed the doctests in the latest commit - 🤞 this works, and thanks for the PR!

@dstansby dstansby merged commit eb25424 into zarr-developers:main Jan 8, 2025
28 checks passed
@will-moore
Contributor Author

Great, thanks @dstansby
