-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve alternative JSON support #171
Comments
This article is from 2015 but discusses some alternative libraries. I also investigated newer alternatives (simdjson, hyperjson) but found none that were stable, simple to install, and matched the The results below suggest to me that (note: In my understanding, Resultsloads (large obj)json 0.354s dumps (large obj)json 0.335s loads (small objs)json 0.449s dumps (small objs)json 0.624s |
Looks like |
You should also check out orjson depending on use case. |
@dgasmith Good call! Here is an updated plot. I recommend that we choose the optimal case for the "small" data, which has a size range that is similar to most |
Another project to watch with respect to this issue: https://github.com/lemire/simdjson |
My benchmark code doesn't work for simdjson. I'd prefer to stick to packages that explicitly implement all of the features in Other possible compatibility issues to watch out for / test carefully:Numerical precision>>> import math, json, ujson
>>> json.dumps(math.pi)
'3.141592653589793'
>>> ujson.dumps(math.pi)
'3.1415926536' (This is configurable.) Non-ASCII Characters>>> import json, orjson
>>> json.dumps('好')
'"\\u597d"'
>>> orjson.dumps('好')
b'"\xe5\xa5\xbd"' Datetime objects>>> import datetime, json, orjson
>>> json.dumps(datetime.datetime.now())
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/bdice/miniconda3/envs/dice/lib/python3.7/json/__init__.py", line 231, in dumps
return _default_encoder.encode(obj)
File "/Users/bdice/miniconda3/envs/dice/lib/python3.7/json/encoder.py", line 199, in encode
chunks = self.iterencode(o, _one_shot=True)
File "/Users/bdice/miniconda3/envs/dice/lib/python3.7/json/encoder.py", line 257, in iterencode
return _iterencode(o, 0)
File "/Users/bdice/miniconda3/envs/dice/lib/python3.7/json/encoder.py", line 179, in default
raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type datetime is not JSON serializable
>>> orjson.dumps(datetime.datetime.now())
b'"2019-07-26T10:29:09.710778"' The orjson README also carefully documents other issues around 53-bit vs. 64-bit integers. It notes that for floating-point numbers, "ujson is inaccurate in both serialization and deserialization." |
Oh, that is good to know! The built-in JSON package will serialize floating point numbers to whatever precision necessary for an accurate representation. We should probably add unit tests for all of these types before enabling or switching to any other back-end. |
Depending on the structure of your data you can also consider msgpack. Very useful for array structures. |
Here are my thoughts on the steps to proceed with this issue, and the relevant concepts
|
@vyasr Thanks for documenting this so clearly! Another solution would be to remove "alternative" JSON support and close this issue until/unless users request it. Resolving this issue requires extreme attention to detail. I believe that compatibility across platforms and locales is of critical importance, and the performance benefits are probably not important enough to most signac users. From some of the documented examples above, I believe that many "alternative" JSON libraries have quirks that prevent me from fully trusting their compatibility in serializing and deserializing floating point values, unicode characters, and special objects like datetimes. Resolving this in the way that @vyasr suggests is a good path forward if we decide it is worth doing, but at this time I do not think supporting alternative JSON backends is worth the significant effort that would be demanded in the form of testing, handling potential incompatibilities, and long-term maintenance. I vote to deprecate rapidjson support and eventually remove our current implementation. If at some point we want to re-visit this, we can add a configurable option for JSON backends as @vyasr suggests, subject to appropriate scrutiny and testing. |
I also vote to close this issue for now and deprecate support for rapidjson. The only thing I could see as a reasonable project is to slightly refactor the code before the release of signac 2.0, so that it would be easier to implement that in some future version should there be a demand for it. |
I think there is some consensus on deprecating rapidjson -- I opened #285 to help make concrete steps towards deprecation. I will close this issue for now. |
Feature description
I'm opening this issue to track our discussion about using JSON libraries other than the built-in JSON library. Currently, signac supports rapidjson internally, but we should investigate carefully to determine which JSON libraries are the best alternatives.
Proposed solution
As a first step, we would want to add benchmarks for the existing support so that we can assess further changes.
Additional context
Any alternative solutions you might have considered or other information that might be helpful to understand this feature request.
The text was updated successfully, but these errors were encountered: