The JSON library is very slow #6218

Open
konnectr opened this issue Jul 21, 2023 · 2 comments

Comments

@konnectr
Collaborator

konnectr commented Jul 21, 2023

Problem:

import time
import json
import orjson
import ujson
import simplejson

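def benchmark(name, dumps, loads):
    # Round-trip the module-level sample dict `m` 3,000,000 times
    # and print the elapsed wall-clock time in seconds.
    start = time.perf_counter()
    for _ in range(3000000):
        result = dumps(m)
        loads(result)
    print(name, time.perf_counter() - start)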

if __name__ == "__main__":
    m = {
        "timestamp": 1556283673.1523004,
        "task_uuid": "0ed1a1c3-050c-4fb9-9426-a7e72d0acfc7",
        "task_level": [1, 2, 1],
        "action_status": "started",
        "action_type": "main",
        "key": "value",
        "another_key": 123,
        "and_another": ["a", "b"],
    }

    benchmark("Python", json.dumps, json.loads)
    benchmark("ujson", ujson.dumps, ujson.loads)

    # orjson only outputs bytes, but often we need unicode:
    benchmark("orjson", lambda s: str(orjson.dumps(s), "utf-8"), orjson.loads)

    benchmark('simplejson', simplejson.dumps, simplejson.loads)

Results (seconds for 3,000,000 dumps/loads round trips, lower is better):

Python 11.381623983383179
ujson 5.0240020751953125
orjson 2.138978958129883
simplejson 15.721292972564697

Redash uses simplejson, which was the slowest of the four libraries in this benchmark.

Solution:

Use a faster library to serialize and deserialize JSON.
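
To put the timings above in relative terms, here is a small sketch that divides the simplejson time by each of the others. The numbers are copied from the results listed in this issue; the variable names are only illustrative.

```python
# Timings in seconds, copied from the benchmark results in this issue.
timings = {
    "Python": 11.381623983383179,
    "ujson": 5.0240020751953125,
    "orjson": 2.138978958129883,
    "simplejson": 15.721292972564697,
}

# How many times faster each library was than simplejson on this payload.
speedups = {name: timings["simplejson"] / t for name, t in timings.items()}

for name, factor in sorted(speedups.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {factor:.2f}x")
```

These ratios describe this one small payload on one machine, so they may not carry over to larger or differently shaped documents.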

@justinclift
Member

As long as it doesn't break things for our existing users, then it sounds like a decent idea. 😄

@arikfr
Member

arikfr commented Jul 23, 2023

Replacing the library should be relatively easy, as in most places we use our own wrappers (utils.json_dumps / utils.json_loads), so we only need to update it in a single place.
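
A minimal sketch of what swapping the backend behind a pair of wrappers could look like. The function names mirror the ones mentioned above, but the bodies are hypothetical and are not Redash's actual implementation, which may take extra arguments or use a custom encoder. It assumes orjson is an optional dependency and falls back to the standard library when it is missing.

```python
import json

try:
    import orjson  # optional third-party dependency (assumption)
except ImportError:
    orjson = None


def json_dumps(data):
    """Serialize data to a str, using orjson when it is installed."""
    if orjson is not None:
        # orjson.dumps returns bytes, so decode to get a str.
        return orjson.dumps(data).decode("utf-8")
    return json.dumps(data)


def json_loads(text):
    """Parse a JSON str or bytes value with whichever backend is available."""
    if orjson is not None:
        return orjson.loads(text)
    return json.loads(text)
```

Callers keep using json_dumps and json_loads, so the choice of backend stays in one module. Note that the two backends format their output differently (orjson omits the spaces after separators), so code that compares serialized strings byte for byte would need checking.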

I remember that simplejson supports the ignore_nan option, which I think the built-in json module doesn't.
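
For reference, simplejson's ignore_nan=True serializes NaN and infinite floats as null. The standard library's json.dumps only offers allow_nan, which either emits the non-standard NaN/Infinity tokens (the default) or raises ValueError. A rough sketch of getting similar output with the standard library follows; the helper names are hypothetical, and the extra pass over the data has a cost that would need measuring.

```python
import json
import math


def replace_non_finite(value):
    """Recursively replace NaN and +/-Infinity floats with None."""
    if isinstance(value, float) and not math.isfinite(value):
        return None
    if isinstance(value, dict):
        return {k: replace_non_finite(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [replace_non_finite(v) for v in value]
    return value


def dumps_ignoring_nan(data):
    # allow_nan=False makes json.dumps raise ValueError if a
    # non-finite float somehow survives the sanitizing pass.
    return json.dumps(replace_non_finite(data), allow_nan=False)
```

Any replacement library would need an equivalent step wherever query results can contain NaN values.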
