Revise MemoryParserState #235

TotalVerb · 2018-01-29T07:13:26Z

MemoryParserState needs a big revision as it has aged over the years. I merged #234 in the meantime as the performance has not been benchmarked in a while and it is probably not worth the extra effort to continue to maintain the current code. I expect to have some time this week to take on this project, and so far have the following ideas in mind, alongside the Julia v0.6 and v0.7 modernization:

Instead of using Vector{UInt8}, String can be used directly in many situations.
Instead of totally falling back on the slow parsing if any escapes are found, we should copy as much as we can directly, and then append the escape, and continue. This should help with strings with few escapes and long strings of memcpy-able memory.
Audit and profile to see what is actually in need of improvement (this has not been done in a while).
Hopefully simplify the code, overall.
Hopefully also address some of the issues like JSON.parse fails for cweb/url-testing/master/urls.json #232.

The text was updated successfully, but these errors were encountered:

ScottPJones · 2018-01-30T13:45:43Z

As far as the fallback on slow parsing, I think instead, the approach of calculating the correct length first (which can be very fast, simply scan for \, sum up how many bytes less will be needed in the output, and then return that.
If that value is 0, (i.e. no escape sequences), then you can simply do a memcpy. You might also return a tuple, with a count of the number of escape sequences, the number of bytes fewer to output, and the offset of the first escape sequence found. That will allow optimizing for the case where an escape sequence is near the beginning or near the end (because once the last escape sequence is processed, then the rest of the string can be copied with memcpy).
If you'd like, I could write a PR for that performance enhancement, if you want to take care of the changes such as using String, profiling, etc.

stevengj · 2018-02-01T16:59:19Z

See #236, which addresses several of these points, although further optimization is of course possible.

samoconnor · 2018-02-04T04:46:02Z

FYI, I've been experimenting with a lazy JSON parser: https://github.com/samoconnor/jsonhack

[Update: a simpler and lazyer variation LazyJSON.jl improves speed of access near the end of a large file. See benchmark output below. This variant uses iterators that work directly with the JSON bytes and represents JSON values using just the byte index of the start of the value. This approach uses 0.001% of the memory used by JSON.jl ]

test/benchmark.jl uses a 1MB AWS API definition JSON file to compare performance vs JSON.jl. When accessing a value close to the start of the file the lazy parser is ~2000 times faster than JSON.jl, for 2 values near then end of the file, the lazyer parser is ~6 times faster and allocates 0.07MiB vs 913 MiB.

Access value close to start:
LazyJSON.jl:  0.000558 seconds (3.23 k allocations: 121.719 KiB)
JSON.jl:      1.025800 seconds (9.34 M allocations: 913.682 MiB, 12.53% gc time)


Access 2 values close to end:
LazyJSON.jl:  0.162446 seconds (2.66 k allocations: 71.250 KiB)
JSON.jl:      1.019581 seconds (9.34 M allocations: 913.682 MiB, 12.22% gc time)

test/runtests.jl uses https://github.com/nst/JSONTestSuite to check parser corner case behaviour.

ScottPJones · 2018-02-04T14:54:55Z

I wasn't aware of the test suite, that's great. I found some bugs (which I've fixed in my parsing rewrite PR that I'm reading in), I'm sure that test suite would have caught them.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Revise MemoryParserState #235

Revise MemoryParserState #235

TotalVerb commented Jan 29, 2018

ScottPJones commented Jan 30, 2018

stevengj commented Feb 1, 2018

samoconnor commented Feb 4, 2018 •

edited

Loading

ScottPJones commented Feb 4, 2018

Revise MemoryParserState #235

Revise MemoryParserState #235

Comments

TotalVerb commented Jan 29, 2018

ScottPJones commented Jan 30, 2018

stevengj commented Feb 1, 2018

samoconnor commented Feb 4, 2018 • edited Loading

ScottPJones commented Feb 4, 2018

samoconnor commented Feb 4, 2018 •

edited

Loading