Add a fast path for ASCII strings #620
Conversation
This optimization is based on a few assumptions:

- Most strings are ASCII-only.
- Most strings had their coderange scanned already.

If the above is true, then by checking the string coderange, we can use a much more streamlined function to encode ASCII strings.

Before:

```
== Encoding twitter.json (466906 bytes)
ruby 3.4.0preview2 (2024-10-07 master 32c733f57b) +YJIT +PRISM [arm64-darwin23]
Warming up --------------------------------------
                json   140.000 i/100ms
                  oj   230.000 i/100ms
           rapidjson   108.000 i/100ms
Calculating -------------------------------------
                json      1.464k (± 1.4%) i/s  (682.83 μs/i) -      7.420k in   5.067573s
                  oj      2.338k (± 1.5%) i/s  (427.64 μs/i) -     11.730k in   5.017336s
           rapidjson      1.075k (± 1.6%) i/s  (930.40 μs/i) -      5.400k in   5.025469s

Comparison:
                json:     1464.5 i/s
                  oj:     2338.4 i/s - 1.60x faster
           rapidjson:     1074.8 i/s - 1.36x slower
```

After:

```
== Encoding twitter.json (466906 bytes)
ruby 3.4.0preview2 (2024-10-07 master 32c733f57b) +YJIT +PRISM [arm64-darwin23]
Warming up --------------------------------------
                json   189.000 i/100ms
                  oj   228.000 i/100ms
           rapidjson   108.000 i/100ms
Calculating -------------------------------------
                json      1.903k (± 1.2%) i/s  (525.55 μs/i) -      9.639k in   5.066521s
                  oj      2.306k (± 1.3%) i/s  (433.71 μs/i) -     11.628k in   5.044096s
           rapidjson      1.069k (± 2.4%) i/s  (935.38 μs/i) -      5.400k in   5.053794s

Comparison:
                json:     1902.8 i/s
                  oj:     2305.7 i/s - 1.21x faster
           rapidjson:     1069.1 i/s - 1.78x slower
```
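To illustrate the idea (not the actual patch): once a string is known to be pure ASCII, the encoder can skip UTF-8 decoding entirely and copy runs of unescaped bytes in bulk. The sketch below uses hypothetical helper names; in the real extension the ASCII check is O(1) via Ruby's cached coderange (`rb_enc_str_coderange()` returning `ENC_CODERANGE_7BIT`) rather than a scan, and control characters are fully escaped rather than skipped.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the coderange check: the real code asks
 * Ruby for the cached coderange instead of scanning every byte. */
static int is_ascii_only(const char *str, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if ((unsigned char)str[i] >= 0x80) return 0;
    }
    return 1;
}

/* ASCII fast path sketch: copy runs of clean bytes with memcpy instead
 * of inspecting one UTF-8 codepoint at a time. Control characters are
 * omitted here for brevity; only '"' and '\\' are escaped. */
static size_t append_ascii_json(char *out, const char *str, size_t len)
{
    size_t beg = 0, out_len = 0;
    for (size_t pos = 0; pos < len; pos++) {
        char ch = str[pos];
        if (ch == '"' || ch == '\\') {
            memcpy(out + out_len, str + beg, pos - beg);
            out_len += pos - beg;
            out[out_len++] = '\\';
            out[out_len++] = ch;
            beg = pos + 1;
        }
    }
    memcpy(out + out_len, str + beg, len - beg);
    return out_len + (len - beg);
}
```

The win comes from the common case: a string with nothing to escape becomes a single `memcpy`.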
```c
        break;
    case ENC_CODERANGE_VALID:
        if (RB_UNLIKELY(state->ascii_only)) {
            convert_UTF8_to_ASCII_only_JSON(buffer, obj, state->script_safe);
```
After reflection, the split that might have the most impact is `script_safe`, as it would also speed up the ASCII path.

I pushed an extra commit that re-uses the escape table from a81ec47.

For now it's only applied to the ASCII fast path, but I think we could optimize UTF-8 too, under the assumption that most bytes in a UTF-8 string are ASCII. But I'd rather not complicate this PR further.
This performs noticeably better than the boolean logic.

Before:

```
== Encoding twitter.json (466906 bytes)
ruby 3.4.0preview2 (2024-10-07 master 32c733f57b) +YJIT +PRISM [arm64-darwin23]
Warming up --------------------------------------
                json   189.000 i/100ms
                  oj   228.000 i/100ms
           rapidjson   108.000 i/100ms
Calculating -------------------------------------
                json      1.903k (± 1.2%) i/s  (525.55 μs/i) -      9.639k in   5.066521s
                  oj      2.306k (± 1.3%) i/s  (433.71 μs/i) -     11.628k in   5.044096s
           rapidjson      1.069k (± 2.4%) i/s  (935.38 μs/i) -      5.400k in   5.053794s

Comparison:
                json:     1902.8 i/s
                  oj:     2305.7 i/s - 1.21x faster
           rapidjson:     1069.1 i/s - 1.78x slower
```

After:

```
== Encoding twitter.json (466906 bytes)
ruby 3.4.0preview2 (2024-10-07 master 32c733f57b) +YJIT +PRISM [arm64-darwin23]
Warming up --------------------------------------
                json   224.000 i/100ms
                  oj   230.000 i/100ms
           rapidjson   107.000 i/100ms
Calculating -------------------------------------
                json      2.254k (± 1.6%) i/s  (443.69 μs/i) -     11.424k in   5.069999s
                  oj      2.318k (± 1.4%) i/s  (431.32 μs/i) -     11.730k in   5.060421s
           rapidjson      1.081k (± 1.9%) i/s  (925.05 μs/i) -      5.457k in   5.049738s

Comparison:
                json:     2253.8 i/s
                  oj:     2318.5 i/s - same-ish: difference falls within error
           rapidjson:     1081.0 i/s - 2.08x slower
```

The escape table is taken directly from Mame's PR.

Co-Authored-By: Yusuke Endoh <mame@ruby-lang.org>
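As a rough sketch of what "escape table instead of boolean logic" means: a single 256-entry lookup replaces a chain of per-byte comparisons, so the hot loop is one load and one branch per byte. The table below is illustrative, built at runtime for clarity; the actual commit uses a precomputed static table (with a separate variant for `script_safe`).

```c
#include <assert.h>
#include <stddef.h>

/* One byte per entry: nonzero means the byte needs escaping in JSON.
 * Illustrative only -- the real table is a static initializer. */
static unsigned char needs_escape[256];

static void init_escape_table(void)
{
    for (int i = 0; i < 0x20; i++) needs_escape[i] = 1; /* control chars */
    needs_escape['"'] = 1;
    needs_escape['\\'] = 1;
}

/* Length of the leading run of bytes that can be copied verbatim,
 * i.e. how far the encoder can go before it must emit an escape. */
static size_t scan_clean_run(const unsigned char *str, size_t len)
{
    size_t pos = 0;
    while (pos < len && !needs_escape[str[pos]]) pos++;
    return pos;
}
```

A `script_safe` variant would additionally mark `/` (and handle U+2028/U+2029 separately, since those are multi-byte).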
```c
            }

            beg = pos + 1;
            switch (ch) {
```
Thanks for elaborating on this change in your blog post ❤️

I was wondering if we could entirely eliminate this `switch` at the expense of static memory, by storing escaped values instead of booleans in the lookup table. Just as an example, in Ruby:

```ruby
JSON_ESCAPE_TABLE['"'] = "\\\""
JSON_ESCAPE_TABLE['\\'] = "\\\\"
JSON_ESCAPE_TABLE['\b'] = "\\b"
# ...
# the default: case can be pre-calculated as well for other characters
```

It may not result in a visible improvement in realistic benchmarks, since they don't contain that many characters to escape, but it could be visible on a micro-benchmark specifically crafted to measure escaping performance, at no cost (except static memory) for the main flow.

I could be missing something that makes it hard to store non-boolean values in `escape_table`, but would love to at least learn about this as well. Thanks!
It could be worth trying, but there are a couple of things making this not an obvious win.

Escaped values in this case aren't scalar. You can either store them as a table of `char *`:

```c
static const char *escape_table[256] = {
    0, 0, 0, "\\n", // ....
};
```

If you do this:

- `char *` is `8B` on a 64-bit platform, whereas `char` is just `1B`, so we went from `256B` to `2kiB`, not nothing.
- `char *` doesn't store the length, so presumably to append these you'd need to call `strlen`. They are short strings, so probably not very slow, but given we're just trying to eliminate a case, it may actually be slower.
- The `char *` data isn't stored contiguously, so that's another memory region to keep in CPU cache.
Alternatively, you could store the escaped strings inline, e.g.:

```c
struct escaped_char {
    unsigned char len;
    char bytes[8];
};

static const struct escaped_char escape_table[256] = {
    {0, 0}, {0, 0}, {2, "\\n"}, // ....
};
```

But this would make the escape table even bigger.
Maybe you could only do this for strings that are known to be pure ASCII, so the table only has `128` entries, keeping the memory usage down, but then you can't really re-use it for UTF-8.

Note that all of these are just "static assumptions"; performance isn't always easy to predict, so maybe your suggestion would actually help.
Did this get applied to the Java parser anywhere? I can take it if not, but I don't want to dive in and waste my time if it's already there.
As mentioned previously, I don't know enough about Java to do much with it, and I don't really have any incentive either. If you want to speed up the Java version, feel free to do it.
@byroot Ok, thanks for clarifying. You've done so many PRs lately I lost track! I'll do the port and push a PR. The code in the Java extension will be pretty much the same as what you wrote for the C version.
Good news: much of the structure of the Java code already mimics the C code, and a profile of dumping the twitter.json file shows most of the overhead is in two places: handling each character (because it always uses the heavier UTF-8 logic) and picking the right logic for each object type (most likely a bad lookup of handler logic based on the Ruby class). Ought to be able to speed this up nicely.
Work is proceeding on this and other optimizations for the JRuby extension in #725. |
* Make benchmark runnable without oj available
* Port convert_UTF8_to_ASCII_only_JSON to Java

  This is new specialized logic to reduce overhead when appending ASCII-only strings to the generated JSON. Original code by @byroot. See #620
* Align string generate method with generate_json_string
* Port convert_UTF8_to_JSON from C

  Also includes updated logic for generate (generate_json_string) based on current C code. Original code by @byroot. See #620
* Use external iteration to reduce alloc

  Lots of surrounding state, so just take the hit of a Set and Iterator rather than a big visitor object.
* Remove unused imports
* Inline ConvertBytes logic for long to byte[]

  This change duplicates some code from JRuby to allow rendering the fixnum value into a shared byte array rather than allocating a new one for each value. Since fixnum dumping is a leaf operation, only one is needed per session.
* Eliminate * import
* Restructure handlers for easier profiling

  Anonymous classes show up as unnamed, numbered classes in profiles, which makes them difficult to read.
* Avoid allocation when writing Array delimiters

  Rather than allocating a buffer to hold N copies of arrayNL, just write it N times. We're buffering into a stream anyway. This makes array dumping zero-alloc other than buffer growth.
* Move away from Handler abstraction

  Since there's a fixed number of types we have special dumping logic for, this abstraction just introduces overhead we don't need. This patch starts moving away from indirecting all dumps through the Handler abstraction and directly generating from the type switch. This also aligns better with the main loop of the C code and should inline and optimize better.
* Match C version of fbuffer_append_long
* Minor tweaks to reduce complexity
* Reimplement byte[] stream without synchronization

  The byte[] output stream used here extended ByteArrayOutputStream from the JDK, which synchronizes all mutation operations (like writes). Since this is only going to be used once within a given call stack, it needs no synchronization. This change more than triples the performance of a benchmark dumping an array of empty arrays, and should increase the performance of all dump forms.
* Reduce overhead in repeats

  * Return the incoming array if only one repeat is needed and the array is the exact size.
  * Only retrieve ByteList fields once for repeat writes.
* Use equivalent of rb_sym2str
* Micro-optimizations for ByteList stream
* Cast to byte not necessary
* Refactor for better inlining
* More tiny tweaks to reduce overhead of generateString
* Refactor to avoid repeated boolean checks
* Eliminate memory accesses for digits

  The math is much faster here than array access, due to bounds checking and pointer dereferencing.
* Loosen visibility to avoid accessor methods

  Java will generate accessor methods for private fields, burning some inlining budget.
* Modify parser bench to work without oj or rapidjson
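The "Match C version of fbuffer_append_long" item above refers to converting an integer to its decimal bytes by arithmetic alone, with no digit lookup table. A hedged sketch of that technique (my own illustrative code, shown in C rather than Java; the helper name `long_to_ascii` is invented):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Produce the decimal representation of n. Digits are computed as
 * '0' + n % 10 (no digit-pair lookup table, so no array bounds checks
 * or extra memory traffic), written backwards into a scratch buffer,
 * then copied out. Returns the length, and NUL-terminates out. */
static size_t long_to_ascii(long n, char *out)
{
    char tmp[21]; /* enough for LONG_MIN on 64-bit, plus sign */
    char *p = tmp + sizeof(tmp);
    int neg = n < 0;
    /* Negate via unsigned arithmetic so LONG_MIN doesn't overflow. */
    unsigned long u = neg ? 0UL - (unsigned long)n : (unsigned long)n;

    do {
        *--p = (char)('0' + (u % 10));
        u /= 10;
    } while (u);
    if (neg) *--p = '-';

    size_t len = (size_t)(tmp + sizeof(tmp) - p);
    memcpy(out, p, len);
    out[len] = '\0';
    return len;
}
```

Since fixnum dumping is a leaf operation, the scratch buffer can be a single per-session allocation, which is what the JRuby commit above exploits.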
Fix: #562