Use faster UTF8 encoding in `Content.write()` #12475

lorban · 2024-11-04T16:07:51Z

Signed-off-by: Ludovic Orban <lorban@bitronix.be>

tests/jetty-jmh/src/main/java/org/eclipse/jetty/io/jmh/Utf8Benchmark.java

jetty-core/jetty-io/src/main/java/org/eclipse/jetty/io/Content.java

joakime

Please confirm that the behavior when dealing with bad input is the same with the new code vs the old code.

Example: a bad/malformed input String that would trigger a fault in the conversion.
If the old code throws an exception, this new one should too.
If the old code used replacement characters, this new one should too.

lorban · 2024-11-05T10:09:42Z

@joakime is it possible to create a String object containing invalid UTF-8? All I've seen in our tests is invalid UTF-8 being built in byte arrays that are then passed to the String constructor, which applies some correction. Did I get that right?

joakime

It is difficult to create a String that `Charset.encode(String)` would fail on.
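
For what it's worth, a Java String can hold malformed UTF-16, such as an unpaired surrogate, and that is about the only input a UTF-8 encoder can object to. The sketch below is not the code under review, and its class and method names are made up for illustration. It suggests that `Charset.encode(String)` and `String.getBytes(Charset)` both substitute `?` for the lone surrogate, while a freshly created `CharsetEncoder` (which defaults to reporting errors) throws:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class MalformedStringDemo
{
    // A lone high surrogate: a legal Java String, but not encodable as UTF-8.
    static final String LONE_SURROGATE = "a\uD800b";

    // Charset.encode(String) is documented to replace malformed input.
    static byte[] viaCharsetEncode(String s)
    {
        ByteBuffer buffer = StandardCharsets.UTF_8.encode(s);
        byte[] bytes = new byte[buffer.remaining()];
        buffer.get(bytes);
        return bytes;
    }

    // String.getBytes(Charset) is documented to replace malformed input as well.
    static byte[] viaGetBytes(String s)
    {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // A new CharsetEncoder uses CodingErrorAction.REPORT unless told otherwise.
    static boolean strictEncoderThrows(String s)
    {
        try
        {
            StandardCharsets.UTF_8.newEncoder().encode(CharBuffer.wrap(s));
            return false;
        }
        catch (CharacterCodingException e)
        {
            return true;
        }
    }

    public static void main(String[] args)
    {
        System.out.println(Arrays.toString(viaCharsetEncode(LONE_SURROGATE)));
        System.out.println(Arrays.toString(viaGetBytes(LONE_SURROGATE)));
        System.out.println(strictEncoderThrows(LONE_SURROGATE));
    }
}
```

If that holds for the call sites touched here, swapping one replacing API for the other should not change what users see on bad input.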

lorban · 2024-11-05T16:12:56Z

I found one other place where we use encode that could be replaced with getBytes/wrap.
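
As a rough illustration of the getBytes/wrap replacement, assuming the code being replaced calls `Charset.encode(String)` (the benchmark method names suggest so, but the diff is not shown here), the two forms can be compared side by side. The class and method names below are hypothetical:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class EncodeVsGetBytes
{
    // Assumed "before" form: let the Charset encode the String directly.
    static ByteBuffer encode(String s)
    {
        return StandardCharsets.UTF_8.encode(s);
    }

    // Assumed "after" form: encode to a byte[] and wrap it.
    static ByteBuffer wrapGetBytes(String s)
    {
        return ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args)
    {
        // ASCII, French and Japanese samples, written with escapes to stay encoding-neutral.
        String[] samples = {"hello", "caf\u00E9 cr\u00E8me", "\u3053\u3093\u306B\u3061\u306F"};
        for (String sample : samples)
        {
            // ByteBuffer.equals() compares the remaining bytes of both buffers.
            boolean same = encode(sample).equals(wrapGetBytes(sample));
            System.out.println(sample.length() + " chars, same bytes: " + same);
        }
    }
}
```

One difference worth keeping in mind: the wrapped buffer is sized exactly to the encoded bytes, whereas the buffer returned by `Charset.encode(String)` may have spare capacity beyond its limit.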

lorban · 2024-11-05T16:18:03Z

For future reference, here is the benchmark's report:

Benchmark                                          (locale)   Mode  Cnt         Score         Error   Units
Utf8Benchmark.testEncode                              ASCII  thrpt   10   1885900.209 ±   12517.384   ops/s
Utf8Benchmark.testEncode:gc.alloc.rate                ASCII  thrpt   10      1121.696 ±       7.530  MB/sec
Utf8Benchmark.testEncode:gc.alloc.rate.norm           ASCII  thrpt   10       624.007 ±       0.001    B/op
Utf8Benchmark.testEncode:gc.count                     ASCII  thrpt   10        19.000                counts
Utf8Benchmark.testEncode:gc.time                      ASCII  thrpt   10        23.000                    ms
Utf8Benchmark.testEncode                                 FR  thrpt   10   1310399.805 ±   12798.866   ops/s
Utf8Benchmark.testEncode:gc.alloc.rate                   FR  thrpt   10       789.489 ±       7.739  MB/sec
Utf8Benchmark.testEncode:gc.alloc.rate.norm              FR  thrpt   10       632.011 ±       0.001    B/op
Utf8Benchmark.testEncode:gc.count                        FR  thrpt   10        14.000                counts
Utf8Benchmark.testEncode:gc.time                         FR  thrpt   10        18.000                    ms
Utf8Benchmark.testEncode                                 JA  thrpt   10    814449.918 ±   11152.653   ops/s
Utf8Benchmark.testEncode:gc.alloc.rate                   JA  thrpt   10      2925.414 ±      40.086  MB/sec
Utf8Benchmark.testEncode:gc.alloc.rate.norm              JA  thrpt   10      3768.017 ±       0.001    B/op
Utf8Benchmark.testEncode:gc.count                        JA  thrpt   10        33.000                counts
Utf8Benchmark.testEncode:gc.time                         JA  thrpt   10        47.000                    ms
Utf8Benchmark.testWrapGetBytes                        ASCII  thrpt   10  39417563.752 ± 1256275.047   ops/s
Utf8Benchmark.testWrapGetBytes:gc.alloc.rate          ASCII  thrpt   10     19538.322 ±     623.689  MB/sec
Utf8Benchmark.testWrapGetBytes:gc.alloc.rate.norm     ASCII  thrpt   10       520.000 ±       0.001    B/op
Utf8Benchmark.testWrapGetBytes:gc.count               ASCII  thrpt   10        71.000                counts
Utf8Benchmark.testWrapGetBytes:gc.time                ASCII  thrpt   10       144.000                    ms
Utf8Benchmark.testWrapGetBytes                           FR  thrpt   10   3434889.274 ±   64716.469   ops/s
Utf8Benchmark.testWrapGetBytes:gc.alloc.rate             FR  thrpt   10      4819.736 ±      90.934  MB/sec
Utf8Benchmark.testWrapGetBytes:gc.alloc.rate.norm        FR  thrpt   10      1472.004 ±       0.001    B/op
Utf8Benchmark.testWrapGetBytes:gc.count                  FR  thrpt   10        37.000                counts
Utf8Benchmark.testWrapGetBytes:gc.time                   FR  thrpt   10        58.000                    ms
Utf8Benchmark.testWrapGetBytes                           JA  thrpt   10   1399081.733 ±   47082.158   ops/s
Utf8Benchmark.testWrapGetBytes:gc.alloc.rate             JA  thrpt   10      3595.402 ±     121.188  MB/sec
Utf8Benchmark.testWrapGetBytes:gc.alloc.rate.norm        JA  thrpt   10      2696.010 ±       0.001    B/op
Utf8Benchmark.testWrapGetBytes:gc.count                  JA  thrpt   10        32.000                counts
Utf8Benchmark.testWrapGetBytes:gc.time                   JA  thrpt   10        46.000                    ms
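
To put the report in relative terms, the throughput and per-operation allocation ratios can be computed directly from the scores above. This is only arithmetic on the reported numbers, and the class name is hypothetical:

```java
import java.util.Locale;

class Utf8ReportRatios
{
    // Ratio of the candidate's score to the baseline's score.
    static double ratio(double baseline, double candidate)
    {
        return candidate / baseline;
    }

    public static void main(String[] args)
    {
        String[] locales = {"ASCII", "FR", "JA"};
        // Scores copied from the report: testEncode is the baseline, testWrapGetBytes the candidate.
        double[] encodeOps = {1885900.209, 1310399.805, 814449.918};
        double[] wrapOps = {39417563.752, 3434889.274, 1399081.733};
        double[] encodeBytesPerOp = {624.007, 632.011, 3768.017};
        double[] wrapBytesPerOp = {520.000, 1472.004, 2696.010};

        for (int i = 0; i < locales.length; i++)
        {
            System.out.printf(Locale.ROOT, "%-5s throughput x%.2f, allocation x%.2f%n",
                locales[i],
                ratio(encodeOps[i], wrapOps[i]),
                ratio(encodeBytesPerOp[i], wrapBytesPerOp[i]));
        }
    }
}
```

By this calculation testWrapGetBytes is roughly 21x faster for ASCII, 2.6x for FR and 1.7x for JA. Bytes allocated per operation drop for ASCII and JA but rise for FR (632 to 1472 B/op).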

gregw

Is there anything that can be done to improve org.eclipse.jetty.ee11.servlet.HttpOutput#print(java.lang.String, boolean)?

lorban · 2024-11-06T12:50:05Z

@gregw HttpOutput.print() goes to great lengths to pool the encoder and to detect encoding errors like overflows/underflows.

We could theoretically replace all that with a much simpler String.getBytes(Charset), which could improve perf but may not work as expected w.r.t. encoding. @joakime what's your opinion on that one?

joakime · 2024-11-06T15:23:39Z

> @joakime what's your opinion on that one?

If the behavior of the API to the users is maintained, then I'm in favor of the change.
Is it possible to use HttpOutput.print() with partial code points? (Meaning one print() call starts a code point and a subsequent print() call finishes it.)
If so, then the String.getBytes(Charset) wouldn't work for us.
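
The partial code point concern can be illustrated with a surrogate pair split across two calls. This is a sketch of the general problem using plain JDK classes, not Jetty's HttpOutput code. The class name, method names and fixed 64-element buffers are assumptions made to keep it short:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class SplitCodePointDemo
{
    // Encodes each chunk on its own, as a per-call String.getBytes(Charset) would.
    static byte[] encodeIndependently(String... chunks)
    {
        ByteBuffer out = ByteBuffer.allocate(64);
        for (String chunk : chunks)
            out.put(chunk.getBytes(StandardCharsets.UTF_8));
        return toArray(out);
    }

    // Feeds all chunks through one encoder, carrying unconsumed chars over to the next chunk.
    static byte[] encodeStreaming(String... chunks) throws CharacterCodingException
    {
        if (chunks.length == 0)
            return new byte[0];
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
        CharBuffer in = CharBuffer.allocate(64);
        ByteBuffer out = ByteBuffer.allocate(64);
        for (int i = 0; i < chunks.length; i++)
        {
            in.put(chunks[i]);
            in.flip();
            boolean endOfInput = i == chunks.length - 1;
            CoderResult result = encoder.encode(in, out, endOfInput);
            if (result.isError())
                result.throwException();
            // Keep any char the encoder left behind, e.g. a dangling high surrogate.
            in.compact();
        }
        encoder.flush(out);
        return toArray(out);
    }

    private static byte[] toArray(ByteBuffer buffer)
    {
        buffer.flip();
        byte[] bytes = new byte[buffer.remaining()];
        buffer.get(bytes);
        return bytes;
    }

    public static void main(String[] args) throws CharacterCodingException
    {
        // U+1F600 is the surrogate pair \uD83D \uDE00; split it across two chunks.
        System.out.println(Arrays.toString(encodeIndependently("\uD83D", "\uDE00")));
        System.out.println(Arrays.toString(encodeStreaming("\uD83D", "\uDE00")));
    }
}
```

Encoding each half independently yields two `?` bytes, while the streaming variant produces the four-byte UTF-8 sequence for the code point. Whether HttpOutput.print() needs to support this is exactly the open question.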

lorban · 2024-11-06T16:17:26Z

I'm going to give HttpOutput.print() a try in another PR, as it isn't trivial to change but may work and be worth the effort.

#12469 - use faster UTF8 encoding

7948afe

Signed-off-by: Ludovic Orban <lorban@bitronix.be>

lorban added the Enhancement label Nov 4, 2024

lorban self-assigned this Nov 4, 2024

lorban requested review from sbordet and gregw November 4, 2024 16:08

sbordet requested changes Nov 4, 2024

joakime requested changes Nov 4, 2024

#12469 - make benchmark single-threaded and its report more readable

bfd384a

Signed-off-by: Ludovic Orban <lorban@bitronix.be>

sbordet previously approved these changes Nov 5, 2024

#12469 - use faster UTF8 encoding

316065a

Signed-off-by: Ludovic Orban <lorban@bitronix.be>

lorban dismissed sbordet’s stale review via 316065a November 5, 2024 16:10

lorban requested review from sbordet and joakime November 5, 2024 16:12

joakime approved these changes Nov 5, 2024

gregw approved these changes Nov 5, 2024

lorban merged commit 66b494d into jetty-12.0.x Nov 6, 2024
10 checks passed

lorban deleted the fix/jetty-12.0.x/12469-faster-utf8-write branch November 6, 2024 16:18
