Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved HTTP 1.1 ClassRoutingHandler::handle due to un-optimized headers names lookup #25834

Closed
wants to merge 1 commit into from

Conversation

franz1981
Copy link
Contributor

No description provided.

@franz1981
Copy link
Contributor Author

Flamegraphs shows quite an amount of CPU time spent on
image

because this http 1.1 lookup on ClassRoutingHandler is not using the right cached http headers

@franz1981
Copy link
Contributor Author

This is going to add a hashcode computation for all the other cases in order to perform the string-based switch

Copy link
Contributor

@geoand geoand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice one!

@franz1981
Copy link
Contributor Author

franz1981 commented May 29, 2022

@geoand I'm taking the point on the Julien V comment about "Techempower dev driver model" and I'm trying hard to NOT push optimizations that are worthless outside of what I see in the flamegraphs -
I've added the same optimization for the single header name lookup (given that's being called with the 2 constant http headers shown there).

I see, as well, that in order to make this more generic, would be nice to have spi methods that's using some public enum of known Http 1.1 and Http 2 header names (with their String-based formatted name) eg

//  org.jboss.resteasy.reactive.server.spi.ServerHttpRequest
String getRequestHeader(HttpHeader httpHeader);

// :P GC-free: int getAllRequestHeaders(HttpHeader httpHeader, Collection<String> matchingHeaders);
List<String> getAllRequestHeaders(HttpHeader httpHeader);

This allows (for http 1.1) to easily (and as quickly as an array look-up) map with the related vert-x cached & optimized http header name constants.

If modifying ServerHttpRequest is not an option (nor supporting the enum list of "known" http 1.1/r header names) I don't know how to make this optimization simple and a more elegant then what I've done, but I'm opened to suggestions :)

@geoand
Copy link
Contributor

geoand commented May 29, 2022

We can certainly modify ServerHttpRequest, we free to make the change and I'll review

@franz1981
Copy link
Contributor Author

franz1981 commented Jul 5, 2022

Didn't forgotten about it :) Just come back from PTO, will keep on working on this the next weeks ;)
I would like to save a giant switch to translate between string -> ascii string in order to save the (very expensive) netty ascii hash code (why expensive? because is not case-insensitive) to happen for each get/add/set, because if the string parameter is not pooled (or is not a literal) it means its hash code (String::hashCode, a bit faster, in theory) should happen every time, in order to be used in the string-based switch

@gsmet
Copy link
Member

gsmet commented Feb 2, 2023

@franz1981 what's the status of this work?

@gsmet gsmet added the triage/needs-feedback We are waiting for feedback. label Feb 2, 2023
@franz1981
Copy link
Contributor Author

Thanks for checking @gsmet , Netty has just merged (and Vertx too) some important changes for http and I wish to recheck if these changes makes sense and/or are relevant
While back from sick leave I will unpark this again

@gsmet gsmet added the triage/needs-rebase This PR needs to be rebased first because it has merge conflicts label Feb 4, 2023
@franz1981 franz1981 force-pushed the optimized_http branch 3 times, most recently from c45bc23 to e3bb61d Compare July 22, 2023 21:09
@franz1981
Copy link
Contributor Author

franz1981 commented Jul 24, 2023

@geoand Adding more data points and asking for an advice here...

Advice first:


seems to assume that ServerHttpResponse is a vertxResponse, according to its name.
It is correct? I see ServerHttpResponse which can be VertxResteasyReactiveRequestContext, QuarkusResteasyReactiveRequestContext or ServletRequestContext: if it can just be of type QuarkusResteasyReactiveRequestContext maybe worth forcing CONTENT_TYPE to be the vertx HTTP 1.1 HttpHeaders.CONTENT_TYPE AsciiString variant and save checking it on VertxResteasyReactiveRequestContext?

Than, a more detailed explanation of what the problem we're solving here...

Netty, under the hood, try hard to avoid wasteless (and expensive) computations of case-insensitive hash codes for the Http parameters names: for this reason, Netty prefer AsciiString HTTP header names, which can both cache such hash codes and/or compute them (the first time), very quickly.

What happen instead when String types are used? Netty perform case-insensitive hash code computations using a "slow path" version, without any chance to to cache them (because java's String cache its own hash code, not the one computed by Netty for HTTP headers lookup).
The mentioned Netty method is:

https://github.com/netty/netty/blob/1c7f0faac060313bfd313d0a50fdf83a2fd2c9c9/common/src/main/java/io/netty/util/AsciiString.java#L1413-L1421

Currently, for both Techempower's plaintext or whatever HTTP 1.1 request, Quarkus is performing at least 3 lookup which perform such wasteful computations, which are listed here:

  1. List<String> accepts = (List<String>) requestContext.getHeader(HttpHeaders.ACCEPT, false);
  2. List<String> acceptValues = (List<String>) requestContext.getHeader(HttpHeaders.ACCEPT, false);

Which compute hash codes for known header names ie Accept and Content-Type.

About the performance impacts:

  • these header names are not "big" meaning that their cost in a real world application will just cost very few, especially if compared to the validation cost, but still, will save some cpu resources
  • BUT, on benchmarks which exercise it using pipeling (eg Techempower plaintext), it can cost ~20% of the overall throughput

With such datapoints, feel free to accept/reject the change, although it would just make quarkus a bit more friendly with HTTP and Vertx expected fast-path types.

An additional note: why I haven't pushed this 1 year ago?
The reason is on netty/netty#12321 which was the previously bottleneck of our stack (and Vertx). solved it, this pr is the other remaining bottleneck to be solved.

@geoand
Copy link
Contributor

geoand commented Jul 24, 2023

seems to assume that ServerHttpResponse is a vertxResponse, according to its name.
It is correct

It's an incorrect name. At that point, there is no Vert.x dependency.

if it can just be of type QuarkusResteasyReactiveRequestContext maybe worth forcing CONTENT_TYPE to be the vertx HTTP 1.1 HttpHeaders.CONTENT_TYPE AsciiString variant and save checking it on VertxResteasyReactiveRequestContext

We would have to include some intelligence if we are going to do this.

@geoand
Copy link
Contributor

geoand commented Jul 24, 2023

We would have to include some intelligence if we are going to do this.

If you think it will make a noticeable difference, I'll have a look at how we can do it

@franz1981
Copy link
Contributor Author

ok, so re

We would have to include some intelligence if we are going to do this.

The one I've already in the changes of this PR should already do it (because are already in the relevant interesting hot spot)

@geoand
Copy link
Contributor

geoand commented Jul 24, 2023

With such datapoints, feel free to accept/reject the change, although it would just make quarkus a bit more friendly with HTTP and Vertx expected fast-path types.

I am +1 for this change (assuming they have been verified by benchmarks :))

@geoand
Copy link
Contributor

geoand commented Jul 24, 2023

The one I've already in the changes of this PR should already do it (because are already in the relevant interesting hot spot)

Okay, cool

@franz1981
Copy link
Contributor Author

I have just discovered that in few cases, checking that the request/response to be of VertX type won't work, because we have few wrapper classes. I will deal with it tomorrow, checking the performance implications and adding some tests too.

@geoand geoand removed the triage/needs-rebase This PR needs to be rebased first because it has merge conflicts label Jul 25, 2023
@geoand geoand removed the triage/needs-feedback We are waiting for feedback. label Jul 25, 2023
@franz1981
Copy link
Contributor Author

franz1981 commented Jul 25, 2023

@geoand The last form of this change, while running in my home-lab (Ryzen 7950 using 8 cores from a single NUMA node and other 8 cores for wrk on another NUMA node - kept separated to have stable results), is getting these numbers, for the plaintext test with 128 connections and 16 pipelined requests:

main

  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   487.62us    2.05ms 114.56ms   93.06%
    Req/Sec   387.37k    47.01k  432.00k    94.39%
  123637120 requests in 40.10s, 12.32GB read
Requests/sec: 3083227.87
Transfer/sec:    314.62MB

This PR:

  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   458.99us    2.09ms 112.39ms   85.18%
    Req/Sec   409.52k    49.12k  451.23k    94.51%
  130721296 requests in 40.10s, 13.03GB read
Requests/sec: 3259896.92
Transfer/sec:    332.65MB

Which means ~5.7% better throughput: not a big deal overall. I'll pause this PR till I have the perf lab able to run it, but in the current form the results are not wow, especially since in the real world, the dominant cost will be the http name validation, instead.

@geoand
Copy link
Contributor

geoand commented Jul 25, 2023

Thanks for the update!

@franz1981
Copy link
Contributor Author

Update on this: I've spoken with @galderz this morning about it and it seems that native image isn't that great to optimize String::charAt on HTTP header validation due to the bimorphism: see https://github.com/eclipse-vertx/vert.x/blob/7763104c3edf2f3165b23ed6142580d9dda71e38/src/main/java/io/vertx/core/http/impl/HttpUtils.java#L889

But, such cost would likely disappear by using AsciiString header names, because we use directly the byte[] in; for a plaintext cpu bound scenario, the JVM mode cost of validations can be 0.51% but for the native image one is 3.4% which seems negligible, but is not, given that, for native image is about 10% of the cost of encoding and 10% of decoding HTTP content, which means that the actual improvement could be larger, but yet to measure.
I'll wait till we have some more data, still leaving this parked here, which should address the validation cost of header names, but NOT of header values, instead.

@geoand
Copy link
Contributor

geoand commented Sep 15, 2023

Thanks for the update!

@franz1981
Copy link
Contributor Author

I'm closing these being superseded by #37619

@franz1981 franz1981 closed this Feb 12, 2024
@quarkus-bot quarkus-bot bot added the triage/invalid This doesn't seem right label Feb 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
triage/invalid This doesn't seem right
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants