-
-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speedup aiohttp web server #2779
Comments
Also, the compression and the chunk features that are mixed between Allowing a clean happy path for some ad-hoc cases - that surely are the most used - as I'm aligned with you, but I would try first to speed up the |
I think it's OK to ignore any content-length and transfer-encoding settings for these responses, and always use the content-type specified by user if there is any. But I don't quite understand the speedup, I thought building the headers are always slower... |
the check for headers is quite expensive <- I don’t understand |
P.S. |
Also |
How much do they cost comparing to a simple json encode? |
How is json encoding related to the issue? |
If it costs far less than json encode, it may not worth it. |
Say again, why should I care about JSON encoding cost at all? |
Take it easy... When optimising, we begin from the slowest part. I have tested the performance of aiohttp before, It is about 4000qps/CPU core, that is 250us per request. We must reduce 50us or more to make a difference. And if I remember It correctly, encoding of a JSON response takes about 10us. So I was wondering if doing checks are really a bottleneck now. I don't have a testing environment with me until next week. I will look at your gist after that. |
@hubo1016 |
@kxpal understood. I am just using it as a benchmark baseline, since different hardware produces very different numbers. |
4000 RPS? |
8200 RPS on my laptop |
@thomaszdxsn python debugger is off, access log is off, uvloop is used? |
https://gist.github.com/asvetlov/13295efdc1301bea241d7d35496e6f81 I take this gist, and not use cProfile.
what mean? I know python have a debugger call |
Sorry, my mistake -- should be "profiler". |
yes, this is my command:
wrk command:
output:
another wrk:
output:
my computer:
maybe reason is MacOS |
@asvetlov ah, I didn't turn off the access log. I just use the default settings. It makes sense if you are targeting 20k+qps. Shall we rewrite the whole class with Cython first? That might be easier. |
from my experience, biggest win in term of performance is related to ability to run async code in place |
Good point |
interesting @fafhrd91! Will be nice to see if the naive code path that has a completed response without needing a loop iteration exists right now in the current implementation, plenty of awaits that might be breaking the execution. |
It will be nice to implement http/2 , that will give a really good performance boost on real world apps. As http/2 has a HPACK header compression, it is possible to use caching for header parsing, validation, and all other stuff. What will give blasting performance boost and top limit of uvloop tcp performance (100k rps) can be reached nearly, as http/2 is a binary protocol. And that's not all - almost all real world apps uses sessions, no matter what kind they are (kv, rdms, jwt) they all need some work for cpu, this work can bee cached to. And there is a ready implementation of http/2 with python binding , what would not be hard to integrate. |
Are you volunteering for the task? |
Maybe. If it will be requirement for my current project (we have some performance issues against heavy application level ddos). This task is interesting for me, but I cant give any promises about my part in it because there are always some factors that may break promises. |
Without profiler With profiler |
I'm guessing that's UrlMappingMatchInfo.add_app(), which would make sense, because you get a new match info for each request and you want the app available on there. It just sets it as the current_app and adds it to an empty list (when routing through sub apps, then I think you end up with a list representing the chain of apps). |
Previously a new enum object had to be created and the reason phase looked up. Since there are a limited number of codes, we can pre-build a dict with all the phases related issue #2779
3.10 - with pending PRs Requests/sec: 43981.32 Closer to where 3.9.5 was but likely faster for many real world use cases since the benchmark is weighted towards the single static case |
Nearly all the calls are going to be the same so we can cache the logic related issue #2779
You mentioned isinstance() checks, I'm guessing a bunch of these are in asserts (mostly for type checking). How does the performance compare with |
I’ll check when I get back to a laptop. I did a diff against 3.9.5 and found a few more places where a variable check was replaced with isinstance. If I recall correctly isinstance is fast on a hit and much slower on a miss. |
Since we will almost always set the PayloadAccessError on the payload, only create it once related issue #2779
Since we will almost always set the PayloadAccessError on the payload, only create it once related issue #2779
3.10 with pending PRs Requests/sec: 46102.41 That brings is back to 3.9.x performance and almost as good as 3.8.x Likely the real world performance actually improvement is much better since what this benchmark measures is likely not a common real world use case. |
diff --git a/aiohttp/web_response.py b/aiohttp/web_response.py
index d2f056299..8a475f081 100644
--- a/aiohttp/web_response.py
+++ b/aiohttp/web_response.py
@@ -646,7 +646,10 @@ class Response(StreamResponse):
return None
if hdrs.CONTENT_LENGTH in self._headers:
- return super().content_length
+ cl = self._headers[hdrs.CONTENT_LENGTH]
+ if cl is not None:
+ return int(cl)
+ return None
if self._compressed_body is not None:
# Return length of the compressed body It looks like all the super super calls to get Requests/sec: 46683.39 |
#5197 introduced a small performance regression. We could avoid creating the cookie objects if there are no cookies which is a more likely case for a web response. https://github.com/aio-libs/aiohttp/pull/new/cookie_pref diff --git a/aiohttp/helpers.py b/aiohttp/helpers.py
index 55b363003..555d53113 100644
--- a/aiohttp/helpers.py
+++ b/aiohttp/helpers.py
@@ -914,14 +914,18 @@ class CookieMixin:
# slot themselves.
__slots__ = ()
+ _cookies: Optional[SimpleCookie]
+
def __init__(self) -> None:
super().__init__()
# Mypy doesn't like that _cookies isn't in __slots__.
# See the comment on this class's __slots__ for why this is OK.
- self._cookies = SimpleCookie() # type: ignore[misc]
+ self._cookies = None
@property
def cookies(self) -> SimpleCookie:
+ if self._cookies is None:
+ self._cookies = SimpleCookie()
return self._cookies
def set_cookie(
@@ -942,6 +946,8 @@ class CookieMixin:
Sets new cookie or updates existent with new value.
Also updates only those params which are not None.
"""
+ if self._cookies is None:
+ self._cookies = SimpleCookie()
old = self._cookies.get(name)
if old is not None and old.coded_value == "":
# deleted cookie
@@ -996,6 +1002,8 @@ class CookieMixin:
Creates new empty expired cookie.
"""
# TODO: do we need domain/path here?
+ if self._cookies is None:
+ return
self._cookies.pop(name, None)
self.set_cookie(
name,
diff --git a/aiohttp/web_response.py b/aiohttp/web_response.py
index d2f056299..941df85fd 100644
--- a/aiohttp/web_response.py
+++ b/aiohttp/web_response.py
@@ -383,7 +383,8 @@ class StreamResponse(BaseClass, HeadersMixin, CookieMixin):
version = request.version
headers = self._headers
- populate_with_cookies(headers, self.cookies)
+ if self._cookies:
+ populate_with_cookies(headers, self._cookies)
if self._compression:
await self._start_compression(request) |
The typical web handler looks like:
The text was updated successfully, but these errors were encountered: