Improve the architecture that supports the correct order of HTTP responses #687
The background of this issue is pull request #660 and the comments in #660 (comment). The other issue is multiple list operations, e.g. #851 (comment); probably #851 (comment) is also a good candidate to be done in the context of this issue. Linked with #940 and #941; the latter describes a couple of possible optimizations in #884 (comment) (points 2 and 3). Also it seems that the only application of … Linked with #1065 (HTTP requests fair scheduling).
Currently the request processing routine looks like this:
In this processing pipeline two queues are used: forward_queue and the work queue. The first doesn't have a distinct consumer and producer and is protected by spinlocks; the second has a distinct producer and consumer and is implemented as lock-free. Probably we can revise the algorithm and get rid of the locking.
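As a rough illustration of the lock-free direction (hypothetical names, plain C11 atomics rather than Tempesta's actual primitives): once exactly one CPU pushes into forward_queue and exactly one CPU pops from it, a single-producer/single-consumer ring buffer with acquire/release ordering on two indices is enough and the spinlock can go away.

```c
/* Minimal SPSC ring buffer sketch: with exactly one producer and one
 * consumer no lock is needed, only ordered loads/stores on two indices.
 * All names are hypothetical, not Tempesta's actual API.
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RB_SIZE 256  /* power of two, so we can mask instead of mod */

struct spsc_rb {
	void *slots[RB_SIZE];
	_Atomic size_t head;  /* written by the producer only */
	_Atomic size_t tail;  /* written by the consumer only */
};

/* Producer side: runs on the CPU that parsed the request. */
static bool
rb_push(struct spsc_rb *rb, void *msg)
{
	size_t head = atomic_load_explicit(&rb->head, memory_order_relaxed);
	size_t tail = atomic_load_explicit(&rb->tail, memory_order_acquire);

	if (head - tail >= RB_SIZE)
		return false;  /* queue full, caller must back off */

	rb->slots[head & (RB_SIZE - 1)] = msg;
	/* Publish the slot before moving the head forward. */
	atomic_store_explicit(&rb->head, head + 1, memory_order_release);
	return true;
}

/* Consumer side: runs on the CPU that owns the target connection. */
static void *
rb_pop(struct spsc_rb *rb)
{
	size_t tail = atomic_load_explicit(&rb->tail, memory_order_relaxed);
	size_t head = atomic_load_explicit(&rb->head, memory_order_acquire);
	void *msg;

	if (tail == head)
		return NULL;  /* queue empty */

	msg = rb->slots[tail & (RB_SIZE - 1)];
	atomic_store_explicit(&rb->tail, tail + 1, memory_order_release);
	return msg;
}
```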
Here the producer to forward_queue and the consumer are split, and no locks for forward_queue are required. Roughly the same algorithm should be implemented for response processing: push the client socket into the work queue after the response is adjusted and ready to be sent. There is a difference between how the work queue performs now and how it should perform in my proposal. Now a task in the work queue is a single message send operation; in my proposal it is iterating over the list and sending the queued messages one by one. It's not wise to add a new task for the same connection: progress on the previous task will make later tasks empty, and we still need to track a budget. Thus the work queue should be reimplemented as a round-robin queue (see the sketch at the end of this comment):
PROs:
CONs:
Maybe such an approach was already discussed when message pipelining was introduced and there are some caveats I don't know about...
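A minimal sketch of the proposed round-robin work queue, assuming hypothetical names and ignoring synchronization between CPUs: the work item is a whole connection rather than a single message, a connection is never queued twice, and a connection that exhausts its budget while still having ready responses is re-queued at the tail.

```c
#include <stdbool.h>
#include <stddef.h>

struct response {
	struct response *next;
	/* ... the adjusted, ready-to-send message ... */
};

struct connection {
	struct connection *rr_next;    /* link in the round-robin queue */
	bool               queued;     /* avoid duplicate work items */
	struct response   *ready_head; /* responses ready to send, FIFO */
};

struct rr_queue {
	struct connection *head, *tail;
};

/* Assumed to exist elsewhere: actually writes the message to the socket. */
void send_response(struct connection *conn, struct response *resp);

static void
rr_enqueue(struct rr_queue *q, struct connection *conn)
{
	if (conn->queued)
		return;  /* progress on the existing item covers the new work */
	conn->queued = true;
	conn->rr_next = NULL;
	if (q->tail)
		q->tail->rr_next = conn;
	else
		q->head = conn;
	q->tail = conn;
}

/* One pass of the worker: each queued connection gets at most @budget
 * sends; a connection with work left goes back to the tail of the queue. */
static void
rr_process(struct rr_queue *q, unsigned int budget)
{
	/* Snapshot the queue so re-enqueued connections wait for the next pass. */
	struct connection *conn = q->head;

	q->head = q->tail = NULL;

	while (conn) {
		struct connection *next = conn->rr_next;
		unsigned int n;

		conn->queued = false;
		for (n = 0; n < budget && conn->ready_head; n++) {
			struct response *resp = conn->ready_head;

			conn->ready_head = resp->next;
			send_response(conn, resp);
		}
		if (conn->ready_head)
			rr_enqueue(q, conn);
		conn = next;
	}
}
```

The budget is what keeps the queue fair: one connection with a long backlog cannot monopolize the worker, it simply rotates back to the tail.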
@ikoveshnikov good point! I appreciate the further movement to the lock-free RB transport and doing as much work as possible on the local CPU. Also the proposal to move from single-message work items for the RB to full connection processing looks quite good. We've never discussed such an opportunity. Some time ago we discussed that http.c becomes too sophisticated and it makes sense to split it into client and server parts, or tx and rx parts as in your proposal. So probably we can end up with simpler and faster HTTP processing.
Totally agree here.
In the context of the last proposition, there is the problem with … Also, there are several moments which should be taken into account regarding …
#1175 and #1180 are perfect examples of how complicated our HTTP connection processing is. The bugs have very complex scenarios and required a lot of time to find and fix. Moreover, we had a lot of other synchronization problems on HTTP connections and queues. All in all, the current architecture is too complex to develop and support, and it has clear performance issues. A better solution is required. I'm making the issue crucial because of the constant stream of bugs. Probably an easier solution than #687 (comment) is possible:
Each CPU can read … Probably some CPU/connection scheduler is required for intentionally opened connections (listen-created connections are distributed by RSS/RPS). The scheduler algorithm is TBD.
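Since the scheduler algorithm is explicitly TBD, the following is only a trivial placeholder (hypothetical names, fixed CPU count) showing the shape such a scheduler could take: listen-created connections keep the CPU chosen by RSS/RPS, while intentionally opened server connections are pinned to the currently least-loaded CPU, so that afterwards each CPU works only with its own connections.

```c
#include <stddef.h>

#define NR_CPUS 64

static unsigned int conns_per_cpu[NR_CPUS];
static unsigned int nr_cpus = 8;  /* assumed number of online CPUs */

/* Pick a CPU for a new, intentionally opened server connection; the
 * connection is then processed only on that CPU, so no cross-CPU locking
 * is needed for its queues. */
static unsigned int
sched_pick_cpu(void)
{
	unsigned int cpu, best = 0;

	for (cpu = 1; cpu < nr_cpus; cpu++)
		if (conns_per_cpu[cpu] < conns_per_cpu[best])
			best = cpu;
	conns_per_cpu[best]++;
	return best;
}
```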
One more optimization proposal from #941 (comment), about more efficient handling of responses for dead client connections: a client sends us several requests and the client connection is immediately terminated. All of the requests are in fwd_queue's and are being sent to servers, reforwarded and so on until all of them receive responses. Then it's OK to leave the dead requests in fwd_queue's to minimize lock contention, but it's easy and cheap to:
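As a hedged illustration of the kind of cheap handling meant here (hypothetical names and flags, locking omitted): the dying client connection only sets a flag on its pending requests, the requests stay in the fwd_queue's untouched, and the flag is checked on the response path, where a dead request can be freed without any client socket work.

```c
#include <stdbool.h>
#include <stddef.h>

#define REQ_F_DEAD 0x01U  /* the client connection is gone */

struct request {
	struct request *seq_next;  /* next request pending on the same client */
	unsigned int    flags;
};

struct client_conn {
	struct request *pending;   /* requests still waiting for responses */
};

/* Assumed to exist elsewhere. */
void free_request(struct request *req);
void forward_response_to_client(struct request *req);

/* Client connection terminated: only flag the requests, don't touch the
 * server fwd_queue's, so no extra lock contention is introduced. */
static void
client_conn_mark_dead(struct client_conn *cli)
{
	struct request *req;

	for (req = cli->pending; req; req = req->seq_next)
		req->flags |= REQ_F_DEAD;
}

/* Response arrived from a server: a dead request is dropped cheaply
 * instead of being matched, adjusted and sent to a closed socket. */
static void
on_server_response(struct request *req)
{
	if (req->flags & REQ_F_DEAD) {
		free_request(req);
		return;
	}
	forward_response_to_client(req);
}
```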
The current architecture extensively uses the locking of seq_queue in client connections and fwd_queue in server connections to enforce the correct order of responses to HTTP requests. The locks are held for a relatively long time, and due to the multi-CPU nature of Tempesta's operation there is significant lock contention in high-load situations.

As the current architecture and its behavioural patterns are understood better, it should be improved with the goal of decreasing the lock contention and improving the overall performance.
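A simplified model of the pattern described (hypothetical names, a pthread mutex standing in for the kernel spinlock): responses complete out of order on different CPUs, so each client connection keeps its requests in arrival order and releases responses strictly from the head of seq_queue; the whole walk-and-send runs under one lock, which is what makes the hold times long and the contention visible under load.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct request {
	struct request *next;
	bool            answered;   /* response received and adjusted */
};

struct client_conn {
	pthread_mutex_t  seq_lock;  /* a spinlock in the real kernel code */
	struct request  *seq_head;  /* oldest request first */
};

/* Assumed to exist elsewhere. */
void send_response_for(struct request *req);
void free_request(struct request *req);

/* Called on whatever CPU the server connection delivered the response to. */
static void
resp_ready(struct client_conn *cli, struct request *req)
{
	pthread_mutex_lock(&cli->seq_lock);
	req->answered = true;
	/* Send every response that is now unblocked, in request order;
	 * all of this work happens with the lock held. */
	while (cli->seq_head && cli->seq_head->answered) {
		struct request *done = cli->seq_head;

		cli->seq_head = done->next;
		send_response_for(done);
		free_request(done);
	}
	pthread_mutex_unlock(&cli->seq_lock);
}
```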