
Improve the architecture that supports the correct order of HTTP responses #687

keshonok opened this issue Feb 20, 2017 · 7 comments

keshonok commented Feb 20, 2017

The current architecture relies heavily on locking seq_queue in client connections and fwd_queue in server connections to enforce the correct order of responses to HTTP requests. The locks are held for relatively long times, and due to the multi-CPU nature of Tempesta's operation there is significant lock contention under high load.

Now that the current architecture and its behavioural patterns are better understood, it should be improved with the goal of decreasing lock contention and improving overall performance.

keshonok added this to the 0.6 WebOS milestone Feb 20, 2017

krizhanovsky commented Feb 20, 2017

The background of the issue is pull request #660 and comments #660 (comment) (tfw_http_req_fwd()) and #660 (comment) (tfw_http_resp_fwd()) in particular. Backoff and per-CPU lock data structures could be used just like in Linux MCS locking for the first case, and maybe we can employ our work queue to access TfwConn on a per-CPU basis in the second case. Also see the comments added in 6478c70 (contention on fwd_qlock and seq_qlock).
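
For reference, MCS-style locking means a queue lock in which every waiter spins on its own (per-CPU) node rather than on a shared cache line, which is what removes the contention seen on fwd_qlock/seq_qlock. Below is a minimal user-space sketch with C11 atomics; the names (mcs_lock, mcs_node) are illustrative, not Linux kernel or Tempesta FW symbols:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Illustrative MCS-style queue lock: each waiter spins on its own
 * node (which can live in per-CPU data), so only the single atomic
 * exchange on the tail pointer touches shared state.
 */
struct mcs_node {
    _Atomic(struct mcs_node *) next;
    atomic_bool locked;
};

struct mcs_lock {
    _Atomic(struct mcs_node *) tail;
};

static void mcs_lock_acquire(struct mcs_lock *lock, struct mcs_node *self)
{
    struct mcs_node *prev;

    atomic_store(&self->next, NULL);
    atomic_store(&self->locked, true);

    /* Join the waiters' queue; the previous tail hands the lock over. */
    prev = atomic_exchange(&lock->tail, self);
    if (!prev)
        return; /* The lock was free. */

    atomic_store(&prev->next, self);
    /* Spin on our own node only: no shared cache-line bouncing. */
    while (atomic_load(&self->locked))
        ;       /* cpu_relax()/backoff would go here */
}

static void mcs_lock_release(struct mcs_lock *lock, struct mcs_node *self)
{
    struct mcs_node *next = atomic_load(&self->next);

    if (!next) {
        /* No successor visible: try to mark the lock free. */
        struct mcs_node *expected = self;
        if (atomic_compare_exchange_strong(&lock->tail, &expected, NULL))
            return;
        /* A successor is linking itself in; wait for it to show up. */
        while (!(next = atomic_load(&self->next)))
            ;
    }
    atomic_store(&next->locked, false);
}
```

The acquire path touches the shared tail pointer only once (the atomic exchange); all subsequent spinning happens on the waiter's private node.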

Another issue is the multiple list operations, e.g. #851 (comment).

Probably #851 (comment) is also a good candidate to be done in the context of this issue.

Linked with #940 and #941. The latter describes a couple of possible optimizations in #884 (comment) (points 2 and 3).

Also, it seems that the only purpose of nip_queue/nip_list is to unmark non-idempotent requests in a server send queue when a client sends more requests after a non-idempotent one, so the list can probably be eliminated.

Linked with #1065 (HTTP requests fair scheduling).

http.c is becoming very complex (it's the second largest source file after the parser), so it'd be good to split the logic into HTTP server, client, and common pieces.


vankoven commented Dec 21, 2018

Currently the request processing routine looks like this (a simplified sketch of the locked path follows the list):

  1. NET_RX_SOFTIRQ
  2. parse the request and detach it from the client connection;
  3. make all required modifications to the request before forwarding;
  4. assign the request to forward_queue on some server connection;
  5. the forward queue lock is acquired;
  6. forward all unsent requests on that server connection;
  7. for each request to be forwarded: push the server connection socket to the workqueue with an SS_Send command;
  8. the forward queue lock is released;
  9. NET_TX_SOFTIRQ
  10. pop tasks from the work queue and complete them: send already prepared buffers to the sockets.
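
A simplified kernel-style model of steps 5-8, just to show where the contention comes from: the lock is held across the whole traversal while every unsent request becomes an SS_Send work item, so RX softirqs on other CPUs adding requests to the same server connection wait on fwd_qlock. Struct layout and helpers here are placeholders, not the real Tempesta FW definitions:

```c
#include <linux/list.h>
#include <linux/spinlock.h>
#include <linux/types.h>

/* Placeholder connection/request layout for the sketch. */
struct srv_conn {
    spinlock_t fwd_qlock;
    struct list_head fwd_queue;
};

struct http_req {
    struct list_head fwd_list;
    bool sent;
};

/* Placeholder for "push server connection socket to workqueue with SS_Send". */
void wq_push_ss_send(struct srv_conn *conn, struct http_req *req);

static void srv_conn_fwd_unsent(struct srv_conn *conn)
{
    struct http_req *req;

    spin_lock(&conn->fwd_qlock);                      /* step 5 */
    list_for_each_entry(req, &conn->fwd_queue, fwd_list) {
        if (req->sent)
            continue;
        wq_push_ss_send(conn, req);                   /* step 7 */
        req->sent = true;
    }
    spin_unlock(&conn->fwd_qlock);                    /* step 8 */
}
```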

In this processing pipeline two queues are used: forward_queue and workqueue. The first doesn't have distinct consumer and producer roles and is protected by spinlocks; the second has a distinct producer and consumer and is implemented as lock-free. Probably we can revise the algorithm and get rid of the locking:

  1. NET_RX_SOFTIRQ
  2. parse request and detach from client connection;
  3. make all required modifications to request before forwarding;
  4. assign the request to forward_queue on some server connection (a lock-free operation);
  5. push server connection socket to workqueue with SS_Send command;
  6. NET_TX_SOFTIRQ
  7. pop tasks from work queue and complete them: get server connection from socket;
  8. forward all unsent requests on that server connection;
  9. for each request to be forwarded: send already prepared buffers to the socket, but don't pop requests from the forward_queue.

Here the producer and consumer of the forward_queue are split, so no locks on the forward_queue are required. Roughly the same algorithm should be implemented for response processing: push the client socket into the work queue after the response is adjusted and ready to be sent.
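
Since this split leaves multiple RX-side producers and a single TX-side consumer per server connection, one natural building block is an intrusive MPSC (Vyukov-style) queue. Below is a minimal user-space sketch with C11 atomics; all names are illustrative, and the real fwd_queue would also have to keep requests around after sending (for response matching and re-forwarding), which this sketch doesn't model:

```c
#include <stdatomic.h>
#include <stddef.h>

/*
 * Illustrative intrusive MPSC queue: any RX CPU may push lock-free,
 * while only the TX path on the connection's CPU pops.
 */
struct fwd_node {
    _Atomic(struct fwd_node *) next;
};

struct fwd_queue {
    _Atomic(struct fwd_node *) head;    /* producers push here */
    struct fwd_node *tail;              /* consumer-private */
    struct fwd_node stub;
};

static void fwd_queue_init(struct fwd_queue *q)
{
    atomic_store(&q->stub.next, NULL);
    atomic_store(&q->head, &q->stub);
    q->tail = &q->stub;
}

/* Proposal step 4: lock-free push from any RX softirq. */
static void fwd_queue_push(struct fwd_queue *q, struct fwd_node *n)
{
    struct fwd_node *prev;

    atomic_store(&n->next, NULL);
    prev = atomic_exchange(&q->head, n);
    atomic_store(&prev->next, n);
}

/* Only the single TX-side consumer may call this. */
static struct fwd_node *fwd_queue_pop(struct fwd_queue *q)
{
    struct fwd_node *tail = q->tail;
    struct fwd_node *next = atomic_load(&tail->next);

    if (tail == &q->stub) {
        if (!next)
            return NULL;                /* empty */
        q->tail = next;
        tail = next;
        next = atomic_load(&tail->next);
    }
    if (next) {
        q->tail = next;
        return tail;
    }
    if (tail != atomic_load(&q->head))
        return NULL;                    /* a producer is mid-push */
    fwd_queue_push(q, &q->stub);
    next = atomic_load(&tail->next);
    if (next) {
        q->tail = next;
        return tail;
    }
    return NULL;
}
```

Push costs one atomic exchange regardless of contention; pop is correct only because a single context owns the tail, which is exactly the producer/consumer split described above.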

There is a difference between how the work queue performs now and how it should perform in my proposal. Now a task in the work queue is a single message send operation; in my proposal it is iterating over the list and sending requests one by one. It's not wise to add a new task for the same connection: progress on the previous task would make later tasks empty, and we still need to track the budget.

Thus the work queue should be reimplemented as a round-robin queue (a sketch follows the lists).

Pushing into the queue:

  • if the server connection is already in the queue: do nothing;
  • else push it.

Progressing on the queue:

  • pop a task from the queue;
  • forward no more than per_connection_budget requests from the forward_queue, charging the number of forwarded requests against total_budget;
  • if the remaining total_budget is not 0, proceed to the next task (step 1);
  • if the forward_queue is not empty, push the connection to the end of the work queue (if it's not in the queue already);
  • reschedule the NET_TX_ACTION softirq.
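
A schematic of that progressing loop as it might run from the NET_TX_ACTION softirq; the budget values and every helper below (wq_pop(), conn_fwd_one(), etc.) are hypothetical, not existing Tempesta FW symbols:

```c
#include <stdbool.h>
#include <stddef.h>

#define PER_CONN_BUDGET 16
#define TOTAL_BUDGET    256

struct srv_conn;                          /* opaque server connection */

struct srv_conn *wq_pop(void);            /* NULL when the queue is empty */
void wq_push_unique(struct srv_conn *c);  /* no-op if already queued */
bool conn_fwd_one(struct srv_conn *c);    /* send next unsent request,
                                             false when none remain */
bool conn_has_unsent(struct srv_conn *c);
void raise_net_tx_action(void);           /* reschedule the softirq */

static void tx_action(void)
{
    unsigned int total = 0;
    struct srv_conn *c;

    while ((c = wq_pop())) {
        unsigned int sent = 0;

        /* Forward no more than PER_CONN_BUDGET requests from fwd_queue. */
        while (sent < PER_CONN_BUDGET && conn_fwd_one(c))
            sent++;
        total += sent;

        /* Unsent requests remain: requeue the connection at the tail. */
        if (conn_has_unsent(c))
            wq_push_unique(c);

        /* Total budget exhausted: yield and run the softirq again. */
        if (total >= TOTAL_BUDGET) {
            raise_net_tx_action();
            return;
        }
    }
}
```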

PROs:

  • Splitting of the send and receive code paths; this should keep the code cleaner;
  • Easier to implement receive budgets for client connections; such budgets will be more predictable, since roughly the same amount of time is required to process and enqueue every request;
  • Getting rid of the forward_queue and seq_queue locks.

CONs:

  • A lot of work;
  • ???

Maybe such an approach was already discussed when message pipelining was introduced and there are some caveats I don't know about...


krizhanovsky commented Dec 26, 2018

@ikoveshnikov good point! I appreciate further movement towards the lock-free RB transport and doing as much work as possible on the local CPU. Also, the proposal to move from single-message work items for the RB to full connection processing looks quite good. We've never discussed such an opportunity.

Some time ago we discussed that http.c is becoming too sophisticated and that it makes sense to split it into client and server parts, or TX and RX parts in your proposal. So probably we can end up with simpler and faster HTTP processing.


vankoven commented Dec 27, 2018

Some time ago we discussed that http.c is becoming too sophisticated and that it makes sense to split it into client and server parts

Totally agree here.


aleksostapenko commented Jan 3, 2019

In the context of the last proposal, the problem with the -EBUSY error (one of the subjects of issue #1003) may go away: if the connections themselves are in the work queue, we will be able to add requests to the fwd_queue up to its own limit (1000 by default), and the -EBUSY problem (when we don't have time to unload the work queue) will simply transform into exhaustion of the fwd_queue limit; so we'll be able to avoid rescheduling requests on the -EBUSY error (the current solution for issue #1003).

Also, there are several points which should be taken into account regarding the fwd_queue itself. We still need to make the fwd_queue lock-free, and to ease this task the proposed approach implies dividing the push, send (read) and pop operations (which currently can be processed in one thread) across three threads, i.e. separate producer, sender (reader) and consumer contexts appear. However, some additional complexities need to be considered in this scheme:

  1. In the sender thread, apparently, it will also be necessary to evict (consume) requests from the fwd_queue (timeout eviction, dropped requests, and we can still run into SOCK_DEAD as well);
  2. The re-sending procedure during connection repairing also seems to belong to the sender thread context, so msg_sent resetting may occur in this context too;
  3. The basic re-scheduling procedure (during connection closing), which is essentially pop and push operations, probably should be redesigned in this approach.

krizhanovsky commented

#1175 and #1180 are perfect examples of how complicated our HTTP connection processing is. The bugs have very complex scenarios and required a lot of time to find and fix. Moreover, we had a lot of other synchronization problems on HTTP connections and queues. All in all, the current architecture is too complex to develop and support, and it has clear performance issues. A better solution is required. I'm making the issue crucial because of the constant queue of bugs.

Probably a solution easier than #687 (comment) is possible:

  1. assign each HTTP connection to a CPU, just like we do for TCP sockets;
  2. distribute work among the connections through the lock-free RB queue transport.

Each CPU can read a TfwConn object owned by a remote CPU, but it needs to 'send a message' to that CPU if it wants to send an HTTP message through the connection's queue, close it, or whatever else. Probably the same work queues as we use for TCP sockets can be reused: it seems we won't need them for their current purpose any more if we have per-CPU HTTP connections at the higher layer.
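
A sketch of what such per-CPU connection ownership might look like, assuming a hypothetical cpu field on the connection object and a cross-CPU work-queue helper; all identifiers here are illustrative rather than existing Tempesta FW APIs:

```c
/*
 * Illustrative per-CPU connection ownership: seq_queue/fwd_queue of a
 * connection are touched only by the CPU recorded in it; every other
 * CPU passes a message through that CPU's lock-free work queue.
 */
enum conn_op { CONN_OP_SEND, CONN_OP_CLOSE };

struct http_conn {
    int cpu;            /* owning CPU, fixed when the connection is set up */
    /* seq_queue / fwd_queue live here, owned by ->cpu exclusively */
};

struct conn_work {
    struct http_conn *conn;
    enum conn_op op;
    void *msg;
};

int this_cpu_id(void);
void wq_push_to_cpu(int cpu, struct conn_work w);
void conn_do_send(struct http_conn *c, void *msg);

/* Any CPU may read the connection object; mutation is message-passed. */
static void conn_send(struct http_conn *c, void *msg)
{
    if (c->cpu == this_cpu_id()) {
        conn_do_send(c, msg);        /* local fast path, no locks */
    } else {
        struct conn_work w = { .conn = c, .op = CONN_OP_SEND, .msg = msg };
        wq_push_to_cpu(c->cpu, w);   /* owning CPU drains this in its softirq */
    }
}
```

The intended benefit is that the connection's queues are only ever mutated by the owning CPU and so need no locks; the cost is an extra work-queue hop whenever a message arrives on a CPU other than the connection's owner.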

Probably some CPU/connection scheduler is required for intentionally opened connections (listen-created connections are distributed by RSS/RPS). The scheduler algorithm is TBD.

krizhanovsky modified the milestones: 1.0 Beta, 1.1 Network performance & scalability, 1.1 TBD (Network performance & scalability), 1.1 TDB (ML, QUIC, DoH etc.) Feb 11, 2019
krizhanovsky commented

One more optimization proposal from #941 (comment) about more efficient handling of responses from dead client connections:

A client sends us several requests and the client connection is immediately terminated. All of the requests are in fwd_queue's and keep being sent to servers, reforwarded and so on until all of them receive responses. Then tfw_http_resp_fwd() finds the empty seq_queue of the client connection and calls ss_close_sync() for each of the requests. All in all, we do a lot of unnecessary work.

It's OK to leave the dead requests in fwd_queue's to minimize lock contention, but it's easy and cheap to (see the sketch after this list):

  1. set a flag on the requests so that they are evicted from the fwd_queue right when they are about to be sent to a server or rescheduled;
  2. we don't need to call ss_close_sync() in tfw_http_resp_fwd() since all connection_error and connection_drop callbacks follow the ss_do_close() call in sock.c, so tfw_http_conn_cli_drop() is always called on a closed socket;
  3. tfw_http_conn_cli_drop() should call tfw_connection_put() just like tfw_sock_srv_connect_drop() and tfw_sock_srv_connect_failover() do. When tfw_http_resp_fwd() calls tfw_http_conn_msg_free() for the last request referencing the client connection, the connection is freed.
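
A rough sketch of points 1 and 3 above, assuming a hypothetical flag bit and helper names; only tfw_connection_put() is a function named in this comment, everything else is illustrative:

```c
#include <stdbool.h>

/*
 * Rough sketch: a "dead connection" flag makes a request evict itself
 * from fwd_queue right before it would be sent or rescheduled, and the
 * client connection is freed by the last reference-dropping request.
 */
#define HTTP_REQ_F_CONN_DEAD 0x0001

struct client_conn;

struct http_req {
    unsigned long flags;
    struct client_conn *conn;
};

void req_evict(struct http_req *req);   /* unlink from fwd_queue and free */
void req_send(struct http_req *req);
void tfw_connection_put(struct client_conn *conn);

/* Point 1: the flag is checked only when the request is about to be sent. */
static bool req_fwd_or_evict(struct http_req *req)
{
    if (req->flags & HTTP_REQ_F_CONN_DEAD) {
        req_evict(req);
        return false;
    }
    req_send(req);
    return true;
}

/* Point 3: the last request referencing the client connection frees it. */
static void http_conn_msg_free(struct http_req *req)
{
    struct client_conn *conn = req->conn;

    /* ...free the request itself... */
    tfw_connection_put(conn);   /* refcount drops to zero -> freed */
}
```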
