
Improve the architecture that supports the correct order of HTTP responses #687

keshonok opened this issue Feb 20, 2017 · 7 comments

keshonok commented Feb 20, 2017

The current architecture relies heavily on locking seq_queue in client connections and fwd_queue in server connections to enforce the correct order of responses to HTTP requests. The locks are held for relatively long times, and due to the multi-CPU nature of Tempesta's operation there is significant lock contention under high load.

Now that the current architecture and its behavioural patterns are better understood, it should be improved with the goal of decreasing lock contention and improving overall performance.

keshonok added this to the 0.6 WebOS milestone Feb 20, 2017

krizhanovsky commented Feb 20, 2017

The background of the issue is pull request #660 and comments #660 (comment) (tfw_http_req_fwd()) and #660 (comment) (tfw_http_resp_fwd()) in particular. Backoff and per-CPU lock data structures could be used just like in Linux MCS locking for the first case, and maybe we can employ our work queue to access TfwConn on a per-CPU basis in the second case. Also see the comments added in 6478c70 (contention on fwd_qlock and seq_qlock).
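
For reference, MCS-style locking means a queue lock in which every waiter spins on its own (per-CPU) node rather than on a shared cache line, which is what removes the contention seen on fwd_qlock/seq_qlock. Below is a minimal user-space sketch with C11 atomics; the names (mcs_lock, mcs_node) are illustrative, not Linux kernel or Tempesta FW symbols:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Illustrative MCS-style queue lock: each waiter spins on its own
 * node (which can live in per-CPU data), so only the single atomic
 * exchange on the tail pointer touches shared state.
 */
struct mcs_node {
    _Atomic(struct mcs_node *) next;
    atomic_bool locked;
};

struct mcs_lock {
    _Atomic(struct mcs_node *) tail;
};

static void mcs_lock_acquire(struct mcs_lock *lock, struct mcs_node *self)
{
    struct mcs_node *prev;

    atomic_store(&self->next, NULL);
    atomic_store(&self->locked, true);

    /* Join the waiters' queue; the previous tail hands the lock over. */
    prev = atomic_exchange(&lock->tail, self);
    if (!prev)
        return; /* The lock was free. */

    atomic_store(&prev->next, self);
    /* Spin on our own node only: no shared cache-line bouncing. */
    while (atomic_load(&self->locked))
        ;       /* cpu_relax()/backoff would go here */
}

static void mcs_lock_release(struct mcs_lock *lock, struct mcs_node *self)
{
    struct mcs_node *next = atomic_load(&self->next);

    if (!next) {
        /* No successor visible: try to mark the lock free. */
        struct mcs_node *expected = self;
        if (atomic_compare_exchange_strong(&lock->tail, &expected, NULL))
            return;
        /* A successor is linking itself in; wait for it to show up. */
        while (!(next = atomic_load(&self->next)))
            ;
    }
    atomic_store(&next->locked, false);
}
```

The acquire path touches the shared tail pointer only once (the atomic exchange); all subsequent spinning happens on the waiter's private node.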

Another issue is the multiple list operations, e.g. #851 (comment).

Probably #851 (comment) is also a good candidate to be done in the context of this issue.

Linked with #940 and #941. The latter describes a couple of possible optimizations in #884 (comment) (points 2 and 3).

Also, it seems that the only purpose of nip_queue/nip_list is to unmark non-idempotent requests in a server send queue when a client sends more requests after a non-idempotent one, so the list can probably be eliminated.

Linked with #1065 (HTTP requests fair scheduling).

http.c is becoming very complex (it's the second largest source file after the parser), so it'd be good to split the logic into HTTP server, client, and common pieces.


vankoven commented Dec 21, 2018

Currently the request processing routine looks like this (a simplified sketch of the locked path follows the list):

  1. NET_RX_SOFTIRQ
  2. parse the request and detach it from the client connection;
  3. make all required modifications to the request before forwarding;
  4. assign the request to forward_queue on some server connection;
  5. the forward queue lock is acquired;
  6. forward all unsent requests on that server connection;
  7. for each request to be forwarded: push the server connection socket to the workqueue with an SS_Send command;
  8. the forward queue lock is released;
  9. NET_TX_SOFTIRQ
  10. pop tasks from the work queue and complete them: send already prepared buffers to the sockets.
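
A simplified kernel-style model of steps 5-8, just to show where the contention comes from: the lock is held across the whole traversal while every unsent request becomes an SS_Send work item, so RX softirqs on other CPUs adding requests to the same server connection wait on fwd_qlock. Struct layout and helpers here are placeholders, not the real Tempesta FW definitions:

```c
#include <linux/list.h>
#include <linux/spinlock.h>
#include <linux/types.h>

/* Placeholder connection/request layout for the sketch. */
struct srv_conn {
    spinlock_t fwd_qlock;
    struct list_head fwd_queue;
};

struct http_req {
    struct list_head fwd_list;
    bool sent;
};

/* Placeholder for "push server connection socket to workqueue with SS_Send". */
void wq_push_ss_send(struct srv_conn *conn, struct http_req *req);

static void srv_conn_fwd_unsent(struct srv_conn *conn)
{
    struct http_req *req;

    spin_lock(&conn->fwd_qlock);                      /* step 5 */
    list_for_each_entry(req, &conn->fwd_queue, fwd_list) {
        if (req->sent)
            continue;
        wq_push_ss_send(conn, req);                   /* step 7 */
        req->sent = true;
    }
    spin_unlock(&conn->fwd_qlock);                    /* step 8 */
}
```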

In this processing pipeline two queues are used: forward_queue and workqueue. The first doesn't have distinct consumer and producer roles and is protected by spinlocks; the second has a distinct producer and consumer and is implemented as lock-free. Probably we can revise the algorithm and get rid of the locking:

  1. NET_RX_SOFTIRQ
  2. parse request and detach from client connection;
  3. make all required modifications to request before forwarding;
  4. assign the request to forward_queue on some server connection (a lock-free operation);
  5. push server connection socket to workqueue with SS_Send command;
  6. NET_TX_SOFTIRQ
  7. pop tasks from work queue and complete them: get server connection from socket;
  8. forward all unsent requests on that server connection;
  9. for each request to be forwarded: send already prepared buffers to the socket, but don't pop requests from the forward_queue.

Here the producer and consumer of the forward_queue are split, so no locks on the forward_queue are required. Roughly the same algorithm should be implemented for response processing: push the client socket into the work queue after the response is adjusted and ready to be sent.
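
Since this split leaves multiple RX-side producers and a single TX-side consumer per server connection, one natural building block is an intrusive MPSC (Vyukov-style) queue. Below is a minimal user-space sketch with C11 atomics; all names are illustrative, and the real fwd_queue would also have to keep requests around after sending (for response matching and re-forwarding), which this sketch doesn't model:

```c
#include <stdatomic.h>
#include <stddef.h>

/*
 * Illustrative intrusive MPSC queue: any RX CPU may push lock-free,
 * while only the TX path on the connection's CPU pops.
 */
struct fwd_node {
    _Atomic(struct fwd_node *) next;
};

struct fwd_queue {
    _Atomic(struct fwd_node *) head;    /* producers push here */
    struct fwd_node *tail;              /* consumer-private */
    struct fwd_node stub;
};

static void fwd_queue_init(struct fwd_queue *q)
{
    atomic_store(&q->stub.next, NULL);
    atomic_store(&q->head, &q->stub);
    q->tail = &q->stub;
}

/* Proposal step 4: lock-free push from any RX softirq. */
static void fwd_queue_push(struct fwd_queue *q, struct fwd_node *n)
{
    struct fwd_node *prev;

    atomic_store(&n->next, NULL);
    prev = atomic_exchange(&q->head, n);
    atomic_store(&prev->next, n);
}

/* Only the single TX-side consumer may call this. */
static struct fwd_node *fwd_queue_pop(struct fwd_queue *q)
{
    struct fwd_node *tail = q->tail;
    struct fwd_node *next = atomic_load(&tail->next);

    if (tail == &q->stub) {
        if (!next)
            return NULL;                /* empty */
        q->tail = next;
        tail = next;
        next = atomic_load(&tail->next);
    }
    if (next) {
        q->tail = next;
        return tail;
    }
    if (tail != atomic_load(&q->head))
        return NULL;                    /* a producer is mid-push */
    fwd_queue_push(q, &q->stub);
    next = atomic_load(&tail->next);
    if (next) {
        q->tail = next;
        return tail;
    }
    return NULL;
}
```

Push costs one atomic exchange regardless of contention; pop is correct only because a single context owns the tail, which is exactly the producer/consumer split described above.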

There is a difference between how the work queue performs now and how it should perform in my proposal. Now a task in the work queue is a single message send operation; in my proposal it is iterating over the list and sending requests one by one. It's not wise to add a new task for the same connection: progress on the previous task would make later tasks empty, and we still need to track the budget.

Thus the work queue should be reimplemented as a round-robin queue (a sketch follows the lists).

Pushing into the queue:

  • if the server connection is already in the queue: do nothing;
  • else push it.

Progressing on the queue:

  • pop a task from the queue;
  • forward no more than per_connection_budget requests from the forward_queue, charging the number of forwarded requests against total_budget;
  • if the remaining total_budget is not 0, proceed to the next task (step 1);
  • if the forward_queue is not empty, push the connection to the end of the work queue (if it's not in the queue already);
  • reschedule the NET_TX_ACTION softirq.
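
A schematic of that progressing loop as it might run from the NET_TX_ACTION softirq; the budget values and every helper below (wq_pop(), conn_fwd_one(), etc.) are hypothetical, not existing Tempesta FW symbols:

```c
#include <stdbool.h>
#include <stddef.h>

#define PER_CONN_BUDGET 16
#define TOTAL_BUDGET    256

struct srv_conn;                          /* opaque server connection */

struct srv_conn *wq_pop(void);            /* NULL when the queue is empty */
void wq_push_unique(struct srv_conn *c);  /* no-op if already queued */
bool conn_fwd_one(struct srv_conn *c);    /* send next unsent request,
                                             false when none remain */
bool conn_has_unsent(struct srv_conn *c);
void raise_net_tx_action(void);           /* reschedule the softirq */

static void tx_action(void)
{
    unsigned int total = 0;
    struct srv_conn *c;

    while ((c = wq_pop())) {
        unsigned int sent = 0;

        /* Forward no more than PER_CONN_BUDGET requests from fwd_queue. */
        while (sent < PER_CONN_BUDGET && conn_fwd_one(c))
            sent++;
        total += sent;

        /* Unsent requests remain: requeue the connection at the tail. */
        if (conn_has_unsent(c))
            wq_push_unique(c);

        /* Total budget exhausted: yield and run the softirq again. */
        if (total >= TOTAL_BUDGET) {
            raise_net_tx_action();
            return;
        }
    }
}
```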

PROs:

  • Splitting of the send and receive code paths; this should keep the code cleaner;
  • Easier to implement receive budgets for client connections; such budgets will be more predictable, since roughly the same amount of time is required to process and enqueue every request;
  • Getting rid of the forward_queue and seq_queue locks.

CONs:

  • A lot of work;
  • ???

Maybe such an approach was already discussed when message pipelining was introduced and there are some caveats I don't know about...


krizhanovsky commented Dec 26, 2018

@ikoveshnikov good point! I appreciate further movement towards the lock-free RB transport and doing as much work as possible on the local CPU. Also, the proposal to move from single-message work items for the RB to full connection processing looks quite good. We've never discussed such an opportunity.

Some time ago we discussed that http.c is becoming too sophisticated and that it makes sense to split it into client and server parts, or TX and RX parts in your proposal. So probably we can end up with simpler and faster HTTP processing.


vankoven commented Dec 27, 2018

Some time ago we discussed that http.c is becoming too sophisticated and that it makes sense to split it into client and server parts

Totally agree here.


aleksostapenko commented Jan 3, 2019

In the context of the last proposal, the problem with the -EBUSY error (one of the subjects of issue #1003) may go away: if the connections themselves are in the work queue, we will be able to add requests to the fwd_queue up to its own limit (1000 by default), and the -EBUSY problem (when we don't have time to unload the work queue) will simply transform into exhaustion of the fwd_queue limit; so we'll be able to avoid rescheduling requests on the -EBUSY error (the current solution for issue #1003).

Also, there are several points which should be taken into account regarding the fwd_queue itself. We still need to make the fwd_queue lock-free, and to ease this task the proposed approach implies dividing the push, send (read) and pop operations (which currently can be processed in one thread) across three threads, i.e. separate producer, sender (reader) and consumer contexts appear. However, some additional complexities need to be considered in this scheme:

  1. In the sender thread, apparently, it will also be necessary to evict (consume) requests from the fwd_queue (timeout eviction, dropped requests, and we can still run into SOCK_DEAD as well);
  2. The re-sending procedure during connection repairing also seems to belong to the sender thread context, so msg_sent resetting may occur in this context too;
  3. The basic re-scheduling procedure (during connection closing), which is essentially pop and push operations, probably should be redesigned in this approach.

krizhanovsky commented

#1175 and #1180 are perfect examples of how complicated our HTTP connection processing is. The bugs have very complex scenarios and required a lot of time to find and fix. Moreover, we had a lot of other synchronization problems on HTTP connections and queues. All in all, the current architecture is too complex to develop and support, and it has clear performance issues. A better solution is required. I'm making the issue crucial because of the constant queue of bugs.

Probably a solution easier than #687 (comment) is possible:

  1. assign each HTTP connection to a CPU, just like we do for TCP sockets;
  2. distribute work among the connections through the lock-free RB queue transport.

Each CPU can read a TfwConn object owned by a remote CPU, but it needs to 'send a message' to that CPU if it wants to send an HTTP message through the connection's queue, close it, or whatever else. Probably the same work queues as we use for TCP sockets can be reused: it seems we won't need them for their current purpose any more if we have per-CPU HTTP connections at the higher layer.
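
A sketch of what such per-CPU connection ownership might look like, assuming a hypothetical cpu field on the connection object and a cross-CPU work-queue helper; all identifiers here are illustrative rather than existing Tempesta FW APIs:

```c
/*
 * Illustrative per-CPU connection ownership: seq_queue/fwd_queue of a
 * connection are touched only by the CPU recorded in it; every other
 * CPU passes a message through that CPU's lock-free work queue.
 */
enum conn_op { CONN_OP_SEND, CONN_OP_CLOSE };

struct http_conn {
    int cpu;            /* owning CPU, fixed when the connection is set up */
    /* seq_queue / fwd_queue live here, owned by ->cpu exclusively */
};

struct conn_work {
    struct http_conn *conn;
    enum conn_op op;
    void *msg;
};

int this_cpu_id(void);
void wq_push_to_cpu(int cpu, struct conn_work w);
void conn_do_send(struct http_conn *c, void *msg);

/* Any CPU may read the connection object; mutation is message-passed. */
static void conn_send(struct http_conn *c, void *msg)
{
    if (c->cpu == this_cpu_id()) {
        conn_do_send(c, msg);        /* local fast path, no locks */
    } else {
        struct conn_work w = { .conn = c, .op = CONN_OP_SEND, .msg = msg };
        wq_push_to_cpu(c->cpu, w);   /* owning CPU drains this in its softirq */
    }
}
```

The intended benefit is that the connection's queues are only ever mutated by the owning CPU and so need no locks; the cost is an extra work-queue hop whenever a message arrives on a CPU other than the connection's owner.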

Probably some CPU/connection scheduler is required for intentionally opened connections (listen-created connections are distributed by RSS/RPS). The scheduler algorithm is TBD.

krizhanovsky modified the milestones: 1.0 Beta, 1.1 Network performance & scalability, 1.1 TBD (Network performance & scalability), 1.1 TDB (ML, QUIC, DoH etc.) Feb 11, 2019
krizhanovsky commented

One more optimization proposal from #941 (comment) about more efficient handling of responses from dead client connections:

A client sends us several requests and the client connection is immediately terminated. All of the requests are in fwd_queue's and keep being sent to servers, reforwarded and so on until all of them receive responses. Then tfw_http_resp_fwd() finds the empty seq_queue of the client connection and calls ss_close_sync() for each of the requests. All in all, we do a lot of unnecessary work.

It's OK to leave the dead requests in fwd_queue's to minimize lock contention, but it's easy and cheap to (see the sketch after this list):

  1. set a flag on the requests so that they are evicted from the fwd_queue right when they are about to be sent to a server or rescheduled;
  2. we don't need to call ss_close_sync() in tfw_http_resp_fwd() since all connection_error and connection_drop callbacks follow the ss_do_close() call in sock.c, so tfw_http_conn_cli_drop() is always called on a closed socket;
  3. tfw_http_conn_cli_drop() should call tfw_connection_put() just like tfw_sock_srv_connect_drop() and tfw_sock_srv_connect_failover() do. When tfw_http_resp_fwd() calls tfw_http_conn_msg_free() for the last request referencing the client connection, the connection is freed.
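
A rough sketch of points 1 and 3 above, assuming a hypothetical flag bit and helper names; only tfw_connection_put() is a function named in this comment, everything else is illustrative:

```c
#include <stdbool.h>

/*
 * Rough sketch: a "dead connection" flag makes a request evict itself
 * from fwd_queue right before it would be sent or rescheduled, and the
 * client connection is freed by the last reference-dropping request.
 */
#define HTTP_REQ_F_CONN_DEAD 0x0001

struct client_conn;

struct http_req {
    unsigned long flags;
    struct client_conn *conn;
};

void req_evict(struct http_req *req);   /* unlink from fwd_queue and free */
void req_send(struct http_req *req);
void tfw_connection_put(struct client_conn *conn);

/* Point 1: the flag is checked only when the request is about to be sent. */
static bool req_fwd_or_evict(struct http_req *req)
{
    if (req->flags & HTTP_REQ_F_CONN_DEAD) {
        req_evict(req);
        return false;
    }
    req_send(req);
    return true;
}

/* Point 3: the last request referencing the client connection frees it. */
static void http_conn_msg_free(struct http_req *req)
{
    struct client_conn *conn = req->conn;

    /* ...free the request itself... */
    tfw_connection_put(conn);   /* refcount drops to zero -> freed */
}
```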
