
A new iteration of #498: message streaming mode. NOT done and work in process #1183

Closed
wants to merge 2,662 commits

Conversation

@vankoven (Contributor) commented Feb 12, 2019

The task is not completed yet and the work is still in progress. The PR is here for a very high-level review, to estimate the pros and cons of the chosen approach.

The code compiles and even works in some scenarios, but it doesn't look good for now. Last week I tried to fix the issues I had in streaming, especially the scenarios where one (or both) of the streaming peers suddenly disconnects mid-stream, but I completely ruined the streaming process: the code became overcomplicated, unreliable and prone to crashes. So I reverted to the very beginning and moved forward commit by commit, trying to simplify the code and make it more straightforward, but I haven't managed to finish it yet and some pieces of the code are still too messy and unreliable.

Here are the issues that appeared during the implementation.

TCP window steering

There are only two situations where receive window steering is required. In the first, a client feeds us tons of pipelined requests. That is not a big problem and can be resolved via Frang limits: Frang will block clients that are too fast, and normally clients are slower than the servers; closing such client connections with a 429 status code is probably acceptable. The second case is message streaming: we don't want to allow a client to eat all the memory with a single request (e.g. a POST request with a Blu-ray image in the body).

Note that window steering can't slow down clients that don't use request pipelining: such a client sends a request and the window gets smaller, then it receives the response and the window is back to maximum again.

The task is to use N as the buffer size for both unparsed skbs in the receive queue and parsed requests in TfwCliConn->seq_queue. If the requests stored in seq_queue have already filled the buffer, advertise a zero window to the client. When a response is forwarded to the client, the corresponding request must be destroyed and the advertised window should grow again.
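
A minimal sketch of that policy, assuming a per-connection byte budget N; `tfw_cli_conn_rcv_bytes()` and `tfw_cli_conn_seq_bytes()` are hypothetical helper names, not existing Tempesta API:

```c
/*
 * Illustrative sketch only: compute the receive window to advertise to a
 * client, given a per-connection buffer budget of N bytes. Both the unparsed
 * skbs in the receive queue and the parsed requests sitting in
 * TfwCliConn->seq_queue count against the budget. Both helpers are
 * hypothetical.
 */
static unsigned int
tfw_cli_advertised_wnd(TfwCliConn *cli_conn, unsigned int n)
{
	unsigned int used = tfw_cli_conn_rcv_bytes(cli_conn)
			    + tfw_cli_conn_seq_bytes(cli_conn);

	/* Buffer full: advertise a zero window until responses are
	 * forwarded and their requests are destroyed. */
	return used >= n ? 0 : n - used;
}
```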

To control the window size, we need to force the kernel to take into account the space occupied by the requests in the seq_queue. If we look at the TCP buffers, we will see two of them: the "network" buffer advertised as the receive window and the "application" buffer used to isolate the application from the network. Currently skbs are orphaned from the TCP socket right after receive, so the TCP stack sees no memory allocated to the socket and advertises the maximum window size. My idea was to register the size of a parsed HTTP message in sk->sk_rmem_alloc to make the kernel aware of the used memory and avoid manual window size calculations. The trick worked: while the client was filling the "application" buffer, it was advertised the maximum window size, but once the "network" buffer was used the window got smaller, down to zero. Another nice result is the possibility to set different buffer limits for client and server connections.
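
A minimal sketch of that trick, assuming hypothetical function names; only sk->sk_rmem_alloc and the atomic helpers are real kernel API, and the use of hm->msg.len is illustrative rather than the PR's actual code:

```c
#include <net/sock.h>

/*
 * Sketch of the accounting idea: charge the size of a parsed HTTP message to
 * the client socket so that the regular TCP code sees the memory as used and
 * shrinks the advertised window on its own. Function names are hypothetical.
 */
static inline void
tfw_http_msg_charge_rmem(struct sock *sk, TfwHttpMsg *hm)
{
	/* Make the kernel see the parsed message as allocated receive memory. */
	atomic_add((int)hm->msg.len, &sk->sk_rmem_alloc);
}

static inline void
tfw_http_msg_uncharge_rmem(struct sock *sk, TfwHttpMsg *hm)
{
	/* Release the charge once the paired response has been forwarded
	 * and the request is destroyed. */
	atomic_sub((int)hm->msg.len, &sk->sk_rmem_alloc);
}
```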

The code worked in all scenarios except one: closing connections in a highly concurrent environment. Dropped requests might be deregistered from sk->sk_rmem_alloc twice, so the counter wasn't empty at socket destruction and the kernel could emit pretty long traces when wrk stops. The issue is sporadic.
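
One possible way to avoid the double deregistration, shown only as a sketch; the `rmem_charged` field is hypothetical and not part of TfwHttpMsg:

```c
/*
 * Sketch of making the uncharge idempotent: remember how many bytes were
 * charged to sk->sk_rmem_alloc and atomically take that value exactly once,
 * so concurrent "drop request" paths can't subtract it twice. The
 * hm->rmem_charged field is hypothetical.
 */
static inline void
tfw_http_msg_uncharge_rmem_once(struct sock *sk, TfwHttpMsg *hm)
{
	unsigned int len = xchg(&hm->rmem_charged, 0);

	if (len)
		atomic_sub(len, &sk->sk_rmem_alloc);
}
```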

Tricky streaming situations

Streamed messages may still have to be buffered.

This happens when the receiver is not ready to receive a stream. E.g. a client requested two pipelined URIs: the first is very slow to complete, the second is fast but may be a full BD image. It's not possible to stream the BD image to the client until the first request is responded to. We also can't put the server connection on hold: a client may intentionally slow down receiving, so the server connection would get stuck. That is why window steering can't be applied to server connections.
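
A sketch of the ordering check behind this, under the assumption that requests are linked into the client's seq_queue via a seq_list member; the helper name is hypothetical:

```c
#include <linux/list.h>

/*
 * Illustrative only: a streamed response may be forwarded to the client only
 * when its request sits at the head of the client's seq_queue, because HTTP
 * pipelining requires responses to be sent in request order. The seq_list
 * linkage is assumed here for illustration.
 */
static bool
tfw_stream_resp_may_forward(TfwCliConn *cli_conn, TfwHttpReq *req)
{
	return list_first_entry(&cli_conn->seq_queue, TfwHttpReq,
				msg.seq_list) == req;
}
```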

Some parts of a streamed message must be buffered for some time

Some high-level HTTP processing can't work with partial data. E.g. trailer headers can't be processed in parts, we have to buffer them fully before processing. Although the RFC allows ignoring trailer headers in streaming mode, we still want to validate the headers and modify them if the req_hdr_set directive is set.
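
A sketch of the buffering decision, assuming hypothetical predicates for the trailer state (none of these names exist in Tempesta):

```c
/*
 * Illustrative only: decide whether the current chunk of a streamed message
 * may be forwarded right away or must stay buffered. Trailer headers are
 * buffered until the whole trailer section is parsed, so that it can be
 * validated and modified (e.g. by req_hdr_set) as a unit. Both predicates
 * are hypothetical.
 */
static bool
tfw_stream_chunk_may_forward(TfwHttpMsg *hm)
{
	if (tfw_http_parsing_trailer(hm))
		return tfw_http_trailer_complete(hm);
	return true;
}
```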

A response can appear at any time.

HTTP is a request-response protocol: a response is sent only after the request has been received. Well, forget about that during streaming. Imagine that a part of the request has been streamed to the backend and a new part is being parsed; a response may already have been received from the backend. Scenario 1: stream a GET request with a long body to Nginx and make a pause, Nginx will respond immediately without waiting for the end of the message. Scenario 2: stream a part of a request and make a syntax error, the backend will respond with a 4xx code immediately (Tempesta behaves the same way).

The target connection of a streamed message can disappear at any time.

Imagine a request is being streamed and a part of it has already been received by the backend, which has already sent a response. Now an error happens and the backend connection is closed. We still need to receive the rest of the stream, but there is no need to send it anywhere since the response has been received; just drop it part by part.
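
A sketch of that "drop it part by part" path, under the assumption that we can tell the paired response has already been forwarded; every name is hypothetical except __kfree_skb():

```c
#include <linux/skbuff.h>

/*
 * Illustrative sketch: the backend connection is gone, but its response has
 * already been forwarded to the client. The remaining chunks of the streamed
 * request are still read from the client socket, but simply freed instead of
 * being forwarded anywhere. tfw_http_resp_forwarded() is a hypothetical
 * predicate.
 */
static void
tfw_stream_sink_chunk(TfwHttpReq *req, struct sk_buff *skb)
{
	if (tfw_http_resp_forwarded(req)) {
		__kfree_skb(skb);	/* nobody needs this data any more */
		return;
	}
	/* ...otherwise the normal forwarding path would run here... */
}
```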

The client disconnected before the streamed message was received in full

If the partially received request wasn't sent to a backend, simply evict it. If it was, then there is no recovery possibility other than closing the server connection.
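
A sketch of that policy; tfw_connection_close() is mentioned elsewhere in this PR, while the other names (and the exact calling convention shown) are hypothetical:

```c
/*
 * Illustrative only: handle a client disconnect in the middle of a streamed
 * request. If nothing has been forwarded yet, the partial request is simply
 * evicted; once a prefix has been streamed to a backend, the only recovery
 * is to close that server connection. The predicate and the srv_conn
 * accessor are hypothetical, and the tfw_connection_close() usage is assumed.
 */
static void
tfw_stream_cli_drop(TfwHttpReq *req)
{
	if (!tfw_http_req_partially_forwarded(req)) {
		tfw_http_msg_free((TfwHttpMsg *)req);
		return;
	}
	/* There is no way to "unsend" the already streamed prefix. */
	tfw_connection_close(tfw_http_req_srv_conn(req), true);
}
```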

(The list is not finished and will be continued shortly.)

aleksostapenko and others added 30 commits October 11, 2018 20:50
 Fix #806: Reduce count of IPI during work_queue processing.
2. Fix handshake checksumming. Now we successfully finish handshakes (yahoo!);
3. Introduce TfwFsmData->trail so as not to treat the TLS TAG as part of an HTTP request;
4. Introduce ttls_bzero_safe() for fast & non-optimizable zeroing in process context;
5. Coding style cleanup for PK and some other minor cleanups.
from processing the tag if an HTTP request spans over a TLS record boundary.
Sometimes with make -jN, tdbq fails to link with libtdb, since there is
no libtdb yet.
Reset default vhost during start of Tempesta FW
Drops everything from the TfwAddr union, making it essentially an alias for
struct sockaddr_in6. IPv4 addresses are converted to IPv4-mapped addresses
at the parsing stage.

Macros for log reporting with addresses of both IPv4 and IPv6 are unified.
They got a new parameter controlling whether ports should be reported too.
Use struct sockaddr_in6 for all addresses
the lists and let FSMs keep their skb lists. skb lists can be sent
to GFSM calls only from the TLS level, which is special in its need
to collect skbs for complete records, so move the list processing to
the HTTP layer and process each skb in the list separately.
and accounting of parsed bytes in the recent http and ss changes.
Preserve correct fsm state when moving to a next state
Fix unstable JS challenge passing
drop unused "unused" parameter of frang_conn_limit()
…LS 0.1:

don't close the TCP connection on a server response with 'Connection: close' -
in this case the TLS close notification message is sent on a closed connection.
Add new TfwConnHooks->conn_close hook to send TLS close notification alert
just before closing a connection, so there is tfw_connection_close()
wrapper instead of direct ss_close() calls.
Also adds a rare case of empty slice handling.
When data are in multiple chunks, it's necessary to initialize 'tr' at the
beginning of every new chunk.
…ulation

fixups for sticky cookie calculation in case of segmented data
…itting

test cookie parser splitting string in the right places
If it was 0 for some reason, we force it to be UINT_MAX instead, exactly
to avoid such checks at run time.
@krizhanovsky (Contributor) commented Feb 13, 2019

Previous versions of the pull request with comments - #1012 and #1067

@krizhanovsky (Contributor) left a comment

I made only a high-level review, but I really appreciate the way #498 is solved in this PR. I updated the original issue with the additional information discussed in this and the previous pull requests.

```c
 * I.e. don't stream messages byte by byte, buffer some data before forward
 * it; if timeout is exceeded, stream as many data as available, don't wait
 * for more. It can be rather long wait in case of long polling. The idea
 * behind is same as for Nagle algorithm in TCP.
```

I think we should stream messages with skb granularity to avoid costly skb splitting or allocation.

"set client_msg_buffering to %u.\n",
tfw_cfg_cli_rmem, tfw_cfg_cli_msg_buff_sz,
tfw_cfg_cli_rmem);
tfw_cfg_cli_msg_buff_sz = tfw_cfg_cli_rmem;

If we have equal buffer and rmem, then it seems we'll frequently send a zero receive window for large messages: we have to announce a zero window to the client until we free all the memory occupied by the forwarded HTTP message.

```c
 * message processing inside the Tempesta due to message modifications.
 * Two variants are possible: mirror all modifications to sk->sk_rmem_alloc
 * or operate only by original message size. The second case was chosen
 * because its simplicity and error-free possibilities.
```

Yes, agree.

```c
 * Limit total memory used to serve single TCP connection.
 *
 * Received skb may be stored in one of two buffers: socket receive buffer
 * (sk->sk_receive_queue) or Tempesta connection buffer (TfwCliConn->seq_queue).
```

As soon as an skb is placed in sk_receive_queue, a socket callback is called and Tempesta removes the skb from the queue, so we don't actually use the receive buffer at all except for the TCP OOO segment queue. So tcp_rmem is consumed only by OOO segments in our case. This note probably doesn't affect the PR, but it's worth keeping in mind.

```c
TfwHttpMsg *
tfw_http_msg_alloc_stream_part(TfwHttpMsg *hm_head)
{
	TfwHttpStream *stream;
```

I supposed that if we receive a part of a streamed HTTP message, then there is a current TfwHttpMsg to which we can attach the new data, use for the parser and so on. Why do we need TfwHttpStream and why do we even allocate a new descriptor?

```c
 * more streamed and not fully received message in its queue. So the streamed
 * message is actually get buffered until the receiver is ready. There are also
 * some cases when a more of the message must be buffered before is can be
 * processed, e.g. trailer header processing.
```

If there are 2 streamed messages, then there should be 2 incomplete TfwHttpMsg descriptors; why not use them? We could even use parts of the TfwMsg->skb_head list, possibly with some skb pointers into the list, to handle the message headers and only the current stream buffer...

```c
new_measure:
	tp->rcvq_space.seq = tp->copied_seq;
	tp->rcvq_space.time = tp->tcp_mstamp;
}
```

OK, I love what's going on here - now it seems like the right way to solve the issue!

@vankoven vankoven changed the title PR of failed #498 task A new iteration of #498: message streaming mode. NOT done and work in process Feb 14, 2019
@vankoven vankoven marked this pull request as draft September 2, 2020 14:13
@krizhanovsky krizhanovsky deleted the ik-streaming-failed branch April 4, 2023 21:54