Decouple serialization, frame splitting, and compression in protocol #5150
base: main
Conversation
Today we try to split up large messages in comms. This is useful in a few situations:

1. Websockets, which often pass frames through middleware that requires small messages
2. TLS, which fails on some OpenSSL versions with frames above the size of an int

We correctly cut data frames into smaller pieces to address these issues. However, we don't apply the same logic to the header frame, which may still contain very large bytestrings. This commit adds a workaround in protocol dumps/loads that watches for this case and splits the header frame up if necessary. It works, but it's not very smooth. In the future I would prefer that we think about what a proper header should look like and ensure that it contains no user data. In the meantime this should help.
Currently compression and frame splitting are tightly interwoven with traversing through messages. This can be efficient, but it results in a complex system where it's hard to reason about when things get split or compressed (indeed, this led to a difficult-to-track-down bug with frame splitting). This commit separates these processes into three separate stages:

1. Serialize all objects into frames
2. Split large frames
3. Compress compressible frames

This results in a much more uniform application of splitting and compressing. However, it comes with a couple of undesired effects:

1. We add a new header if either splitting or compressing has occurred
2. We no longer avoid decompression when we don't want to deserialize

There is probably a clean way to achieve most/all of our goals here. I wanted to push this up to start the conversation.
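Stages 2 and 3 above could be sketched roughly as two independent passes over the frame list. This is only an illustration of the decoupled design: the function names, the zlib codec, and the size thresholds are assumptions for the sketch, not the actual distributed.protocol implementation.

```python
import zlib

MAX_FRAME_SIZE = 64 * 1024 * 1024  # illustrative split threshold
MIN_COMPRESS_SIZE = 10_000         # illustrative: skip tiny frames

def split_frames(frames, max_size=MAX_FRAME_SIZE):
    """Stage 2: cut any frame larger than max_size into chunks,
    recording how many chunks each original frame became."""
    out, counts = [], []
    for frame in frames:
        chunks = [frame[i:i + max_size]
                  for i in range(0, len(frame), max_size)] or [frame]
        out.extend(chunks)
        counts.append(len(chunks))
    return out, counts

def compress_frames(frames, min_size=MIN_COMPRESS_SIZE):
    """Stage 3: compress each sufficiently large frame, recording
    which compression (if any) was applied per frame."""
    out, compressions = [], []
    for frame in frames:
        if len(frame) >= min_size:
            out.append(zlib.compress(frame))
            compressions.append("zlib")
        else:
            out.append(frame)
            compressions.append(None)
    return out, compressions
```

Because each pass runs over all frames uniformly, splitting and compression apply the same way to header and data frames alike, which is the uniformity the commit message describes.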
I think this is a very good idea, but it made me wonder: why do we split frames? If compression libraries are the only reason, I think we should delegate the splitting and un-splitting to the compression layer. That is, for compression libraries that don't support arbitrary buffer sizes, we wrap the compression calls in split and un-split functions. I think this would make the design even simpler.
Various pieces of network machinery don't like large frames.
I wouldn't be surprised if we found more if we turned it off completely. There is a lot of strange software in the middle of corporate networks :)
Various pieces of network machinery don't like large frames.
In that case, I think this is a good idea :)
The PR looks good to me, but I suggest that you implement the "split large frames" and the "merge and decompress frames" code as two functions. This will make the symmetry between splitting and merging clearer.
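To illustrate the symmetry this suggestion is after, here is a minimal sketch of a split/merge pair. The function names and the chunk-count bookkeeping are hypothetical, not the code in this PR; the point is only that the merge function is the exact inverse of the split function.

```python
def split_large_frames(frames, max_size):
    """Split oversized frames into chunks of at most max_size bytes.
    Returns the new frame list plus a per-frame chunk count so the
    receiver can undo the split."""
    out, counts = [], []
    for frame in frames:
        chunks = [frame[i:i + max_size]
                  for i in range(0, len(frame), max_size)] or [frame]
        out.extend(chunks)
        counts.append(len(chunks))
    return out, counts

def merge_split_frames(frames, counts):
    """Inverse of split_large_frames: join consecutive chunks back
    into the original frames using the recorded chunk counts."""
    out, it = [], iter(frames)
    for n in counts:
        out.append(b"".join(next(it) for _ in range(n)))
    return out
```

Writing the pair this way makes the round-trip property (`merge(split(frames)) == frames`) easy to state and test, which is hard to do when splitting is interleaved with message traversal.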
frames2 = []
lengths = []
compressions = []
from distributed.protocol.utils import frame_split_size as split
Move the import to the top of the file; I don't think there are any cyclic imports?
cc @gjoseph92 (who has been thinking about this from the merging side recently)
cc @quasiben @jakirkham @madsbk