
Improve the EVA protocol: kozlovsky edition #6845

Closed

kozlovsky opened this issue Apr 4, 2022 · 3 comments

@kozlovsky (Contributor) commented Apr 4, 2022

@drew2a I reviewed PR #6831 and did not find any problems in it, but I think I found another significant issue in the EVA protocol, unrelated to the PR changes.

The problem is in the EVAProtocol.on_data method:

async def on_data(self, peer, payload):
    ...
    can_be_handled = transfer.block_number == payload.block_number - 1
    if not can_be_handled or transfer.nonce != payload.nonce:
        return
    ...

As I understand it, this method expects to receive incoming messages in strictly sequential order: each new message should have a block number that is one more than the previous one. At the same time, the on_acknowledgement method sends outgoing messages in batches of 16, where 16 is window_size:

async def on_acknowledgement(self, peer: Peer, payload: Acknowledgement):
    ...
    for block_number in range(transfer.block_number, transfer.block_number + transfer.window_size):
        ...
        self.community.eva_send_message(peer, Data(block_number, transfer.nonce, data))

As messages are UDP packets, the network can deliver the messages of the same batch in any order; it is incorrect to expect they will be received in the same sequence. The current implementation of the on_data method will reject many messages as "out-of-order", which can significantly slow down the data transfer. We could add a diagnostic counter of how many messages are received out of order; I suspect the number may be significant when there is some traffic on the network.

The on_data method should accept any message whose block number lies inside the current window. The method can put incoming messages into an array of length window_size, using the block number to calculate the message's offset in the array. The array can start filled with Nones; once all messages of the current window have been received, they can be joined together and appended to the result binary string. This way, EVAProtocol performance can potentially be much better under heavy load. A sketch of this approach follows below.
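To make the idea concrete, here is a rough sketch of such an on_data, with the diagnostic out-of-order counter included. All names beyond the payload fields (window_start, window, data_list, out_of_order_count, and the helper calls) are hypothetical, not the actual EVA attributes:

async def on_data(self, peer, payload):
    transfer = self.incoming.get(peer)  # illustrative lookup
    if transfer is None or transfer.nonce != payload.nonce:
        return

    offset = payload.block_number - transfer.window_start
    if not 0 <= offset < transfer.window_size:
        return  # duplicate, or a stray packet from a previous window

    if any(chunk is not None for chunk in transfer.window[offset + 1:]):
        transfer.out_of_order_count += 1  # a later block already arrived

    transfer.window[offset] = payload.data_binary

    # (the final, possibly partial window would need extra handling)
    if all(chunk is not None for chunk in transfer.window):
        transfer.data_list.extend(transfer.window)       # accumulate chunks
        transfer.window = [None] * transfer.window_size  # reset for next window
        transfer.window_start += transfer.window_size
        self.send_acknowledgement(peer, transfer)        # hypothetical helper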


Also, a couple of minor notes:

First:

transfer.window_size = max(self.MIN_WINDOWS_SIZE, min(payload.window_size, self.binary_size_limit))

It does not look correct to take the min of window_size and binary_size_limit, because window_size is a dimensionless quantity (a number of messages), while binary_size_limit is a number of bytes. Apples and oranges.
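If the intent is to cap the window size, the bound should itself be a message count; a minimal sketch, assuming a hypothetical MAX_WINDOW_SIZE constant:

# Clamp the peer-requested window size between message-count bounds.
# MAX_WINDOW_SIZE is hypothetical and, unlike binary_size_limit, is
# measured in messages, not bytes.
transfer.window_size = max(self.MIN_WINDOWS_SIZE,
                           min(payload.window_size, self.MAX_WINDOW_SIZE))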

Second:

transfer.data_binary += payload.data_binary

This is potentially slow because it copies the entire previous data each time a new piece is appended. The correct way is to accumulate binary chunks in a list:

transfer.data_list.append(payload.data_binary)

And join all the chunks into a single binary string at the end, in one step:

result = b"".join(transfer.data_list)
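A self-contained illustration of the difference between the two strategies (the sizes are arbitrary):

chunks = [b"x" * 1000 for _ in range(10_000)]

# Quadratic: every += builds a new bytes object and copies all data so far.
data = b""
for chunk in chunks:
    data += chunk

# Linear: collect the chunks, then copy everything exactly once at the end.
data_list = []
for chunk in chunks:
    data_list.append(chunk)
result = b"".join(data_list)

assert result == data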

Additional suggestions from @devos50:

It would be nice to have a ReadRequest along with the WriteRequest.
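A hypothetical ReadRequest payload mirroring WriteRequest might look like this (field names are illustrative, not taken from the actual EVA messages):

from dataclasses import dataclass

@dataclass
class ReadRequest:
    nonce: int   # transfer identifier, as in the other EVA messages
    info: bytes  # application-level description of the requested data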


Add a confirm_callback, like:

    def confirm_callback(self, peer: Peer, info: bytes, data_size: int):
        if info == b'accept_me':
            return True
        return False
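The protocol could then consult this callback before accepting an incoming transfer. A rough sketch, with all method and helper names hypothetical:

async def on_write_request(self, peer, payload):
    accept = self.confirm_callback is None or \
             self.confirm_callback(peer, payload.info, payload.data_size)
    if not accept:
        self.send_error(peer, payload.nonce, b"transfer rejected")  # hypothetical helper
        return
    self.start_incoming_transfer(peer, payload)  # hypothetical helper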
@devos50 (Contributor) commented Apr 4, 2022

Thanks for this review, very helpful!

As messages are UDP packets, the network can deliver the messages of the same batch in any order; it is incorrect to expect they will be received in the same sequence.

Yes, and I observed a few instances in my experiments where a retransmission was triggered. I assumed this was because one or more messages in the window batch were dropped, but it could also be because of this logic. To reduce its impact, I lowered the retransmit timeout.

Not related to the bugs, but I'm a bit unsure how to properly tune the EVA parameters to maximise throughput. There are quite a few interdependencies between the parameters, and the right values also depend on factors external to EVA, such as UDP packet loss. For example, I can increase the window size (which, assuming a perfect network, would decrease the transfer time), but this also increases the network load and eventually slows down the entire system. Analysing the impact of these dependencies would actually be a very interesting experiment 😁. Also, congestion control would be very helpful here, but I guess this is well beyond the scope of the current implementation (for now, we simply limit the number of simultaneous transfers). Maybe you guys have some ideas on picking the right parameters?
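For the congestion control direction, TCP-style AIMD would be the classic starting point: grow the window additively while windows complete cleanly, and cut it multiplicatively when a retransmission is triggered. A rough sketch, with all names and constants hypothetical:

MIN_WINDOW = 16
MAX_WINDOW = 1024

def adjust_window(transfer, retransmit_triggered: bool):
    if retransmit_triggered:
        # Loss observed: back off multiplicatively.
        transfer.window_size = max(MIN_WINDOW, transfer.window_size // 2)
    else:
        # Clean window: probe for more bandwidth additively.
        transfer.window_size = min(MAX_WINDOW, transfer.window_size + 1)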

@drew2a (Contributor) commented Apr 4, 2022

Ref: TFTP Windowsize Option: https://datatracker.ietf.org/doc/html/rfc7440#page-4

@drew2a (Contributor) commented Jun 28, 2022

We implemented everything from this issue except confirm_callback, which will be implemented if a real need arises.
