Refactor p2p reader #3433

Merged: 8 commits merged into mimblewimble:master on Sep 28, 2020

Conversation

jaspervdm (Contributor)

This PR refactors the node's p2p reader by introducing a codec that keeps track of the read state and emits the received p2p messages. The codec only interprets the message headers; the bodies stay as raw Bytes, leaving it to the Protocol struct to actually interpret them, as is the case now.

Some advantages of the refactored code:

  • Clearer separation between layers: we don't have to pass the TCP stream around between the reader and the part that actually handles the messages and interfaces with the chain (the Protocol struct)
  • Preparation for a follow-up PR: instead of each peer having a handle to the chain and updating it in place, introduce a dedicated thread to handle messages, with a channel between it and the peer readers. This has multiple advantages, one of them being fewer places where we lock the chain, since updating the chain state is inherently sequential anyway.
  • Work towards async p2p: the codec will implement the tokio_util::codec traits so it can be used with Framed, which simplifies reading and writing of async messages (future PR)
  • Minimize allocations by reusing the same buffer

Unfortunately, AFAIK we currently do not have a framework to test each p2p message and interaction individually, so this PR has been tested on mainnet by syncing from scratch and keeping the node running for a while.
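For illustration (an editor's sketch with made-up names and an assumed 11-byte fixed-size header, not the actual p2p/src/codec.rs code), the read-state idea described above boils down to roughly this:

use bytes::{Buf, Bytes, BytesMut};

// Assumed fixed header length, for this sketch only.
const HEADER_LEN: usize = 11;

// Stand-in for the real deserialized msg header.
struct SketchHeader {
	msg_type: u8,
	msg_len: usize,
}

enum ReadState {
	// Waiting for enough bytes to parse the next header.
	Header,
	// Header parsed; waiting for the full body of the given length.
	Body(SketchHeader),
}

struct SketchCodec {
	state: ReadState,
}

impl SketchCodec {
	// Try to pull one complete message out of `buf`; return None if more
	// bytes are needed. The same buffer is reused across calls.
	fn decode(&mut self, buf: &mut BytesMut) -> Option<(SketchHeader, Bytes)> {
		if let ReadState::Header = self.state {
			if buf.len() < HEADER_LEN {
				return None;
			}
			let mut header = buf.split_to(HEADER_LEN);
			let msg_type = header.get_u8();
			let msg_len = header.get_u64() as usize;
			self.state = ReadState::Body(SketchHeader { msg_type, msg_len });
		}
		if let ReadState::Body(h) = &self.state {
			if buf.len() >= h.msg_len {
				let header = SketchHeader {
					msg_type: h.msg_type,
					msg_len: h.msg_len,
				};
				// The body is handed on untouched as raw Bytes.
				let body = buf.split_to(header.msg_len).freeze();
				self.state = ReadState::Header;
				return Some((header, body));
			}
		}
		None
	}
}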

jaspervdm requested a review from antiochp on September 7, 2020, 07:35
Review thread on p2p/src/codec.rs (outdated, resolved)
antiochp (Member) commented Sep 7, 2020

This looks great! 👍 on the overall approach. This makes a ton of sense.
I need to take some time to review and do some testing though.

antiochp (Member) commented Sep 7, 2020

Ran a full sync locally and everything went smoothly. 👍

antiochp (Member) commented Sep 7, 2020

Just putting these here for my own reference as I wrap my head around this.
We have a lot of enums here (both existing and new).

pub enum MsgHeaderWrapper {
	/// A "known" msg type with deserialized msg header.
	Known(MsgHeader),
	/// An unknown msg type with corresponding msg size in bytes.
	Unknown(u64, u8),
}

/// Input handed to the protocol handler: either a full message (header plus
/// raw body) or a progress update for an attachment being downloaded.
pub enum Consume<'a> {
	Message(&'a MsgHeader, BufReader<'a, Bytes>),
	Attachment(&'a AttachmentUpdate),
}

/// What the protocol handler produced in response.
pub enum Consumed {
	Response(Msg),
	Attachment(AttachmentMeta, File),
	None,
	Disconnect,
}

/// What the codec emits to the reader loop.
pub enum Output {
	Known(MsgHeader, Bytes),
	Unknown(u64, u8),
	Attachment(AttachmentUpdate, Bytes),
}

jaspervdm (Contributor, Author) commented Sep 7, 2020

Yes, Consume and Output are only subtly different. I'm trying to think whether we can merge them somehow, but it might make things a bit awkward.
And I think with some additional refactoring we can get rid of MsgHeaderWrapper, as it is now only used during the handshake. I will have a look and update the PR.

Edit: sorry, I was confused on that last part; I think it would be easier to keep that one around.

quentinlesceller (Member)

Could you merge upstream/master into your PR?
Had a quick look and everything looks fine.
Also ran it from an existing db and synced from scratch without any issues.

jaspervdm (Contributor, Author) commented Sep 10, 2020

@quentinlesceller Yep will do! Thanks for testing.
@antiochp One other thing I was considering: should we even deal with deserializing in the Protocol handler? That part is concerned with reading/writing to the chain, so maybe we should do the actual deserialization closer to the reader. It would mean adding variants to the Output enum (renamed to Message), one for each type of message.

antiochp (Member)

> we would have one variant for each type of message

So one for each known msg type and one unknown case?
I think that makes a lot of sense - might need to see what it looks like in practice to really know but 👍 on the idea.

jaspervdm (Contributor, Author) commented Sep 11, 2020

> So one for each known msg type and one unknown case?

Yes, that is basically what I have done in the latest commit. It also allowed us to get rid of one of the enums.
We can also further simplify the code by getting rid of the Type and MsgHeaderWrapper enums and using a macro for decode_message, but I think I will leave that for a future PR.

Also, Message::Attachment(AttachmentUpdate, Option<Bytes>) is a little bit awkward, but once we have a dedicated thread to process these messages we can get rid of the Option.
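For reference, roughly the shape this gives the enum (an editor's sketch; variant names and payload types here are illustrative, not the exact set defined in the p2p crate):

use bytes::Bytes;

// Illustrative stand-ins for the real deserialized payload types.
struct Ping;
struct BlockHeader;
struct AttachmentUpdate;

enum Message {
	// One variant per known msg type, each carrying its deserialized payload.
	Ping(Ping),
	Headers(Vec<BlockHeader>),
	// A case for unknown msg types (msg size and raw type byte), so the body
	// can be skipped without disconnecting.
	Unknown(u64, u8),
	// Attachment progress; the Option is the awkward part mentioned above and
	// goes away once a dedicated thread processes messages.
	Attachment(AttachmentUpdate, Option<Bytes>),
}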

antiochp (Member)

Testing locally. Looks good so far.

Review thread on core/src/core/block.rs (outdated, resolved)
antiochp (Member)

I'm planning to cut a 4.1.0 off master in the next day or so. This is so we have an official release with "commit only inputs".

Do you mind if we hold off on merging this PR until after that?

antiochp (Member)

Actually I'm guessing we want to preserve the "streaming read" behavior for chunks of headers.
This is likely to be useful more generally once we have various PIBD-related "large" messages (chunks of the kernel MMR etc.). We are going to want to start processing chunks of data before we have fully received the msg in these cases.

jaspervdm (Contributor, Author)

> Do you mind if we hold off on merging this PR until after that?

Sure, no problem.

> We now (I think?) read and parse all headers up front and then process them in chunks of 32.

Yes, correct.

> Actually I'm guessing we want to preserve the "streaming read" behavior for chunks of headers.

I agree. I will look into this. Since the codec is already holding state and we are decoding the messages in there, this shouldn't be too hard to do.

antiochp (Member)

We know the overall size in bytes of the full 512 header payload.
But we do not know the size of each individual header.

The streaming reader allowed us to call read_item() repeatedly for each individual header, parsing them as soon as we had enough bytes to read.

Looking at this now it is conceptually close to an Iterator, just not defined in those terms.

You mention Framed here in the PR for async processing.
https://docs.rs/tokio-util/0.3.1/tokio_util/codec/struct.Framed.html

The frame here is something at a smaller granularity than the full msg body. A chunk of n headers in this specific case.

jaspervdm (Contributor, Author)

@antiochp Ok, with the latest commit we now read the headers as they come in and return them in batches of 32. This is done by overestimating the header size and attempting deserialization once the buffer is filled. I also added some tests to prove that the header size calculation is actually correct.

There was a nasty bug in there that took me a long time to hunt down, but I managed to fix it: if a read failed, the reserved part of the buffer was kept around, which messed up the next read attempt.
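Roughly, the "overestimate" check amounts to something like this (an editor's sketch; names and the size bound are illustrative):

// We know the total byte length of the headers message, but not the length
// of each individual header, so wait until the buffer holds either a
// generous estimate of a full batch or whatever remains of the message.
const HEADER_BATCH: usize = 32;
const HEADER_SIZE_ESTIMATE: usize = 400; // deliberately generous upper bound

fn ready_for_next_batch(buffered: usize, msg_bytes_remaining: usize) -> bool {
	buffered >= (HEADER_BATCH * HEADER_SIZE_ESTIMATE).min(msg_bytes_remaining)
}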

Inline review comment (Member) on the Attachment(left, meta, now) arm, just after return Ok(Message::Headers(h)):

And in contrast with headers, reading an attachment returns a single Message, basically a running total of the download? Each time with potentially more of the attachment read?
Subsequent blocking calls to read_inner() may result in more of the attachment being read.

Reply from jaspervdm (Contributor, Author):

An attachment will generate a lot of Message::Attachment messages, one for every 48000 bytes read. This is the same size, IIRC, as the buffer we were filling before.
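In other words, roughly (an editor's sketch; constant and function names hypothetical):

// Attachments are surfaced in fixed-size chunks so download progress is
// visible before the whole file has arrived.
const ATTACHMENT_CHUNK_BYTES: usize = 48_000;

fn next_attachment_chunk(bytes_remaining: usize) -> usize {
	bytes_remaining.min(ATTACHMENT_CHUNK_BYTES)
}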

antiochp (Member) left a review comment:

👍

Let's do it!

antiochp merged commit defc714 into mimblewimble:master on Sep 28, 2020
antiochp mentioned this pull request on Nov 26, 2020