Refactor p2p reader #3433

Merged: 8 commits merged into mimblewimble:master on Sep 28, 2020

Conversation

jaspervdm (Contributor)

This PR refactors the node's p2p reader by introducing a codec that keeps track of the read state and emits the received p2p messages. The codec only interprets the message headers; the bodies stay as raw Bytes, leaving it to the Protocol struct to actually interpret them, as is the case now.

Some advantages of the refactored code:

  • Clearer separation between layers: we don't have to pass the TCP stream around between the reader and the part that actually handles the messages and interfaces with the chain (the Protocol struct)
  • Preparation for a follow-up PR: instead of each peer having a handle to the chain and updating it in place, introduce a dedicated thread to handle messages, with a channel between it and the peer readers. This has multiple advantages, one of them being fewer places where we lock the chain, since updating the chain state is inherently sequential anyway.
  • Work towards async p2p: the codec will implement the tokio_util::codec traits so it can be used with Framed, which simplifies reading and writing of async messages (future PR)
  • Minimize allocations by reusing the same buffer

Unfortunately, AFAIK we currently do not have a framework to test each p2p message and interaction individually, so this PR has been tested on mainnet by syncing from scratch and keeping the node running for a while.
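For illustration (an editor's sketch with made-up names and an assumed 11-byte fixed-size header, not the actual p2p/src/codec.rs code), the read-state idea described above boils down to roughly this:

use bytes::{Buf, Bytes, BytesMut};

// Assumed fixed header length, for this sketch only.
const HEADER_LEN: usize = 11;

// Stand-in for the real deserialized msg header.
struct SketchHeader {
	msg_type: u8,
	msg_len: usize,
}

enum ReadState {
	// Waiting for enough bytes to parse the next header.
	Header,
	// Header parsed; waiting for the full body of the given length.
	Body(SketchHeader),
}

struct SketchCodec {
	state: ReadState,
}

impl SketchCodec {
	// Try to pull one complete message out of `buf`; return None if more
	// bytes are needed. The same buffer is reused across calls.
	fn decode(&mut self, buf: &mut BytesMut) -> Option<(SketchHeader, Bytes)> {
		if let ReadState::Header = self.state {
			if buf.len() < HEADER_LEN {
				return None;
			}
			let mut header = buf.split_to(HEADER_LEN);
			let msg_type = header.get_u8();
			let msg_len = header.get_u64() as usize;
			self.state = ReadState::Body(SketchHeader { msg_type, msg_len });
		}
		if let ReadState::Body(h) = &self.state {
			if buf.len() >= h.msg_len {
				let header = SketchHeader {
					msg_type: h.msg_type,
					msg_len: h.msg_len,
				};
				// The body is handed on untouched as raw Bytes.
				let body = buf.split_to(header.msg_len).freeze();
				self.state = ReadState::Header;
				return Some((header, body));
			}
		}
		None
	}
}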

jaspervdm requested a review from antiochp on September 7, 2020, 07:35
Review thread on p2p/src/codec.rs (outdated, resolved)
antiochp (Member) commented Sep 7, 2020

This looks great! 👍 on the overall approach. This makes a ton of sense.
I need to take some time to review and do some testing though.

antiochp (Member) commented Sep 7, 2020

Ran a full sync locally and everything went smoothly. 👍

antiochp (Member) commented Sep 7, 2020

Just putting these here for my own reference as I wrap my head around this.
We have a lot of enums here (both existing and new).

pub enum MsgHeaderWrapper {
	/// A "known" msg type with deserialized msg header.
	Known(MsgHeader),
	/// An unknown msg type with corresponding msg size in bytes.
	Unknown(u64, u8),
}

/// Input handed to the protocol handler: either a full message (header plus
/// raw body) or a progress update for an attachment being downloaded.
pub enum Consume<'a> {
	Message(&'a MsgHeader, BufReader<'a, Bytes>),
	Attachment(&'a AttachmentUpdate),
}

/// What the protocol handler produced in response.
pub enum Consumed {
	Response(Msg),
	Attachment(AttachmentMeta, File),
	None,
	Disconnect,
}

/// What the codec emits to the reader loop.
pub enum Output {
	Known(MsgHeader, Bytes),
	Unknown(u64, u8),
	Attachment(AttachmentUpdate, Bytes),
}

jaspervdm (Contributor, Author) commented Sep 7, 2020

Yes, Consume and Output are only subtly different. I'm trying to think whether we can merge them somehow, but it might make things a bit awkward.
And I think with some additional refactoring we can get rid of MsgHeaderWrapper, as it is now only used during the handshake. I will have a look and update the PR.

Edit: sorry, I was confused on that last part; I think it would be easier to keep that one around.

quentinlesceller (Member)

Could you merge upstream/master into your PR?
Had a quick look and everything looks fine.
Also ran it from an existing db and synced from scratch without any issues.

jaspervdm (Contributor, Author) commented Sep 10, 2020

@quentinlesceller Yep will do! Thanks for testing.
@antiochp One other thing I was considering: should we even deal with deserializing in the Protocol handler? That part is concerned with reading/writing to the chain, so maybe we should do the actual deserialization closer to the reader. It would mean adding variants to the Output enum (renamed to Message), one for each type of message.

antiochp (Member)

> we would have one variant for each type of message

So one for each known msg type and one unknown case?
I think that makes a lot of sense - might need to see what it looks like in practice to really know but 👍 on the idea.

jaspervdm (Contributor, Author) commented Sep 11, 2020

> So one for each known msg type and one unknown case?

Yes, that is basically what I have done in the latest commit. It also allowed us to get rid of one of the enums.
We can also further simplify the code by getting rid of the Type and MsgHeaderWrapper enums and using a macro for decode_message, but I think I will leave that for a future PR.

Also, Message::Attachment(AttachmentUpdate, Option<Bytes>) is a little bit awkward, but once we have a dedicated thread to process these messages we can get rid of the Option.
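For reference, roughly the shape this gives the enum (an editor's sketch; variant names and payload types here are illustrative, not the exact set defined in the p2p crate):

use bytes::Bytes;

// Illustrative stand-ins for the real deserialized payload types.
struct Ping;
struct BlockHeader;
struct AttachmentUpdate;

enum Message {
	// One variant per known msg type, each carrying its deserialized payload.
	Ping(Ping),
	Headers(Vec<BlockHeader>),
	// A case for unknown msg types (msg size and raw type byte), so the body
	// can be skipped without disconnecting.
	Unknown(u64, u8),
	// Attachment progress; the Option is the awkward part mentioned above and
	// goes away once a dedicated thread processes messages.
	Attachment(AttachmentUpdate, Option<Bytes>),
}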

antiochp (Member)

Testing locally. Looks good so far.

Review thread on core/src/core/block.rs (outdated, resolved)
antiochp (Member)

I'm planning to cut a 4.1.0 off master in the next day or so. This is so we have an official release with "commit only inputs".

Do you mind if we hold off on merging this PR until after that?

antiochp (Member)

Actually I'm guessing we want to preserve the "streaming read" behavior for chunks of headers.
This is likely to be useful more generally once we have various PIBD-related "large" messages (chunks of the kernel MMR etc.). We are going to want to start processing chunks of data before we have fully received the msg in these cases.

jaspervdm (Contributor, Author)

> Do you mind if we hold off on merging this PR until after that?

Sure, no problem.

> We now (I think?) read and parse all headers up front and then process them in chunks of 32.

Yes, correct.

> Actually I'm guessing we want to preserve the "streaming read" behavior for chunks of headers.

I agree. I will look into this. Since the codec is already holding state and we are decoding the messages in there, this shouldn't be too hard to do.

antiochp (Member)

We know the overall size in bytes of the full 512 header payload.
But we do not know the size of each individual header.

The streaming reader allowed us to call read_item() repeatedly for each individual header, parsing them as soon as we had enough bytes to read.

Looking at this now it is conceptually close to an Iterator, just not defined in those terms.

You mention Framed here in the PR for async processing.
https://docs.rs/tokio-util/0.3.1/tokio_util/codec/struct.Framed.html

The frame here is something at a smaller granularity than the full msg body. A chunk of n headers in this specific case.

jaspervdm (Contributor, Author)

@antiochp Ok, with the latest commit we now read the headers as they come in and return them in batches of 32. This is done by overestimating the header size and attempting deserialization once the buffer is filled. I also added some tests to prove that the header size calculation is actually correct.

There was a nasty bug in there that took me a long time to hunt down, but I managed to fix it: if a read failed, the reserved part of the buffer was kept around, which messed up the next read attempt.
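Roughly, the "overestimate" check amounts to something like this (an editor's sketch; names and the size bound are illustrative):

// We know the total byte length of the headers message, but not the length
// of each individual header, so wait until the buffer holds either a
// generous estimate of a full batch or whatever remains of the message.
const HEADER_BATCH: usize = 32;
const HEADER_SIZE_ESTIMATE: usize = 400; // deliberately generous upper bound

fn ready_for_next_batch(buffered: usize, msg_bytes_remaining: usize) -> bool {
	buffered >= (HEADER_BATCH * HEADER_SIZE_ESTIMATE).min(msg_bytes_remaining)
}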

Inline review comment (Member) on the Attachment(left, meta, now) arm, just after return Ok(Message::Headers(h)):

And in contrast with headers, reading an attachment returns a single Message, basically a running total of the download? Each time with potentially more of the attachment read?
Subsequent blocking calls to read_inner() may result in more of the attachment being read.

Reply from jaspervdm (Contributor, Author):

An attachment will generate a lot of Message::Attachment messages, one for every 48000 bytes read. This is the same size, IIRC, as the buffer we were filling before.
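In other words, roughly (an editor's sketch; constant and function names hypothetical):

// Attachments are surfaced in fixed-size chunks so download progress is
// visible before the whole file has arrived.
const ATTACHMENT_CHUNK_BYTES: usize = 48_000;

fn next_attachment_chunk(bytes_remaining: usize) -> usize {
	bytes_remaining.min(ATTACHMENT_CHUNK_BYTES)
}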

antiochp (Member) left a review comment:

👍

Let's do it!

antiochp merged commit defc714 into mimblewimble:master on Sep 28, 2020
antiochp mentioned this pull request on Nov 26, 2020