Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add initial libp2p standardization #935

Merged
merged 10 commits into from
May 23, 2019
Merged

Add initial libp2p standardization #935

merged 10 commits into from
May 23, 2019

Conversation

AgeManning
Copy link
Contributor

Ahead of multi-client testnets, I think it will be useful to start a discussion on standardizing the libp2p protocols and their respective message formats.

This is a minimal initial draft which outlines some of the protocols, their id's and suggested formats which could be used for eth2.0 clients.

The protocols and formats listed here are currently in use by lighthouse but are entirely malleable and currently exist as suggestions for client implementations.

@JustinDrake
Copy link
Contributor

Should we standardise on peer ids?

@prestonvanloon
Copy link
Contributor

At prysmatic labs, we are using multiple pubsub topics where each topic name is a 1:1 mapping to a message type. An example would be that attestation announcements come across the wire as /eth/serenity/beacon/v1/attestationannounce. Likewise with RPC calls, we have a stream for each RPC method/topic.

Has there been any analysis of the trade-offs for this? Subscribing to only the topics you care about may reduce your network traffic while adding some potential complexity

@prestonvanloon
Copy link
Contributor

Having 1:1 topic mappings also reduces the need for a type nipple. The message is either the expected type or not.

@AgeManning
Copy link
Contributor Author

@JustinDrake - PeerId's seem to be specified here: https://github.com/ethereum/eth2.0-specs/blob/dev/specs/networking/node-identification.md#peer-id-generation
So I have left them out of this particular doc. Perhaps we merge the two.

@AgeManning
Copy link
Contributor Author

AgeManning commented Apr 16, 2019

@prestonvanloon - Agree that potentially we want a topic for each message type (which removes the need for a nibble). In gossipsub for each topic, we have to maintain a set of mesh peers. So there may be some overhead in managing two sets of overlay networks for the blocks/attestations. I guess this is the point of this PR to agree on how we should proceed. I believe the WhiteBlock guys will be doing some testing for gossipsub in the near future and we will be able to have a more informed opinion on which way to proceed.

In regards to having a separate protocol id for requests/responses in the RPC protocol seems like a useful separation as we will not need to encode the different types. However it seems to go against current libp2p protocol designs which typically have a singular protocol id per protocol. I'd opt to go whichever is the most performant.

@JustinDrake
Copy link
Contributor

go-libp2p-crypto contains the canonical implementation of how to hash secp256k1 keys for use as a peer ID

Would it make sense to push for BLS12-381 keys in the longer term, to unify the consensus and networking layers?

@prestonvanloon
Copy link
Contributor

@AgeManning Agreed on the RPC part, it seems sub-optimal to open multiple streams between two peers.

cc: @zscole is this on your radar?

@AgeManning
Copy link
Contributor Author

@JustinDrake - Makes sense for consistency. The caveats might be that BLS will likely take longer to verify (compared to secp256k1) and there is talk of generating new key pairs for every message sent across the pubsub network (to anonymize validators) (I believe BLS generation is comparable to secp256k1 keys so this shouldn't be an issue).

@zscole
Copy link

zscole commented Apr 16, 2019

cc: @zscole is this on your radar?

I don't see why multiple streams would be necessary for anything other than FTP. I don't think it would present any inherent optimization, but this is something we can test for to validate if really necessary.

specs/networking/libp2p-standardization.md Outdated Show resolved Hide resolved
specs/networking/libp2p-standardization.md Outdated Show resolved Hide resolved

## Identify

#### Protocol Id: `/ipfs/id/1.0.0` (to be updated to `/p2p/id/1.0.0`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is /p2p/ not yet supported? and why not use /eth/ here? Is this something global beyond the eth protocol?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep so, the protocols that come with libp2p have their own protocol IDs. Currently (to my knowledge) these are hard coded and not customisable. I'd have to make a pr to libp2p or rewrite the protocol to change the naming. (For rust-libp2p).
I assume the same for go in order for the protocols to be interoperable.
I'm curious if we want to make the protocol IDs customisable within the libp2p protocols.

Copy link
Contributor

@raulk raulk May 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should be able to customise all of these protocol IDs. Doing so will segregate your network from the IPFS or libp2p public networks at the protocol level, such that even if peers happen to cross-connect, they will share no protocols in common, and therefore, won't be able to interact via streams.

Ideally you'll override the protocol ID as low as in multistream-select (to be renamed to multiselect). This is used during connection bootstrapping, and if these IDs don't match, peers will disconnect immediately. This is the logically strictest segregation.

@mslipper
Copy link
Contributor

Looks like I missed this - I'm going to close my PR (#975) in favor of this one. @AgeManning would you mind if I committed changes to this one in order to merge our efforts?

@AgeManning
Copy link
Contributor Author

AgeManning commented Apr 24, 2019

Looks like I missed this - I'm going to close my PR (#975) in favor of this one. @AgeManning would you mind if I committed changes to this one in order to merge our efforts?

@mslipper - Of course. I made this mainly to start the discussion. All edits/commits welcome. :)

specs/networking/libp2p-standardization.md Outdated Show resolved Hide resolved
specs/networking/libp2p-standardization.md Outdated Show resolved Hide resolved
#### Protocol Id: `/ipfs/id/1.0.0` (to be updated to `/p2p/id/1.0.0`)

The Identify protocol (defined in go - [identify-go](https://github.com/ipfs/go-ipfs/blob/master/core/commands/id.go) and rust [rust-identify](https://github.com/libp2p/rust-libp2p/blob/master/protocols/identify/src/lib.rs))
allows a node A to query another node B which information B knows about A. This also includes the addresses B is listening on.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the relationship to discv5? Is this supposed to be a temporary substitute, a permanent complement, or a replacement?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, temporary placeholder. This is currently what lighthouse is using in place of a proper discovery protocol.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add an explicit note that this is a placeholder

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm working on discv5 at the moment. Currently, I think this protocol may still be useful for NAT traversal and can be fed into the discv5 protocol. Potentially we don't need it at all. I'll add a placeholder note for now.


## Discovery

#### Protocol Id: `/eth/serenity/disc/1.0.0`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the current plan for discv5 is to use an independent transport.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, agree. This entire document will be updated with discv5 soon.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought there was a plan to implement discv5 as a libp2p protocol?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm currently working on this. My current plan is to mimic mDNS in the integration into libp2p. In this case, there will be no protocol id, but will allow implementations to use the discv5 discovery gadget which listens separately on a UDP port. Messages from this gadget will be available to the user to pass into other libp2p protocols. i.e we discover a node, this discovery can be used to dial a tcp connection or add to a kademlia DHT in libp2p.
Once, I've built an implementation, I will update this document.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be nice to have at least the ability to run discv5 over other transports - what would that look like, in this setup?

### Topics

*The Go and Js implementations use string topics - This is likely to be
updated to topic hashes in later versions - https://github.com/libp2p/rust-libp2p/issues/473*
Copy link

@mhchia mhchia Apr 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I asked about the topic hash in the issue. It seems not finalized yet, but shouldn't be a big issue.

specs/networking/libp2p-standardization.md Show resolved Hide resolved
Copy link
Contributor

@raulk raulk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for undertaking this, @AgeManning! I believe this document creates a lot of clarity for the community.

From my end, I'll publish a short primer on connection bootstrapping and protocol negotiation in https://discuss.libp2p.io, as I believe this is the area that has created the most confusion so far.

specs/networking/libp2p-standardization.md Show resolved Hide resolved
specs/networking/libp2p-standardization.md Outdated Show resolved Hide resolved
specs/networking/libp2p-standardization.md Show resolved Hide resolved
specs/networking/libp2p-standardization.md Outdated Show resolved Hide resolved

Libp2p raw gossipsub messages are sent across the wire as fixed-size length-prefixed byte arrays.

The byte array is prefixed with an unsigned 64 bit length number encoded as an
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

64-bit varint?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. This is based off the current floodsub implementation in rust-libp2p (figure all implementations should be standardized). Data is sent via this function:
https://github.com/libp2p/rust-libp2p/blob/master/core/src/upgrade/transfer.rs#L32-L41
Notice the the len of the data is using the unsigned_varint package with a u64_buffer():
https://github.com/libp2p/rust-libp2p/blob/master/core/src/upgrade/transfer.rs#L45 and
https://github.com/libp2p/rust-libp2p/blob/master/core/src/upgrade/transfer.rs#L46


#### Eth2.0 Specifics

Each message has a maximum size of 512KB (estimated from expected largest uncompressed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's worth specifying that this is Eth2.0's choice, as gossipsub does not impose a max message length, to my knowledge.

Copy link

@mhchia mhchia May 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Speaking of the maximum size of a message, even though there isn't a specified constant, I played with the max size of DelimitedReader, and now suspect it(1 MB?) is the upper bound of the message size in go implementation, IIUC.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Floodsub in rust-libp2p has a maximum size of 2048 bytes:
https://github.com/libp2p/rust-libp2p/blob/master/protocols/floodsub/src/protocol.rs#L60
I've made this configurable in the gossipsub implementation.
The max size exists so that if we see a large amount of data coming through a stream we only read a maximum amount. I understand this to prevent potential DOS vectors from malicious user's sending arbitrary large streams of data.


## Identify

#### Protocol Id: `/ipfs/id/1.0.0` (to be updated to `/p2p/id/1.0.0`)
Copy link
Contributor

@raulk raulk May 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should be able to customise all of these protocol IDs. Doing so will segregate your network from the IPFS or libp2p public networks at the protocol level, such that even if peers happen to cross-connect, they will share no protocols in common, and therefore, won't be able to interact via streams.

Ideally you'll override the protocol ID as low as in multistream-select (to be renamed to multiselect). This is used during connection bootstrapping, and if these IDs don't match, peers will disconnect immediately. This is the logically strictest segregation.

The protocol has two configurable parameters, which can be used to identify the
type of connecting node. Suggested format:
```
version: `/eth/serenity/1.0.0`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, to my knowledge, this is currently not the case in rust-libp2p. I'll make a PR to make these configurable.


## Discovery

#### Protocol Id: `/eth/serenity/disc/1.0.0`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought there was a plan to implement discv5 as a libp2p protocol?

@raulk
Copy link
Contributor

raulk commented May 2, 2019

On the topic (pun intended) of dedicated topics per message type and shard vs. wildcard topics per shard, I'd suggest the former. A few reasons off the top of my head:

  • It curtails irrelevant gossip amplification. The effort invested by each peer to maintain the mesh overlay is proportional to the traffic it's interested in.
  • Validation functions can be type-specific versus coarse-grained god-functions.
  • Easier debugging and traceability.
  • Allows lighter clients to subscribe only to the data they need.

@wemeetagain wemeetagain mentioned this pull request May 6, 2019
33 tasks
@jrhea
Copy link
Contributor

jrhea commented May 12, 2019

Having 1:1 topic mappings also reduces the need for a type nipple. The message is either the expected type or not.

True. We should make sure there are only two type nipples. ✌️😂🤠

Copy link
Contributor

@djrtwo djrtwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay @AgeManning, looks good to me. Ready to merge?

@AgeManning
Copy link
Contributor Author

@djrtwo - I consider this still a discussion PR and it will be updated in a week or so with discv5. However it might be useful as a guideline to start libp2p co-ordination so happy to merge this WIP PR if you like. I can make future PR's to update it :).

@AgeManning
Copy link
Contributor Author

AgeManning commented May 23, 2019

I've added a small paragraph for Discv5.
We no longer need the identify protocol as Discv5 handles ip and topic discovery for us. It does not maintain it's own protocol-id as it has no need for establishing long-lasting streams. For these reasons I use the word standalone even though it could be built from libp2p-components. It's standalone from a multistream-select perspective.

@fjl
Copy link

fjl commented May 23, 2019

This is really cool. Why should clients optionally support yamux?

@djrtwo djrtwo merged commit 72e1267 into ethereum:dev May 23, 2019
@djrtwo
Copy link
Contributor

djrtwo commented May 23, 2019

Merging.
We can take subsequent convos to issues/PRs

@AgeManning
Copy link
Contributor Author

@fjl - I originally made this as there was no discussion on what clients should use. By default, in rust, we can support both simultaneously. I've not done any testing or benchmarking between either protocol, but my feel is that mplex is minimal and yamux is more complex and potentially used more in prod (I have no real knowledge here however). I opted for the minimal by default and left the other libp2p-default multiplexer as optional. This was in part to spark discussion with people who know more about this than me (which didn't happen).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.