
[Networking] Gossipsub content-addressed messages #1528

Closed
AgeManning opened this issue Dec 16, 2019 · 9 comments

@AgeManning
Contributor

AgeManning commented Dec 16, 2019

Currently, gossipsub uses source_peer_id + sequence_number to address the messages sent across the gossipsub network. If a client re-publishes a seen message, it will look like a new message to all other peers. This can lead to duplicate messages on the network, which then need to be filtered at the application layer.

Rust and Go (libp2p/go-libp2p-pubsub#248) now have the ability to customise the message id of gossipsub messages. I propose we set the gossipsub message id to base64(sha256(data)), where data is the gossipsub protobuf data field, which typically contains our ssz-encoded or snappy-compressed data.

This way, gossipsub will filter out duplicate messages before notifying the application layer. In principle, we then just need to verify the hash at the application layer to ensure duplicates aren't sent/received.
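
A minimal sketch of the proposed id function (Python, standard library only; `compute_message_id` and the example payload are illustrative names, not from any spec):

```python
import base64
import hashlib

def compute_message_id(data: bytes) -> str:
    """base64(sha256(data)), where data is the gossipsub protobuf data
    field, i.e. the ssz-encoded or snappy-compressed payload."""
    return base64.b64encode(hashlib.sha256(data).digest()).decode("ascii")

# The id depends only on the content, not on source peer id or sequence
# number, so a re-published message keeps the same id:
payload = b"example-ssz-bytes"
assert compute_message_id(payload) == compute_message_id(payload)
```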

@Nashatyrev
Member

@AgeManning

If a client re-publishes a seen message,

Is this considered malicious client behavior, or do you mean some real use case?

If this is to protect against malicious clients, then I don't understand why it's a problem to filter this at the application layer. Duplicates would be filtered at the app layer and would not be propagated, the same way any other invalid payload (e.g. an invalid validator signature) would be filtered at the app layer and not propagated to the network.

@AgeManning
Contributor Author

AgeManning commented Dec 17, 2019

As a real use case, Prysm re-publishes all gossip messages, which changes the source id and sequence number and makes old messages look like new ones. As I understand it, this is because the Go API for validating messages doesn't simply provide a method for validation followed by re-propagation.

Currently there is nothing in the specification that prevents clients from re-publishing seen messages, so I don't think this is strictly malicious behaviour.

For our use, we don't particularly care about the source of a message, mainly just its contents, so it makes sense (at least to me) to content-address messages to prevent duplication at the gossip layer.

If a client fails to filter at the application layer (for whatever reason) and re-propagates a message to a client that re-publishes it, I think it's possible to get into propagation loops of the same message, which spams the gossip channel. This would be one extra step to prevent such scenarios (even though I agree this could also be solved at the application layer).

@AgeManning
Contributor Author

As a fun side-effect, messages will also be a few bytes smaller :)

@Nashatyrev
Member

As I understand it, this is because the Go API for validating messages doesn't simply provide a method for validation followed by re-propagation.

Oh, this sounds more like a Go implementation issue.
IMHO, validating and re-publishing doesn't look like the way gossipsub is intended to be used :(

Otherwise, content-addressing looks OK, but we may need to consider the hash computation overhead and possible related DoS attacks.

@vyzo

vyzo commented Dec 17, 2019

This isn't true; the Go implementation has validators, and messages are only propagated if they pass validation.

@vyzo

vyzo commented Dec 17, 2019

Note that the validators don't have to return at once; they can take a while to complete, and that's fine.

@djrtwo
Contributor

djrtwo commented Dec 17, 2019

I support this, and base64(sha256(data)) looks good. Want to put together a PR, @AgeManning?

In the general case, it seems that libp2p pubsub should handle this for us as long as we don't re-publish messages after validation. That said, there are cases in which messages might very well be published by two independent nodes, and in that case it would be advantageous to see them as the same message within pubsub.

The example I'm thinking of is one ValidatorClient (VC) connected to multiple BeaconNodes (BN). In this type of setup a VC might simultaneously publish signed messages to more than one BN to add some redundancy and potentially some speed gains in propagation.

It should be noted that we still need application-layer validation of non-duplicates for AggregateAndProof messages on the beacon_aggregate_and_proof pubsub topic. Two "aggregators" might compute the same aggregate attestation but have a different "proof" signature in the wrapper container. A naive base64(sha256(data)) would not catch the duplicate attestation in this case, as the sketch below shows. This validation is already in the networking spec and should remain there.
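
A toy sketch of that caveat (the byte strings are hypothetical stand-ins for the SSZ-encoded AggregateAndProof fields): two wrappers around an identical aggregate that differ only in the proof signature get different content-addressed ids, so gossipsub would let both through.

```python
import base64
import hashlib

def message_id(data: bytes) -> str:
    return base64.b64encode(hashlib.sha256(data).digest()).decode("ascii")

# Stand-in for SSZ serialization: wrapper = aggregate bytes + proof signature.
aggregate = b"identical-aggregate-attestation"
wrapper_a = aggregate + b"selection-proof-from-aggregator-A"
wrapper_b = aggregate + b"selection-proof-from-aggregator-B"

# Same underlying attestation, different wrappers -> different message ids,
# so the duplicate aggregate must still be caught at the application layer.
assert message_id(wrapper_a) != message_id(wrapper_b)
```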

@vyzo

vyzo commented Dec 17, 2019

If you are using the hash as the message ID, then they won't get republished (as long as they are within the time cache window).
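
Roughly the behaviour being described, as a sketch (a hypothetical `SeenCache`; the real go-libp2p-pubsub time cache is a different, sliding-window structure):

```python
import time

class SeenCache:
    """Toy 'seen messages' cache keyed by content-addressed message id."""

    def __init__(self, ttl_seconds: float = 120.0):
        self.ttl = ttl_seconds
        self.first_seen_at: dict[str, float] = {}

    def accept(self, msg_id: str) -> bool:
        """Return True only the first time an id is seen within the window."""
        now = time.monotonic()
        # Evict entries that have aged out of the window.
        self.first_seen_at = {
            m: t for m, t in self.first_seen_at.items() if now - t < self.ttl
        }
        if msg_id in self.first_seen_at:
            return False  # duplicate within the window: drop, don't re-propagate
        self.first_seen_at[msg_id] = now
        return True
```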

@djrtwo
Contributor

djrtwo commented Jan 3, 2020

closed via #1538
