Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rust-libp2p hashes pubsub topics which prevents Go/JS interoperability #473

Closed
ghost opened this issue Sep 12, 2018 · 36 comments
Closed

rust-libp2p hashes pubsub topics which prevents Go/JS interoperability #473

ghost opened this issue Sep 12, 2018 · 36 comments
Labels
bug difficulty:easy getting-started Issues that can be tackled if you don't know the internals of libp2p very well

Comments

@ghost
Copy link

ghost commented Sep 12, 2018

I'm trying to demonstrate floodsub interoperability between Go, JS and Rust. In all cases, the topic name is libp2p-demo-chat. Go and JS use that topic string verbatim, but rust-libp2p hashes it to RDEpsjSPrAZF9JCK5REt3tao.

Thoughts? I'd really like to demo the 3 languages inter-operating at some conferences this fall. Guessing you did this for privacy reasons @tomaka?

Example output below. This one is a Go peer connecting to my bootstrapper:

<NOTICE> Got connection from /ip4/127.0.0.1/tcp/46023
handleIncomingRPC: subopt.GetTopicid() = 'libp2p-demo-chat'

Now a JS peer connecting:

<NOTICE> Got connection from /ip4/127.0.0.1/tcp/56098
handleIncomingRPC: subopt.GetTopicid() = 'libp2p-demo-chat'

Now a rust peer connecting:

<NOTICE> Got connection from /ip4/127.0.0.1/tcp/56076
handleIncomingRPC: subopt.GetTopicid() = 'RDEpsjSPrAZF9JCK5REt3tao'
@ghost
Copy link
Author

ghost commented Sep 12, 2018

@vyzo Any thoughts here? Rust is doing the right thing here by hashing*. But Go and JS are both doing the wrong thing in a compatible way. What should we do?

@Stebalien and I talked about asking Rust to add a flag to do the wrong thing (no hashing) while simultaneously upgrading Go and JS to do the right thing (hash the topic) and then all 3 would be both forward and backwards compatible. But getting Go and JS to make a change like at the same time is hard.

(*Assuming the hash is computed over the descriptor and the topic string, which I didn't check.)

@tomaka
Copy link
Member

tomaka commented Sep 13, 2018

Guessing you did this for privacy reasons @tomaka?

I did this out of ignorance (or lack of specifications). I mistakenly guessed that it was supposed to be hashed.

asking Rust to add a flag to do the wrong thing (no hashing)

Sounds good to me, although I'm not sure if it's worth doing so as I don't think anyone uses rust-floodsub at the moment.

@ghost
Copy link
Author

ghost commented Sep 13, 2018

I think you actually did the "right" thing (according to the consensus opinion here in Palo Alto.

Let's see what @vyzo thinks. For now, I can workaround in my demo by giving Go and JS the hashed topic to match rust. (I have all three languages communicating on a pubsub topic now!)

@vyzo
Copy link

vyzo commented Sep 13, 2018

The original intent was for topic descriptors to be CIDs, which would be hashed.
But that's not how it got implemented, we ended up with unhashed topic ids, and it would be a breaking change to suddenly start hashing.
I would recommend that rust-libp2p follows the same approach and not hash the topics (at least for now, and until we have decided that we are going to break stuff and start hashing).

cc @whyrusleeping

@jamesray1
Copy link
Contributor

jamesray1 commented Sep 17, 2018

I'm not sure if it's worth doing so as I don't think anyone uses rust-floodsub at the moment.

I've started developing extending floodsub to gossipsub, with the intention to use it for Ethereum 2.0. So it'd be better to make a change sooner rather than later.

It should be compatible if you use TopicDescriptor.name, and we can still keep the hash code if we do decide to go with hashing. I am putting together a PR for this; although there are a few compiler errors to fix:

[rustc] field `descriptor` of struct `topic::Topic` is private
[rustc] field `name` of struct `rpc_proto::TopicDescriptor` is private
[rustc]
mismatched types

expected struct `std::string::String`, found struct `protobuf::SingularField`

note: expected type `std::string::String`
         found type `protobuf::SingularField<std::string::String>`

rpc_proto::TopicDescriptor is public, as is protobuf::SingularField, should just have to prepend pub on each line. However, you don't want to muck around with the automatically generated rpc_proto.rs, so we need another solution.

Just need to coerce protobuf::SingularField<std::string::String> to std::string::String.

@ghost
Copy link
Author

ghost commented Sep 17, 2018

I'm not sure if it's worth doing so as I don't think anyone uses rust-floodsub at the moment.

I'm using for a demo at web3. WIP here.

Also, we'll need to fix this eventually once we have better interop testing and clear spec for implementors.

@burdges
Copy link

burdges commented Sep 17, 2018

There isn't necessarily much benefit in hashing any sort publicly known identifier all by itself. You could make this an opaque type though:

pub struct PubSubTopic<T: Borrow<[u8]>>(T);
impl<T: Borrow<[u8]>> PubSubTopic<T> {
    /// Unhashed topic visible to everyone, useful for demos, debugging, and certain public channels.
    pub fn public(t: T) -> PubSubTopic<T> { PubSubTopic(t) }
}
impl PubSubTopic<[u8; 16]> {
    /// Produce a private topic by hashing secret information. 
    /// This provides no protection if the topic is widely know, but hashing prevents 
    /// exposing otherwise secret information.
    /// Panics if topic is too short to have 128 bits of entropy. 
    pub fn private(t: &[u8]) -> PubSubTopic<[u8; 16]> {
        assert!(t.len() >= 16);
        // hash it
    }
}

@jamesray1
Copy link
Contributor

It should probably be hashed by default for privacy but optionally able to use the topic name/ID instead.

@vyzo
Copy link

vyzo commented Sep 23, 2018

The privacy argument is compelling, but it's going to be practically hard to change as it would break existing code.

@ghost
Copy link
Author

ghost commented Sep 24, 2018

I'm thinking that we need to have a way to make safe changes in the protocol spec and not be constrained by "this is how go and js implemented it." With the benefit of hindsight, the topic id should probably be a typed value (like a cid) instead of a free-form text string. Something to think about for the Specs 2.0 project.

@tomaka
Copy link
Member

tomaka commented Oct 1, 2018

I've taken a look again, and the reason why the topic is hashed in the Rust impl is because of this line: https://github.com/libp2p/go-floodsub/blob/900095341be1eeba1a0d6c1e08ab61d1242c6cb4/pb/rpc.proto#L48

If there is no hashing going on, I suppose that the whole TopicDescriptor struct is useless?

@jamesray1
Copy link
Contributor

I've taken a look again, and the reason why the topic is hashed in the Rust impl is because of this line: https://github.com/libp2p/go-floodsub/blob/900095341be1eeba1a0d6c1e08ab61d1242c6cb4/pb/rpc.proto#L48

Right, I could've pointed that out, but thanks for doing so! I thought that line was added in the Rust impl but I see now that it is also in the Go impl.

If there is no hashing going on, I suppose that the whole TopicDescriptor struct is useless?

Well it can certainly come in use in future e.g. for encryption and authentication, even if not used now.

The original intent was for topic descriptors to be CIDs, which would be hashed.
But that's not how it got implemented, we ended up with unhashed topic ids, and it would be a breaking change to suddenly start hashing.
I would recommend that rust-libp2p follows the same approach and not hash the topics (at least for now, and until we have decided that we are going to break stuff and start hashing).

Looks like the comment conflicts with this statement so it should be edited for consistency, either that or change implementations to be able to use both the topic.name and hash(topicDescriptor), as suggested by @mgoelzer, or just rewrite all implementations to use a CID.

Looks like it's worthwhile to do the latter option eventually, so in the long-run it would be less development time to just do that, rather than an interim option, then that option later.

@tomaka
Copy link
Member

tomaka commented Oct 2, 2018

Well it can certainly come in use in future e.g. for encryption and authentication, even if not used now.

IMO it's usually a good idea to remove unused code, in order to not confuse readers, like I got confused in this situation for example.

@ghost
Copy link
Author

ghost commented Oct 2, 2018

I've taken a look again, and the reason why the topic is hashed in the Rust impl is because of this line: https://github.com/libp2p/go-floodsub/blob/900095341be1eeba1a0d6c1e08ab61d1242c6cb4/pb/rpc.proto#L48

And my contention is that this code is exactly right per the spec and original intentions (hence my thumbs down).

If there is no hashing going on, I suppose that the whole TopicDescriptor struct is useless?

Yes, but I don't think you should remove correct behavior just because Go and JS are broken. The suggestion (originally from @Stebalien) that I was advocating for above, and that @jamesray1 seems to be agreeing with, is that we support both topic.name and hash(topicDescriptor) with some kind of a flag to choose which you want. It feels uncomfortable to have a flag that basically means "disregard the spec." In a perfect world, we would just fix Go and JS to use the hash(topicDescriptor), but this change would break IPFS because we don't have any way of rolling out an update to all IPFS nodes simultaneously.

I thought that line was added in the Rust impl but I see now that it is also in the Go impl.

@jamesray1 Where do you see it in the Go implementation? The user-level behavior of the Go implementation is definitely not to hash the topic, so if that code is in the Go implementation we must be forgetting to make a function call somewhere.

@vyzo
Copy link

vyzo commented Oct 2, 2018

At this point it is arguable whether it's the spec that broken or the extant implementations.
One might call for revising the spec and calling it day :)

@ghost
Copy link
Author

ghost commented Oct 2, 2018

@vyzo Fair point there. So, maybe @tomaka is right to just remove the entire TopicDescription struct and use plaintext topics everywhere. (It definitely makes for a better demo!)

@jamesray1
Copy link
Contributor

jamesray1 commented Oct 2, 2018

@jamesray1 Where do you see it in the Go implementation? The user-level behavior of the Go implementation is definitely not to hash the topic, so if that code is in the Go implementation we must be forgetting to make a function call somewhere.

Sorry, to clarify, I meant that single comment line in the rpc.proto file that was originally implemented in the Go impl, the same line that @tomaka linked above: https://github.com/libp2p/go-floodsub/blob/900095341be1eeba1a0d6c1e08ab61d1242c6cb4/pb/rpc.proto#L48.

And yes, I agree, it seems that the most practical way given the IPFS upgrade limitation is to update implementations for forward and backwards compatibility, as you suggest with a flag to use topic.name or hash(topicDescriptor), as well as being able to use a future flag for a CID, when that is implemented.

OTOH, removing the topicDescriptor struct until it is actually used and updating the spec also makes sense and is easier, so is probably sensible to do for now.

@jamesray1
Copy link
Contributor

See libp2p/specs#94, "Add a note to say that TopicDescriptor isn't used".

@jamesray1
Copy link
Contributor

jamesray1 commented Oct 16, 2018

If people can approve libp2p/specs#94 that would be great. I need that merged so that I can start making changes to the Rust impl which will be used in gossipsub as well as floodsub.

@tomaka
Copy link
Member

tomaka commented Nov 8, 2018

What about having /floodsub/1.0.0 use the topic name and /floodsub/2.0.0 the topic hash?

@ghost
Copy link
Author

ghost commented Nov 9, 2018 via email

@jamesray1
Copy link
Contributor

FWIW I've partly implemented something similar for message IDs and message hashes with gossipsub at #767.

@mhchia
Copy link

mhchia commented Apr 30, 2019

@mgoelzer @vyzo @tomaka Is there a consensus on whether to change the topic to the topic hash? From the go-libp2p-pubsub, it seems the plaintext is still used on the wire. Just wondering if this change is still under consideration. Thanks!

@vyzo
Copy link

vyzo commented Apr 30, 2019

No consensus has been reached, but will accept a patch in go-libp2p-pubsub that adds hashing as an option.
Otherwise we have an interop problem.

@mhchia
Copy link

mhchia commented Apr 30, 2019

Thank you for the prompt reply!:) Another question: is the usage of topic hash because of the CID intent as you pointed out in #473 (comment), or the privacy intent #473 (comment)?

Another thing I just came up with: if changing to topic hash, the type of topic would probably be changed to bytes(just like cid) in a similar reason as libp2p/go-libp2p-daemon#43. At least in Python, non utf-8 characters are not allowed in string in protobuf.

@vyzo
Copy link

vyzo commented Apr 30, 2019

We probably want the pubsub library doing the hashing, not the user.
But list-topics in the daemon would be affected indeed.

@mhchia
Copy link

mhchia commented Apr 30, 2019

I suspect the topic in pubsub pb messages also needs to be changed as well? Like this one. Otherwise, IIUC projects like py-libp2p would probably fail to initialize that pb message in Python.

@vyzo
Copy link

vyzo commented Apr 30, 2019

Good point. Not sure about wire compatibility of the two though, I think it should be ok; iirc strings and bytes have the same representation on the wire in protobuf.

@mhchia
Copy link

mhchia commented Apr 30, 2019

Yeah, strings seem to be encoded with UTF-8 in serialization. Sorry for being pedantic: I guess the problem is that protobuf requires the string text to be the UTF-8 format[1]. In Python protobuf, string fields are UTF-8 encoded when serialization. They are also checked in the setter and constructor. It's hard to bypass if we don't do hacky things. At least for Python, IIUC we need to change the type to bytes if topic hash is to be supported.

[1]: protobuf language guide, scalar value types, string https://developers.google.com/protocol-buffers/docs/proto3#json
Another quote from other places: https://chromium.googlesource.com/external/github.com/google/protobuf/+/v3.0.0-beta-1/CHANGES.txt#25

@vyzo
Copy link

vyzo commented Apr 30, 2019

So for list-topics we could return the string repr of topic hashes from the daemon, so no change should be necessary.

@mhchia
Copy link

mhchia commented Apr 30, 2019

Sorry if I understand it wrong: And for instance in py-libp2p we can just define the topic field in protobuf as bytes, and pre-serialize it from string repr to bytes, and then fill it directly to the pb message? It sounds working.

@tomaka
Copy link
Member

tomaka commented Apr 30, 2019

For what it's worth, I'd accept a PR that optionally disables the hashing. One has been submitted a long time ago but turned into a zombie PR.

@tomaka tomaka added the getting-started Issues that can be tackled if you don't know the internals of libp2p very well label Apr 30, 2019
@ghost
Copy link
Author

ghost commented Apr 30, 2019

@tomaka Your suggestion seems like the best one.

@vyzo Do you think we should update this paragraph:

The name field is a string used to identify or mark the topic. It can be descriptive or random or anything that the creator chooses.

Note that instead of using TopicDescriptor.name, for privacy reasons the TopicDescriptor struct may be hashed, and used as the topic ID. Another option is to use a CID as a topic ID. While a consensus has not been reached, for forwards and backwards compatibility, using an enum TopicID that allows custom types in variants (i.e. Name, hashedTopicDescriptor, CID) may be the most suitable option if it is available within an implementation's language (otherwise it would be implementation defined).

to make it clear that regardless of what byte string you choose for the topic name, it will be treated as a literal (no-hash)? Then rust-libp2p with the option @tomaka suggests enabled would be conformant, and future implementers won't be confused if they are studying the rust source code,

@vyzo
Copy link

vyzo commented Apr 30, 2019

yeah, that's fine.

@raulk
Copy link
Member

raulk commented Aug 1, 2019

My two cents here.

  • We might want to introduce wildcard subscriptions in the future via (longest) prefix matching (e.g. a peer could subscribe to all topics starting with xyz). This is a popular feature in pubsub systems. Hashing topics completely obstructs that possibility.
  • For a finite domain of topic names (as is usually the case), hashing provides zero security guarantees.
  • For an unbounded domain (e.g. IPNS names), hashing does provide some privacy preservation.

I agree with @burdges that topic names are better left opaque. That way, the application choses its naming scheme based on the tradeoffs above.

peat added a commit to peat/rust-libp2p that referenced this issue Oct 18, 2019
…o pass through whatever the application provides as a topic identifier, leaving hashing (or not hashing) up to the application.
rklaehn pushed a commit to Actyx/rust-libp2p that referenced this issue Jan 16, 2020
…o pass through whatever the application provides as a topic identifier, leaving hashing (or not hashing) up to the application.
rklaehn pushed a commit to Actyx/rust-libp2p that referenced this issue Jan 17, 2020
…o pass through whatever the application provides as a topic identifier, leaving hashing (or not hashing) up to the application.
rklaehn pushed a commit to Actyx/rust-libp2p that referenced this issue Jan 17, 2020
…o pass through whatever the application provides as a topic identifier, leaving hashing (or not hashing) up to the application.
rklaehn pushed a commit to Actyx/rust-libp2p that referenced this issue Jan 17, 2020
…o pass through whatever the application provides as a topic identifier, leaving hashing (or not hashing) up to the application.
rklaehn pushed a commit to Actyx/rust-libp2p that referenced this issue Jan 23, 2020
…o pass through whatever the application provides as a topic identifier, leaving hashing (or not hashing) up to the application.
rklaehn pushed a commit to Actyx/rust-libp2p that referenced this issue Jan 24, 2020
…o pass through whatever the application provides as a topic identifier, leaving hashing (or not hashing) up to the application.
rklaehn pushed a commit to Actyx/rust-libp2p that referenced this issue Jan 27, 2020
…o pass through whatever the application provides as a topic identifier, leaving hashing (or not hashing) up to the application.
tomaka pushed a commit that referenced this issue Jan 27, 2020
#1395)

* Addressing #473 ... if I understood the ticket right, we want to pass through whatever the application provides as a topic identifier, leaving hashing (or not hashing) up to the application.

* Remove TopicDescriptor and use Topic newtype everywhere

* PR feedback

Use From<Topic> instead of Into<String>
Use impl Into<Topic> instead of Topic in public API

Co-authored-by: Peat Bakke <peat@peat.org>
@rklaehn
Copy link
Contributor

rklaehn commented Feb 10, 2020

For floodsub, the hashing is now disabled. For gossipsub, the hashing is configurable, with the default being disabled. So I think this can be closed now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug difficulty:easy getting-started Issues that can be tackled if you don't know the internals of libp2p very well
Projects
None yet
Development

No branches or pull requests

7 participants