Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

murmur3 support #150

Merged
merged 5 commits into from
Oct 28, 2021
Merged

murmur3 support #150

merged 5 commits into from
Oct 28, 2021

Conversation

willscott
Copy link
Contributor

No description provided.

@aschmahmann
Copy link
Contributor

murmur3 was removed until someone had time to document and get their story straight about what the murmur3 codec actually means. It's a bit of a mess (#135 (comment)).

I would like to not merge this in until we have the definition of what is meant by murmur3 specced out. Adding test vectors here like we have for some of the other hash functions would be really appreciated.

@willscott
Copy link
Contributor Author

the downside of not having something here is that it's going to surround all the use cases with ugliness instead

e.g. it's already there: https://github.com/ipfs/go-unixfsnode/blob/main/hamt/util.go#L108-L112

)

func init() {
multihash.Register(multihash.MURMUR3_128, func() hash.Hash { h := murmur3.New128(); return h })
Copy link
Contributor

@warpfork warpfork Oct 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the goal is to make sure we have a description of what unixfsv1 does internally, per A and B, it appears the function desired is New64, not New128.

Of course, that's not what multihash indicator code 0x22 is stated to be in the multiformat table today. 🤷

@warpfork
Copy link
Contributor

warpfork commented Oct 14, 2021

We have the choice between one of several different kinds of ugliness. There is no non-ugly pick as far as I can tell :)

There's a couple different pieces of history that are contradictory. If we want to stop avoiding the problem, we have to pick one of them to break.

  • We could keep 0x22 and keep the statement that it's "murmur3-128", and add a comment to the table that "this is a lie, actually; it's really murmur3-64, sorry": the result is everyone is confused forever, but on the plus side, nothing changes.
  • We could add a new code, and state it as "murmur3-64", and move on that way. Confusions would be fixed going forward... but, if 0x22 is already encoded in unixfsv1 data, this would be nonviable / doesn't actually solve the problem -- there's way too much data inertia there; we can't snap our fingers and rewrite every unixfsv1 document to have a new number in it, so we'd still be stuck with 0x22 being confusing forever.
  • We could keep 0x22 and change the text name to be "murmur3-64" in the multicodec table, and change any code constants with that name as well. This might break some code that refers to the old name, but wouldn't change any serial data in the wild, and would fix the confusion.

I think we should probably do the third, there. The indicator codes have a lot more inertia than the textual names do, because they appear in data in the wild. The code names have very minimal inertia by comparison. So I propose we "break" the name and change it to a correct one in order to fix this mess.

EDIT: ah, I didn't see another discussion reach the same conclusion, but it appears it has in multiformats/multicodec#236. Super good. We're all in agreement then, sounds like.

@willscott
Copy link
Contributor Author

This is updated to naming chosen upstream. Please take a look

Copy link
Contributor

@warpfork warpfork left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple small nits but otherwise LGTM.

}

// A wrapper is needed to export the correct size, because murmur3 incorrectly advertises Hash64 as a 128bit hash.
type murmur64 struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe this type name should be as verbose as the constants became, for consistency?

Comment on lines +30 to +31
func (murmur64) BlockSize() int {
return 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks slightly funny. I can't really fathom any harm to it, though.

... but I went on a splunk anyway, and I'd say 16 is probably an accurate number. citation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

huh. Remains weird to me that they said it that way, but okay.

@willscott willscott merged commit 3f6167c into master Oct 28, 2021
@willscott willscott deleted the murmur branch October 28, 2021 22:27
@aschmahmann aschmahmann mentioned this pull request Dec 1, 2021
80 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants