add sha2-256-trunc2 multihash: 0x1012 #170

Merged
merged 1 commit into from
May 12, 2020

@rvagg rvagg commented Apr 16, 2020

SHA2-256 with the trailing 2 bits zeroed out. Primary current use is Filecoin.

It's only "truncated" in the sense that the output is truncated, but there are the same number of output bits.

0x1012 was chosen to mirror the 0x12 sha2-256 entry and it's close to some other, less common, hash functions.

Current implementations:

Ref: #161
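
For illustration, here is how the proposed code relates to the existing 0x12 entry once it is written as a multihash prefix. This is a minimal sketch assuming the unsigned-varint encoding used for multiformats codes; `encode_uvarint` is a hypothetical helper name, not taken from any existing implementation.

```python
def encode_uvarint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned varint (7 bits per byte, LSB group first)."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            # More groups follow: set the continuation bit.
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


SHA2_256 = 0x12          # existing table entry
SHA2_256_TRUNC = 0x1012  # code proposed in this PR

print(encode_uvarint(SHA2_256).hex())        # 12
print(encode_uvarint(SHA2_256_TRUNC).hex())  # 9220
```

The two-byte code means the prefix on the wire is longer than for plain sha2-256, even though the digest length stays at 32 bytes.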

rvagg commented Apr 16, 2020

table.csv Outdated
@@ -103,6 +103,7 @@ http, multiaddr, 0x01e0,
json, serialization, 0x0200, JSON (UTF-8-encoded)
messagepack, serialization, 0x0201, MessagePack
libp2p-peer-record, libp2p, 0x0301, libp2p peer record type
sha2-256-trunc2, multihash, 0x1012, SHA2-256 truncated with trailing 2 bits replaced with zeros - used for proving trees as in Filecoin

'trailing 2 bits' is slightly ambiguous. This specifically means the two most significant bits of the last byte. Or, said differently, the two most significant bits when the value is treated as a little-endian unsigned integer. The latter also explains the semantic requirement: we are 'truncating' in order to ensure conversion to a field element results in a value representable in 254 bits.

I'm expanding only in case that helps choose clearer descriptive text.
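
A minimal sketch of the operation described in this comment, assuming Python's `hashlib`; the function name is hypothetical. It clears the two most significant bits of the last byte and checks that the result, read as a little-endian unsigned integer, fits in 254 bits.

```python
import hashlib


def sha2_256_trunc254_padded(data: bytes) -> bytes:
    """SHA2-256 of data with the two most significant bits of the last byte cleared."""
    digest = bytearray(hashlib.sha256(data).digest())
    digest[31] &= 0b00111111  # last byte is the most significant one in little-endian order
    return bytes(digest)


d = sha2_256_trunc254_padded(b"hello")
value = int.from_bytes(d, "little")

# Still 32 bytes of output, but the integer is representable in 254 bits.
assert len(d) == 32
assert value < 2**254
```

Only the last byte can differ from the plain SHA2-256 digest; the first 31 bytes are untouched.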

Member Author

ah yes, that's pretty critical to get right, endianness mistakes are cause for much pain, will amend

Member Author

Replaced with this, although it's rather wordy, but it's precise:

SHA2-256 truncated with 2 most significant bits of the last byte (interpreted as a little-endian uint) replaced with zeros - used for proving trees as in Filecoin

@ribasushi ribasushi Apr 29, 2020

A bit late to the party, but the current iteration still strikes me as confusing. How about:

0x1012,         SHA2-256 with bit 249 and 250 (7th and 8th bit counted from the end) both forced to 0. Used for proving trees as in Filecoin

( if I got this wrong - even more indication that the current description is confusing )

Member Author

Bit counting and "from the end" might be a bit ambiguous here, I find it way too tempting to think of bits laid out sequentially like that but it's not usually how we deal with them when they're in byte form. What you want an implementation to do is & 0b00111111 (& 0x3f). Does that look like "7th and 8th bit counted from the end"? It looks more like the "2 most significant bits" to me, but it also needs the LE qualifier because in BE they might programmatically look more like "from the end".

🤷

All this to say, I'm not sure your wording clears it up any more than the current wording. Maybe it needs & 0x3f in there for good measure.
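
To make the LE/BE point concrete, here is a small sketch using an all-ones 32-byte buffer as a stand-in for a digest (the buffer is made up purely for illustration). The same `& 0x3f` on the last byte clears the top two bits of the integer under a little-endian reading, but bits 6 and 7 near the bottom under a big-endian reading.

```python
raw = bytes([0xFF] * 32)  # stand-in for a digest, chosen so every cleared bit is visible

# Mask only the last byte, as an implementation would.
masked = raw[:-1] + bytes([raw[-1] & 0x3F])

le = int.from_bytes(masked, "little")
be = int.from_bytes(masked, "big")

print(le.bit_length())  # 254
print(be.bit_length())  # 256
print(hex(be & 0xFF))   # 0x3f
```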

@ribasushi ribasushi Apr 29, 2020

I went down the bit-counting path because the discussion of BE/LE confuses things for me, since these are concepts strictly for ordering within integer types. That is, endianness is well defined/understood for 16/32/64 bit integers. A 256+ bit hash is almost always* a bytestream, so speaking of endianness is.. odd. I suppose we could just say:

0x1012,         SHA2-256 with the last byte having two of its bits cleared via `& 0b0011_1111`. Used for proving trees as in Filecoin

*when you do base-256 conversions e.g. to btc58 then you are working with an abstract uint256, but that doesn't apply here

Much better now! No further questions ;)

rvagg commented Apr 16, 2020

Anyone want to weigh in on the name? One alternative might be to flip it around to be sha2-256-trunc254 since what we're caring about here is the 254 bits, even if the output is 256 bits.

vmx commented Apr 16, 2020

I'm not familiar with crypto lingo, so "trunc" might be the spot-on word.

Though when I think of truncation, I think of something being cut off, hence shortened. The hash is, but the value stored is not. Would perhaps something like sha2-256-zeroed254 or sha2-256-trunc254-padded make sense?

I don't want to bikeshed this too much, so feel free to ignore :)

rvagg commented Apr 17, 2020

Yeah, I'm not totally comfortable with the word "truncated" either as it doesn't seem to quite describe what's going on. In terms of practical usage I think it is truncated, they're only using the 254 bits I think. But in terms of what this multihash would be describing, it's the truncated + 2 zeros. sha2-256-trunc254-padded might be more accurate, but it's getting very wordy! Maybe that's OK though, considering what I've done with Poseidon...

rvagg commented Apr 23, 2020

have gone with sha2-256-trunc254-padded, will see what others say

rvagg commented Apr 30, 2020

Thanks to @ribasushi for pinging on the clarity of the note, which was:

SHA2-256 truncated with 2 most significant bits of the last byte (interpreted as a little-endian uint) replaced with zeros - used for proving trees as in Filecoin

Thanks to some input from @davidad I've gone with:

SHA2-256 with the two most significant bits from the last byte zeroed (as via a mask with 0b00111111) - used for proving trees as in Filecoin

Any objections to that?
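
As a sanity check on that wording, a consumer could verify that a digest conforms to it. This is only a sketch: `is_trunc254_padded` and `MASK` are hypothetical names, and the 32-byte length check assumes the full SHA2-256 output size.

```python
MASK = 0b00111111  # keeps the low six bits of the last byte


def is_trunc254_padded(digest: bytes) -> bool:
    """Return True if a 32-byte digest has the two most significant bits of its last byte clear."""
    if len(digest) != 32:
        return False
    # ~MASK & 0xFF is 0b11000000, i.e. exactly the two bits that must be zero.
    return (digest[-1] & ~MASK & 0xFF) == 0


print(is_trunc254_padded(bytes(31) + b"\x3f"))  # True
print(is_trunc254_padded(bytes(31) + b"\x80"))  # False
```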

SHA2-256 with the trailing 2 bits zeroed out. Primary current use is
Filecoin.

Ref: #161
@rvagg rvagg merged commit 0aa8f5d into multiformats:master May 12, 2020
@rvagg rvagg deleted the rvagg/sha2-256-trunc2 branch May 12, 2020 06:04
rvagg added a commit to rvagg/go-multihash that referenced this pull request May 12, 2020
4 participants