Can more non-cryptographic hash algorithms be added into .NET BCL? #43131

Open
LeaFrock opened this issue Oct 7, 2020 · 14 comments

Comments

@LeaFrock
Contributor

LeaFrock commented Oct 7, 2020

Original Post

The BCL currently offers hash algorithms such as MD5, SHA1, and SHA256. However, other well-known hash algorithms like murmur3, fnv, and blake2 can have advantages in specific scenarios.
I suggest the BCL include these algorithms so that more developers can benefit from them and help optimize them. Our community has provided some implementations that can serve as references, but some of them are still a bit confusing and unreliable.
For example, Scala's standard library provides MurmurHash3 APIs. For C#, these algorithms could live in a stand-alone place and ship as a single NuGet package.

Updated at 2022-11-08

We now have the System.IO.Hashing package, which already implements two non-cryptographic hash functions, xxHash and CRC32.
And as I expected, some community projects quickly picked them up. For example,

https://github.com/avafloww/Thaliak/blob/01eb1e0668b399a70c580f2be1c1480a9699ccda/Thaliak.Common.Database/Models/XivRepository.cs#L26-L28

https://github.com/dotnet/orleans/blob/cf4423ea4d75b4ab8ecf071968a2a4bfd463646d/src/Orleans.Serialization/TypeSystem/TypeCodec.cs#L209

I hope .NET provides a wider selection of non-cryptographic hash functions, which would really benefit the ecosystem.

Therefore, I've put together the following list for reference. If I've missed something, please let me know. Thanks!

| Hash Function | Reason | Reference |
| --- | --- | --- |
| MurmurHash3 | A famous and widely used hash function. | Scala supports it in its standard library. |
| HighwayHash | A faster alternative to SipHash, which prevents hash-flooding DoS attacks. | https://github.com/google/highwayhash |
| FNV | Is it excellent enough now? | |
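
For a sense of how small some of the listed functions are, here is a minimal sketch of 32-bit FNV-1a. It is written in Python purely as a language-neutral illustration and is not a .NET API. The name `fnv1a_32` is made up for this example, while the offset basis and prime are the published FNV constants.

```python
FNV32_OFFSET_BASIS = 0x811C9DC5  # published 32-bit FNV offset basis
FNV32_PRIME = 0x01000193         # published 32-bit FNV prime


def fnv1a_32(data: bytes) -> int:
    """Return the 32-bit FNV-1a hash of data (illustrative helper, not a library API)."""
    h = FNV32_OFFSET_BASIS
    for byte in data:
        h ^= byte                           # FNV-1a: XOR the byte in first...
        h = (h * FNV32_PRIME) & 0xFFFFFFFF  # ...then multiply, keeping 32 bits
    return h


if __name__ == "__main__":
    # Print the hash of a sample input as 8 hex digits.
    print(f"{fnv1a_32(b'hello'):08x}")
```

The whole function is one XOR and one multiply per byte, which is why FNV is easy to reimplement. Whether its distribution and speed are good enough today is the open question raised in the table.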

@Dotnet-GitSync-Bot added the area-System.Security and untriaged labels Oct 7, 2020

@ghost

ghost commented Oct 7, 2020

Tagging subscribers to this area: @bartonjs, @vcsjones, @krwq, @jeffhandley
See info in area-owners.md if you want to be subscribed.

@huoyaoyuan
Member

AFAIK, the BCL currently relies on the underlying OS to provide hash algorithms so that they are "secure enough". SHA1Managed is managed in name only now and just delegates to the native implementation.

There could be a centrally maintained managed hash package on NuGet.

@KalleOlaviNiemitalo

KalleOlaviNiemitalo commented Oct 7, 2020

Because BLAKE2 is a cryptographic hash, I guess it would be covered by the policy referred to in #16010 (comment):

> We have a policy of not implementing cryptographic primitives, but deferring to OS libraries.

MurmurHash and FNV hash are not cryptographic, so they could be easier to add than BLAKE2. #24328 concerns API design for non-cryptographic hashing.

@krwq
Member

krwq commented Oct 8, 2020

Note that the framework should only add things that are useful for everyone, not everything that might be useful to someone; otherwise it would grow very large (and it is already relatively large). I think it might be better to create an external library with such algorithms, and if it gets lots of downloads and proves "useful to many", you can then suggest adding it to the framework. The other thing is that writing your own implementation of crypto primitives involves a complex, legally required process, so the preferred option is to rely on an external implementation that has already gone through that process.

@LeaFrock
Contributor Author

LeaFrock commented Oct 8, 2020

@krwq Understood. But the framework has now been split into separate parts, so I think it's feasible to add a stand-alone namespace/library, shipped as a single NuGet package for optional use (like Microsoft.Bcl.XXX?).

By the way, today's well-known hash algorithms, such as MurmurHash, have been around for many years and are useful in many important data structures and systems like Bloom filters, so it would be worthwhile to implement a reliable .NET API that really benefits production systems in our community.

> so preferred option is to rely on external implementation which already went through such process

I have no objection to that. I believe the problem can be solved by experts in this area (obviously I'm not one...).

@MichalStrehovsky
Member

I've had to use cryptographic hashes in the past for purposes that didn't require crypto-strong hash. Since .NET didn't offer non-crypto hashes, I used SHA-1. A couple years later I got harassed by the security people at the company who were flagging everything using SHA-1 as a security problem. After a long argument I had to move the code to SHA-256 for no reason (and everyone involved knew it's pointless). I fully expect to be harassed in a couple years again if SHA-256 becomes a security concern.

It would be great if .NET provided a non-crypto hash.

@krwq
Member

krwq commented Oct 8, 2020

An OOB package with non-cryptographic hashes is an option to consider. We'd need to figure out which hashes specifically we want, and why we would want them rather than, e.g., the built-in Marvin hashing: do we want larger hash sizes? Something faster? Or is Marvin perhaps sufficient, so that we could consider making it public?

@MichalStrehovsky
Member

I would have two things on my wishlist:

  • A compact hash that is reasonably fast (I like Marvin's 64-bit and 32-bit variants)
  • A hash where collisions are very unlikely (where I can just use the hash as a unique ID of input I can trust, same as e.g. git uses SHA-1 to identify a commit).

This would likely mean two different algorithms.
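
As a rough way to compare the two wishes, the birthday approximation estimates the chance of at least one collision among n distinct inputs for a b-bit hash. It assumes outputs are uniformly distributed, an idealization that real hash functions only approximate. The helper name `collision_probability` is hypothetical, and Python is used only as a neutral illustration:

```python
import math


def collision_probability(n: int, bits: int) -> float:
    """Approximate P(at least one collision) among n inputs for an ideal bits-wide hash.

    Uses the birthday approximation p ~= 1 - exp(-n(n-1) / 2^(bits+1)).
    """
    if n < 2:
        return 0.0
    exponent = n * (n - 1) / (2 ** (bits + 1))
    return -math.expm1(-exponent)  # 1 - e^(-x), accurate even for tiny x


if __name__ == "__main__":
    # Compare a compact 32-bit hash, a 64-bit hash, and a 160-bit (SHA-1-sized) hash.
    for bits in (32, 64, 160):
        print(bits, collision_probability(1_000_000, bits))
```

Under this model a 32-bit hash reaches about even odds of a collision at around 77,000 inputs, which is why a compact hash and a hash usable as a unique ID pull toward different output sizes.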

@Joe4evr
Contributor

Joe4evr commented Oct 8, 2020

> This would likely mean two different algorithms.

💭 Do you imagine these also being stable by default, or is that less of a concern?

@GrabYourPitchforks
Member

I retitled + repathed the issue to reflect that we're discussing non-cryptographic hash algorithms.

@GrabYourPitchforks changed the title from "Can more hash algorithms be added into .NET BCL?" to "Can more non-cryptographic hash algorithms be added into .NET BCL?" Oct 8, 2020

@GrabYourPitchforks
Member

TBH I wouldn't add a Marvin-specific public API to any of our shipped packages. It suits our own needs nicely but never really gained traction outside of Microsoft. If we're going to ship implementations of non-crypto hash algorithms, we need to build up a list of the algorithms that would have the greatest benefit to the ecosystem. @LeaFrock's original list provides a good starting point.

@iSazonov
Contributor

Great post https://softwareengineering.stackexchange.com/questions/49550/which-hashing-algorithm-is-best-for-uniqueness-and-speed

@LeaFrock
Contributor Author

LeaFrock commented Jun 9, 2021

I've noticed that PR #53623 has already added the System.IO.Hashing library with 4 hash algorithms.

Will it include more of the algorithms mentioned above when .NET 6 is released?

@tannergooding removed the untriaged label Jul 12, 2021
@tannergooding added this to the Future milestone Jul 12, 2021

@glen-84

glen-84 commented Feb 22, 2024

Why does FNV have a strike through it? (FNV)
