Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add non-cryptographic hash algorithms #53623

Merged
merged 18 commits into from
Jun 9, 2021
Merged

Conversation

bartonjs
Copy link
Member

@bartonjs bartonjs commented Jun 2, 2021

  • Adds a new package, System.IO.Hashing (netstandard2.0, and net461 to reduce netfx restore graph sizes)
  • Provides implementations for four hash algorithms
    • CRC-32 (the variation used by IEEE 802.3 (Ethernet))
    • CRC-64 (the variation used by ECMA-182)
    • XxHash32, with optional seed
    • XxHash64, with optional seed
  • The tests are structured so that algorithm tests just need to provide their test vectors and do some boilerplate overrides to get variation testing across the instance methods and static methods.

Fixes #24328.

@bartonjs bartonjs added this to the 6.0.0 milestone Jun 2, 2021
@bartonjs bartonjs self-assigned this Jun 2, 2021
@dotnet-issue-labeler
Copy link

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, to please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

@ghost
Copy link

ghost commented Jun 2, 2021

Tagging subscribers to this area: @tannergooding, @pgovind
See info in area-owners.md if you want to be subscribed.

Issue Details
  • Adds a new package, System.IO.Hashing (netstandard2.0, and net461 to reduce netfx restore graph sizes)
  • Provides implementations for four hash algorithms
    • CRC-32 (the variation used by IEEE 802.3 (Ethernet))
    • CRC-64 (the variation used by ECMA-182)
    • XxHash32, with optional seed
    • XxHash64, with optional seed
  • The tests are structured so that algorithm tests just need to provide their test vectors and do some boilerplate overrides to get variation testing across the instance methods and static methods.

Fixes #24328.

Author: bartonjs
Assignees: bartonjs
Labels:

area-System.Numerics, new-api-needs-documentation

Milestone: 6.0.0

byte idx = (byte)crc;
idx ^= source[i];
crc = s_crcLookup[idx] ^ (crc >> 8);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would we be able to use any of the hardware intrinsics to speed this up if we had a .NET 6+ build?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to this. If we make this an inbox binary (which I think our partners would really want us to do), it'd make sense to intrinsicify these code paths.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we make this an inbox binary (which I think our partners would really want us to do)

If I understand correctly, our thoughts for this library is "pure OOB".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would the lib need to be inbox to benefit from hardware intrinsics? Doesn't it just need to compile against net6.0 which and could be included in the package?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It only needs to compile against at least 3.0 to get acceleration on x86/x64 and 5.0 for acceleration on ARM64.

In particular, it would be nice if we just put the CRC32C algorithm under: #2036 (which is approved just NYI) and had this API forward to it.
You can then also accelerate plain CRC32 on Arm64, but that requires explicit intrinsics usage outside an xplat helper method.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, but those are CRC32C, not CRC32. If there's a sensible way to get from CRC32C to CRC32, then sure.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For regular CRC32 (that is the variant where 0x04C11DB7 is the polynomial) its only available to accelerate on ARM64 (at least via a direct instruction, although technically also for Arm32, we just don't have intrinsic support here).

For CRC32-C (that is the variant where 0x1EDC6F41 is the polynomial) its available to x86/x64 and ARM64.

@bartonjs bartonjs mentioned this pull request Jun 3, 2021
/// </summary>
/// <remarks>
/// <para>
/// This implementation emits the answer in the Little Endian byte order so that
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessary here, but I wonder if our code sample docs should show using BinaryPrimitives.ReadInt32LittleEndian to get this data back out.

namespace System.IO.Hashing
{
/// <summary>
/// Provides an implementation of the XxHash32 algorithm.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we provide a code sample showing using this and extracting a 32-bit hash, our code sample should demonstrate using BinaryPrimitives.ReadUInt32BigEndian to convert the byte array back to an integer. In particular, devs may be drawn to Convert.ToInt32, which will not provide the correct answer. This code sample doesn't need to be in the devdoc, but I do think it's a potential pit of failure if we don't show an example.

Ref: https://github.com/Cyan4973/xxHash/blob/f9155bd4c57e2270a4ffbb176485e5d713de1c9b/doc/xxhash_spec.md#step-7-output

namespace System.IO.Hashing
{
/// <summary>
/// Provides an implementation of the XxHash64 algorithm.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we provide a code sample showing using this and extracting a 64-bit hash, our code sample should demonstrate using BinaryPrimitives.ReadUInt64BigEndian to convert the byte array back to an integer. In particular, devs may be drawn to Convert.ToInt64, which will not provide the correct answer. This code sample doesn't need to be in the devdoc, but I do think it's a potential pit of failure if we don't show an example.

Ref: https://github.com/Cyan4973/xxHash/blob/f9155bd4c57e2270a4ffbb176485e5d713de1c9b/doc/xxhash_spec.md#step-7-output-1

@GrabYourPitchforks
Copy link
Member

Code LGTM. Approved pending whatever fixups are necessary to make packaging work. :)

<TargetFrameworks>net6.0;netstandard2.0;net461</TargetFrameworks>
<IsPackable>true</IsPackable>
<!-- TODO: Remove when the package ships with .NET 6. -->
<DisablePackageBaselineValidation>true</DisablePackageBaselineValidation>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's actually the inverse. Sorry...

Suggested change
<DisablePackageBaselineValidation>true</DisablePackageBaselineValidation>
<EnablePackageBaselineValidation>false</EnablePackageBaselineValidation>

@ViktorHofer
Copy link
Member

Looks like you are missing the reference project, in case you want/need it. I don't it's strictly required. cc @ericstj

Copy link
Member

@ViktorHofer ViktorHofer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside from the ref project question, LGTM. Thanks for addressing the packaging feedback.

@bartonjs
Copy link
Member Author

bartonjs commented Jun 8, 2021

Looks like you are missing the reference project, in case you want/need it. I don't it's strictly required

I copied the structure from System.Formats.Asn1, where I understood it to be "we still generate the .cs file for API evolution/observation purposes, and we still build it to align the src and ref, but we don't need to ship it because we don't do versioned API"... or something like that.

@eerhardt
Copy link
Member

eerhardt commented Jun 16, 2021

@bartonjs - was there a reason why this didn't include a \ref\System.IO.Hashing.csproj file? I don't see a "ref" project file in:

https://github.com/dotnet/runtime/tree/main/src/libraries/System.IO.Hashing/ref

Oh, I see the response above. I don't totally understand it, but I guess you addressed the question above.

UPDATE: @ViktorHofer answered all my questions offline.

@bartonjs bartonjs deleted the non_crypto_hashing branch June 17, 2021 16:20
@ghost ghost locked as resolved and limited conversation to collaborators Jul 17, 2021
@bartonjs bartonjs removed their assignment Jul 26, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

General purpose non-cryptographic hashing API for .NET
6 participants