-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add vectorized implementation of hex encoding/decoding #39702
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did just a quick pass over the code (will have a deeper look a bit later).
Do you plan to provide a SSE2 / SSE41 (for blend) path too?
I think it should be possible to do so...
When the PR isn't a draft anymore, we should collect numbers for differnt input-size to verify that small inputs (common case?) don't regress (too much). |
Thanks for the initial review. I'll make your suggested adjustments.
I have the following algorithms ready but decided to start here with AVX2 (biggest bang for the buck). Encode:
Decode:
Feel free to browse through my HexMate repository to check out those implementations as well.
Absolutely! I'll make the effort of creating those numbers, once I've seen some interest in actually adding a vectorized implementation from the team. |
|
||
private static ReadOnlySpan<byte> LowerHexLookupTable => new byte[] | ||
{ | ||
0x30, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37, 0x38, 0x39, 0x61, 0x62, 0x63, 0x64, 0x65, 0x66, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
** not relevant yet -- only when SSE2 comes into play **
Had a quick look at HexMate, and saw that (as expected) the constants for AVX2 just doubles SSE2 in the upper lane.
So in order to save some (disk) space for the assembly, one could use this outline:
Vector128<byte> sseConstant = ReadVector(Sse2ContantData);
Vector256<byte> avxConstant = Vector256.Create(sseConstant, sseConstant);
In essence it's about avoiding to have duplicated data in the "text-section" of the DLL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, I added your suggestion to the HexMate repo (here) 👍
I let this open as a reminder in case we get to SSE2 in this PR.
Tagging @bartonjs, @stephentoub and @GrabYourPitchforks to get a sense of the teams interest in this as you all contributed to the original issue (#17837) and/or the PR that recently added the public API (#37546). |
The build errors are because some projects using HexConverter.cs do not reference S.R.Intrinsics and S.N.Vectors. @Anipik from a layering POV is it OK for those other projects to have references to those? |
We should be fine with using S.R.Intrinsics. |
Going through old inbox notifications and came across this. Does #44111 render this moot? |
Now that #44111 is in we should consider how to morph this PR to provide additional benefit. Since this PR focuses on AVX/AVX2, does it provide a significant performance boost over the SSSE3 code paths for the common cases of processing 16/32/64 bytes? |
@jeffhandley I‘ll look into it 👍 |
I've updated my encoding method locally and ran some benchmarks for 32 and 64 byte inputs. (The algorithm I'm proposing works with input buffers of >= 32 bytes for encoding and output buffers of >= 32 bytes for decoding hex characters) It looks like we could achieve a > 2x improvement using AVX2.
@GrabYourPitchforks, @jeffhandley Let me know if you think the speedup justifies maintaining yet another code path for encoding/decoding hex chars. If you think so, I'll gladly put in the work to update the PR accordingly. |
"Converting" the current encoding algorithm to AVX2 allows us to encode input buffers of >= 8 bytes but at a slower speed:
|
It is a shame that the internal class One real-world example of where lower case must be used is: I know it's not directly related to this optimization, but hopeful that the optimization will handle lowercase and expose it. |
@rcollette as far as I can tell, these perf changes do not break the contract of HexConverter.ToString that accepts the desired casing. If you believe that |
Thanks for the investigation! TBH the thing I'm struggling with is that this seems like a lot of code (+ extra review, maintenance, unit tests, etc.) for something that's on the expected high end going to save us around 10 ns per invocation. We don't tend to have frequent use of AVX2 intrinsics from within CoreLib, with the notable exceptions of two really hot code families: (a) But let's try for some extra perspectives here. @tannergooding @EgorBo, since you both have experience in this code, do you think there are scenarios that could benefit from this being AVX2-enlightened? Please feel free to tell me that I'm being too grumpy and that some apps would see clear benefits from this. :) I'm operating under the assumption that most apps which perform hex processing are doing so with fixed-length payloads, generally no more than 64 bytes (512 bits), and only occasionally with higher lengths. I'm also assuming that any workload which involves hex conversion is dominated by whatever processing takes place over the raw decoded binary buffer. |
AVX2 (or more specifically |
@EgorBo thoughts about @GrabYourPitchforks question above? |
@GrabYourPitchforks This PR is assigned to you for follow-up/decision before the RC1 snap, but I've also assigned @EgorBo since we're awaiting their feedback. |
I'm updating this PR to target the 7.0.0 milestone as we were not able to finish the review and validation of this change before .NET 6.0 RC1 snapped. I will resolve the conflicts and push to the branch so that we can review as soon as time permits. |
I was finally able to take another look at this PR, the comment history, and another review of the code. I've decided, largely based on the comments from @GrabYourPitchforks (here) and @tannergooding (here), that we will close this PR without merging it, leaving the changes from #44111 in place. @tkp1n Thanks for your effort on this and your patience with this PR being open as long as it was before we came to this conclusion. |
History
#37546 introduced a public API to encode/decode bytes from/to hexadecimal strings. It also consolidated a lot of internal implementations of such methods to the common class
HexConverter
.The implementation in the PR mentioned above was kept simple to ensure the API gets merged in time for .NET 5 with optimizations left for further PRs.
Summary
This PR now introduces vectorized methods for the encoding/decoding of hex data as a follow-up to the above-mentioned PR.
With relatively little added complexity we achieve more than an order of magnitude in performance improvements for larger data sets (e.g. 2048 bytes).
Perf-Numbers
Review Input / Open Issues
@tannergooding and @gfoidl you may be interested
This is marked as a draft PR for several reasons:
#if
) the vectorized methods from builds not supporting intrinsics. I didn't find any code based on intrinsics in theCommon
directory to base this kind of build restrictions on.