Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix GetIndexOfFirstNonAsciiByte_Vector not taken on ARM64 #90527

Closed
wants to merge 4 commits into from
Closed
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -101,9 +101,7 @@ internal static unsafe nuint GetIndexOfFirstNonAsciiByte(byte* pBuffer, nuint bu
// like pmovmskb which we know are optimized, and (b) we can avoid downclocking the processor while
// this method is running.

if (!Vector512.IsHardwareAccelerated &&
!Vector256.IsHardwareAccelerated &&
(Sse2.IsSupported || AdvSimd.IsSupported))
if (!Vector512.IsHardwareAccelerated && !Vector256.IsHardwareAccelerated && Sse2.IsSupported)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do this, then doesn't this change leave a bunch of dead code in GetIndexOfFirstNonAsciiByte_Intrinsified that should be cleaned up? It has multiple blocks dedicated to AdvSimd which would never be reached. At which point the method should also be renamed to be SSE-specific.

But I'd also explicitly asked about this:
#88532 (comment)
and was told the preferring of the intrinsics path was by design. Maybe it was only measured on x64 and not on Arm?

cc: @tannergooding, @anthonycanino

Copy link
Contributor Author

@neon-sunset neon-sunset Aug 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do this, then doesn't this change leave a bunch of dead code in GetIndexOfFirstNonAsciiByte_Intrinsified that should be cleaned up? It has multiple blocks dedicated to AdvSimd which would never be reached. At which point the method should also be renamed to be SSE-specific.

Let me fix that.

#88532 (comment)

The _Vector path is estimated to be ~40% faster on osx-arm64, It is likely the switch was not measured on ARM64.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The _Vector path is estimated to be ~40% faster on osx-arm64, It is likely the switch was not measured on ARM64.

@neon-sunset can you clarify please? .NET 7 was using the _Intrinsified path for Arm64: https://github.com/dotnet/runtime/blob/v7.0.10/src/libraries/System.Private.CoreLib/src/System/Text/ASCIIUtility.cs#L81-L91

The PR that introduced the _Vector path (#88532) didn't change _Intrinsified at all (outside using the centralized Vector128.Size constant) and didn't change which Arm64 was calling.

This then wouldn't seem like a regression, but rather simply a case where the new _Vector code is faster for Arm64. If that's the case, could you share your own numbers showing the .NET 7 vs .NET 8 difference (without this PR) and the difference for _Vector vs _Intrinsified for .NET 8 (this PR vs without this PR)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@neon-sunset can you clarify please? .NET 7 was using the _Intrinsified path for Arm64: https://github.com/dotnet/runtime/blob/v7.0.10/src/libraries/System.Private.CoreLib/src/System/Text/ASCIIUtility.cs#L81-L91

You are correct, my mistake.

This then wouldn't seem like a regression, but rather simply a case where the new _Vector code is faster for Arm64. If that's the case, could you share your own numbers showing the .NET 7 vs .NET 8 difference (without this PR) and the difference for _Vector vs _Intrinsified for .NET 8 (this PR vs without this PR)?

This is the plan, I'm just waiting for the Release build to finish to post detailed numbers 😄

{
return GetIndexOfFirstNonAsciiByte_Intrinsified(pBuffer, bufferLength);
}
Expand Down
Loading