Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize Vector128/256.Equals via TestZ #55875

Merged
merged 11 commits into from
Oct 4, 2021
Merged

Conversation

EgorBo
Copy link
Member

@EgorBo EgorBo commented Jul 17, 2021

This PR optimizes Vector128/256.Equals via the following improvements:

  • Substitutes Vector_.Zero during inlining
  • Optimizes Xor(x, zero) to just x in morph
  • Adds Avx2/Sse41 TestZ path in Vector_.Equals
// Compare two vectors
bool Test128(Vector128<int> v1, Vector128<int> v2) => v1.Equals(v2);
bool Test256(Vector256<int> v1, Vector256<int> v2) => v1.Equals(v2);

// Compare against Zero:
bool Test128(Vector128<int> v1) => v1.Equals(Vector128<int>.Zero);
bool Test256(Vector256<int> v1) => v1.Equals(Vector256<int>.Zero);

Codegen diff: https://www.diffchecker.com/zJhfk9yq
According to benchmarks, it makes it 10-15% faster

jit-diff is quite small:

Total bytes of base: 61261912
Total bytes of diff: 61261864
Total bytes of delta: -48 (-0.00% of base)
Total relative delta: -0.19
    diff is an improvement.
    relative diff is an improvement.


Top file improvements (bytes):
         -48 : System.Private.CoreLib.dasm (-0.00% of base)

1 total files with Code Size differences (1 improved, 0 regressed), 272 unchanged.

Top method improvements (bytes):
         -16 (-10.74% of base) : System.Private.CoreLib.dasm - Vector128`1:Equals(Vector128`1):bool:this (6 methods)
         -16 (-2.48% of base) : System.Private.CoreLib.dasm - Vector128`1:Equals(Object):bool:this (6 methods)
          -8 (-5.06% of base) : System.Private.CoreLib.dasm - Vector256`1:Equals(Vector256`1):bool:this (6 methods)
          -8 (-1.18% of base) : System.Private.CoreLib.dasm - Vector256`1:Equals(Object):bool:this (6 methods)

Top method improvements (percentages):
         -16 (-10.74% of base) : System.Private.CoreLib.dasm - Vector128`1:Equals(Vector128`1):bool:this (6 methods)
          -8 (-5.06% of base) : System.Private.CoreLib.dasm - Vector256`1:Equals(Vector256`1):bool:this (6 methods)
         -16 (-2.48% of base) : System.Private.CoreLib.dasm - Vector128`1:Equals(Object):bool:this (6 methods)
          -8 (-1.18% of base) : System.Private.CoreLib.dasm - Vector256`1:Equals(Object):bool:this (6 methods)

Superpmi:

asm.coreclr_tests.pmi.windows.x64.checked

Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 4987
Total bytes of diff: 4976
Total bytes of delta: -11 (-0.22% of base)
Total relative delta: -0.01
    diff is an improvement.
    relative diff is an improvement.


Top file improvements (bytes):
         -11 : 251366.dasm (-0.76% of base)
1 total files with Code Size differences (1 improved, 0 regressed), 4 unchanged.

Top method improvements (bytes):
         -11 (-0.76% of base) : 251366.dasm - GitHub_18144:Main(System.String[]):int

Top method improvements (percentages):
         -11 (-0.76% of base) : 251366.dasm - GitHub_18144:Main(System.String[]):int

1 total methods with Code Size differences (1 improved, 0 regressed), 4 unchanged.



asm.libraries_tests.pmi.windows.x64.checked

Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 1558
Total bytes of diff: 1522
Total bytes of delta: -36 (-2.31% of base)
Total relative delta: -0.20
    diff is an improvement.
    relative diff is an improvement.


Top file improvements (bytes):
         -12 : 309432.dasm (-7.02% of base)
         -12 : 309515.dasm (-6.56% of base)
         -12 : 309609.dasm (-6.56% of base)

3 total files with Code Size differences (3 improved, 0 regressed), 1 unchanged.

Top method improvements (bytes):
         -12 (-7.02% of base) : 309432.dasm - System.Numerics.Tests.Vector2Tests:Vector2ZeroTest():this
         -12 (-6.56% of base) : 309515.dasm - System.Numerics.Tests.Vector3Tests:Vector3ZeroTest():this
         -12 (-6.56% of base) : 309609.dasm - System.Numerics.Tests.Vector4Tests:Vector4ZeroTest():this

Top method improvements (percentages):
         -12 (-7.02% of base) : 309432.dasm - System.Numerics.Tests.Vector2Tests:Vector2ZeroTest():this
         -12 (-6.56% of base) : 309515.dasm - System.Numerics.Tests.Vector3Tests:Vector3ZeroTest():this
         -12 (-6.56% of base) : 309609.dasm - System.Numerics.Tests.Vector4Tests:Vector4ZeroTest():this

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jul 17, 2021
@MichalPetryka
Copy link
Contributor

Partially fixes #55343 since that also talks about optimizing AllBitsSet to a TestC.

@EgorBo
Copy link
Member Author

EgorBo commented Jul 23, 2021

@tannergooding @dotnet/jit-contrib does it look good?

@EgorBo
Copy link
Member Author

EgorBo commented Sep 7, 2021

@dotnet/jit-contrib PTAL, a small improvement for Vector128/256.Create

Failed CI job is unrelated (broken gcc build)

GenTree* op2 = hw->gtGetOp2();
if (!gtIsActiveCSE_Candidate(tree))
{
if (op1->IsIntegralConstVector(0) && !gtIsActiveCSE_Candidate(op1))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that IsIntegralConstVector will only work for integer vectors, so all floating point vectors will be rejected here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Xor is not used in Equals for floats and doubles so it's not a big deal

but it currently handles this (Xor for floats) just fine:

static Vector128<float> Foo(Vector128<float> v) => 
    Sse.Xor(v, Vector128.Create(0).AsSingle());

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for zero float it might be tricky as e.g. -0.0 can't be used in this optimization so is not worth the effort

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would it be tricky? You can account for -0.0 by checking the bitwise (rather than floating-point) value is 0.

Speaking of 0.0 vs -0.0, it looks like there might be existing bugs in IsFPZero and IsSIMDZero since they don't take this into account.

@EgorBo EgorBo mentioned this pull request Sep 12, 2021
Copy link
Member

@BruceForstall BruceForstall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. But would like @tannergooding to also approve.

@EgorBo
Copy link
Member Author

EgorBo commented Sep 23, 2021

@tannergooding PTAL

}
#endif

// TODO: Enable substitution for CORINFO_HELP_TYPEHANDLE_TO_RUNTIMETYPE (typeof(T))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have an existing issue for this TODO?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's why I want to merge it - to start experimenting with it 🙂 the issue is #40381

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I just wanted to make sure we had some github issue corresponding to the TODO, so its not just a comment we'll forget about

Copy link
Member

@tannergooding tannergooding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes LGTM. Would be good to ensure floating-point is equally well handled.

@EgorBo EgorBo merged commit fcee44a into dotnet:main Oct 4, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Nov 3, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants