Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT further Boolean logic optimization #55605

Closed
JulieLeeMSFT opened this issue Jul 13, 2021 · 4 comments
Closed

JIT further Boolean logic optimization #55605

JulieLeeMSFT opened this issue Jul 13, 2021 · 4 comments
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI help wanted [up-for-grabs] Good issue for external contributors Priority:3 Work that is nice to have
Milestone

Comments

@JulieLeeMSFT
Copy link
Member

This issue is to capture remaining work for #13573.

#49548 handled '==0' Boolean expression with two operands:

public static bool AreZero(int x, int y) {
    return x == 0 && y == 0;
}
  • Other Boolean expressions that can be looked into include IsEitherNegative:
[MethodImpl(MethodImplOptions.AggressiveInlining)]
private static bool IsEitherNegative(int a, int b)
{
    return a < 0 || b < 0;
}

Current codegen:

    L0000: test ecx, ecx
    L0002: jl L000d
    L0004: test edx, edx
    L0006: setl al
    L0009: movzx eax, al
    L000c: ret
    L000d: mov eax, 0x1
    L0012: ret

Possible optimal codegen below, using the fact that the or instruction sets SF:

or ecx, edx
sets al   ; or setl, if you prefer, since "or" is known to clear OF
movzx eax, al
ret
  • Handle cases like bool b = a == 0 && b == 0 && c == 0 because the last block is a GT_ASG instead of GT_RETURN (for example).

  • Investigate why the operand of tree is not evaluated as boolean expression which results in no optimization for 3 cases below. Related method: optIsBoolComp().

- x!=0 && y!=0 to (x&y) !=0
- x==0 || y==0 to (x&y)==0
- x==1 || y==1 to (x|y) !=0
  • Paralleize batch of 4+ expressions, e.g.,
    x == 0 && y == 0 && z == 0 && w == 0 => (x | y) | (z | w) == 0

  • Take profile data into account**. If B1's branch is biased away from executing B2 then we might want to skip this optimization or lower the amount we're willing to pay. If B1 is strongly biased towards B2 then we might want to perform this operation even if c2 is more costly (since we're quite likely going to evaluate c2 whether we do this optimization or not).

  • Check what kind of diffs there are if we eliminate this cost check. That is, is 12 the right number, and why?

    comp->gtPrepareCost(c2);

    if (c2->GetCostEx() > 12)
  • Consider optimizing the case we have more than the max number of returns, and create a single return block.

  • To check if comp->fgUpdateLoopsAfterCompacting(b1, b3); updates the loop table.

category:cq
theme:basic-cq
skill-level:beginner
cost:small

@dotnet-issue-labeler dotnet-issue-labeler bot added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI untriaged New issue has not been triaged by the area owner labels Jul 13, 2021
@JulieLeeMSFT JulieLeeMSFT added Priority:3 Work that is nice to have and removed untriaged New issue has not been triaged by the area owner labels Jul 13, 2021
@JulieLeeMSFT JulieLeeMSFT added this to the Future milestone Jul 13, 2021
@JulieLeeMSFT JulieLeeMSFT added the help wanted [up-for-grabs] Good issue for external contributors label Jul 13, 2021
@mano-si
Copy link

mano-si commented Jul 14, 2021

@JulieLeeMSFT I can have a look at it. Can you please direct me to any documentation for repreading ?

@JulieLeeMSFT
Copy link
Member Author

Thanks @mano-si. I will assign this to you.

I recommend reading from Compiler::optOptimizeBools() in optimizer.cpp and read header comments of methods that are called from it.

Attached jitdump output file optboolsreturn.txt for the test case src\tests\JIT\opt\OptimizeBools\optboolsreturn.cs from ##49548 for your reference.

  • Search for START compiling for each test case and Starting PHASE Optimize bools to see how it folds.

optboolsreturn.txt

@BruceForstall
Copy link
Member

cc @TIHan

@TIHan TIHan modified the milestones: Future, 8.0.0 Jan 17, 2023
@TIHan TIHan self-assigned this Jan 17, 2023
@TIHan
Copy link
Contributor

TIHan commented Jan 21, 2023

Closing this issue as a lot of the cases have been resolved already. Below are the details.

  1. This example is already optimized:
[MethodImpl(MethodImplOptions.AggressiveInlining)]
private static bool IsEitherNegative(int a, int b)
{
    return a < 0 || b < 0;
}

JIT output:

G_M27773_IG02:              ;; offset=0000H
       0BD1                 or       edx, ecx
       8BC2                 mov      eax, edx
       C1E81F               shr      eax, 31
						;; size=7 bbWeight=1 PerfScore 1.00

G_M27773_IG03:              ;; offset=0007H
       C3                   ret     
  1. Cases that were mentioned

x!=0 && y!=0 to (x&y) !=0
x==0 || y==0 to (x&y)==0
x==1 || y==1 to (x|y) !=0

Those three cases are simply not safe to do if x or y are values other than zero or one.

Paralleize batch of 4+ expressions, e.g.,
x == 0 && y == 0 && z == 0 && w == 0 => (x | y) | (z | w) == 0

It's already optimized:

G_M29509_IG02:              ;; offset=0000H
       0BD1                 or       edx, ecx
       410BD0               or       edx, r8d
       410BD1               or       edx, r9d
       0F94C0               sete     al
       0FB6C0               movzx    rax, al
						;; size=14 bbWeight=1 PerfScore 2.00

G_M29509_IG03:              ;; offset=000EH
       C3                   ret      
  1. This example b = a == 0 && b == 0 && c == 0 is interesting because b is an int. I think this is the example we want:
    public static void TestBoolAsg(int a, int b, int c, ref bool z)
    {
        z = a == 0 && b == 0 && c == 0;
    }

When comparing the JIT output to this:

    public static bool TestBoolReturn(int a, int b, int c)
    {
        return a == 0 && b == 0 && c == 0;
    }

TestBoolAsg is as not as optimized as TestBoolReturn:

// TestBoolAsg
G_M56616_IG02:              ;; offset=0000H
       0BD1                 or       edx, ecx
       750A                 jne      SHORT G_M56616_IG04
						;; size=4 bbWeight=1 PerfScore 1.25

G_M56616_IG03:              ;; offset=0004H
       33C0                 xor      eax, eax
       4585C0               test     r8d, r8d
       0F94C0               sete     al
       EB02                 jmp      SHORT G_M56616_IG05
						;; size=10 bbWeight=0.50 PerfScore 1.75

G_M56616_IG04:              ;; offset=000EH
       33C0                 xor      eax, eax
						;; size=2 bbWeight=0.50 PerfScore 0.12

G_M56616_IG05:              ;; offset=0010H
       418801               mov      byte  ptr [r9], al
						;; size=3 bbWeight=1 PerfScore 1.00

G_M56616_IG06:              ;; offset=0013H
       C3                   ret   

vs.

// TestBoolReturn
G_M58330_IG02:              ;; offset=0000H
       0BD1                 or       edx, ecx
       410BD0               or       edx, r8d
       0F94C0               sete     al
       0FB6C0               movzx    rax, al
						;; size=11 bbWeight=1 PerfScore 1.75

G_M58330_IG03:              ;; offset=000BH
       C3                   ret 

Will make an issue regarding this specific case.

Take profile data into account**. If B1's branch is biased away from executing B2 then we might want to skip this optimization or lower the amount we're willing to pay. If B1 is strongly biased towards B2 then we might want to perform this operation even if c2 is more costly (since we're quite likely going to evaluate c2 whether we do this optimization or not).

Check what kind of diffs there are if we eliminate this cost check. That is, is 12 the right number, and why?

comp->gtPrepareCost(c2);

if (c2->GetCostEx() > 12)
Consider optimizing the case we have more than the max number of returns, and create a single return block.

To check if comp->fgUpdateLoopsAfterCompacting(b1, b3); updates the loop table.

Because the current boolean optimizations are not extremely impactful, I think we could spend more time on other areas as messing with the cost values are not trivial to get right.

@TIHan TIHan closed this as completed Jan 21, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Feb 20, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI help wanted [up-for-grabs] Good issue for external contributors Priority:3 Work that is nice to have
Projects
None yet
Development

No branches or pull requests

4 participants