Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"==0" optimization in Boolean logic #13573 #49548

Merged
merged 25 commits into from
Jul 14, 2021
Merged

Conversation

JulieLeeMSFT
Copy link
Member

@JulieLeeMSFT JulieLeeMSFT commented Mar 12, 2021

Fixes #13573.

It optimizes Integral type Boolean logic with two operands when the first BB is conditional BB (e.g., BBJ_COND) and the second BB is conditional return (e.g., GT_RETURN > BBJ_RETURN).
For example, it changes below logic:

  • x==0 && y==0 to (x|y)==0

In case the folding involves ANDing or comparison to GT_NE, it gets optimized only if the tree is evaluated as BOOLEAN.

  • x!=0 && y!=0 to (x&y) !=0
  • x==0 || y==0 to (x&y)==0
  • x==1 || y==1 to (x|y) !=0

Edit: Note: However, above 3 cases are not optimized because the operand of tree is not evaluated as boolean expression in the current JIT.

This PR does not optimze cases where it requries NOT transformation of x or y because the cost of NOT operation is large.

  • (x | !y) != 0
  • (!x | y) != 0
public static bool AreZero(int x, int y) {
    return x == 0 && y == 0;
}

3 Basic Blocks are folded as below:
Before:

*** marking local variables in block BB01 (weight=1   )
STMT00000 (IL 0x000...0x001)
               [000003] -----+------              *  JTRUE     void  
               [000002] J----+-N----              \--*  NE        int   
               [000000] -----+------                 +--*  LCL_VAR   int    V00 arg0         
               [000001] -----+------                 \--*  CNS_INT   int    0
New refCnts for V00: refCnt =  1, refCntWtd = 1   

*** marking local variables in block BB02 (weight=0.50)
STMT00002 (IL 0x003...0x007)
               [000009] -----+------              *  RETURN    int   
               [000008] -----+------              \--*  EQ        int   
               [000006] -----+------                 +--*  LCL_VAR   int    V01 arg1         
               [000007] -----+------                 \--*  CNS_INT   int    0
New refCnts for V01: refCnt =  1, refCntWtd = 0.50

*** marking local variables in block BB03 (weight=0.50)
STMT00001 (IL 0x008...0x009)
               [000005] -----+------              *  RETURN    int   
               [000004] -----+------              \--*  CNS_INT   int    0

After:

Folded boolean conditions of BB01, BB02 and BB03 to :
STMT00000 (IL 0x000...0x001)
               [000003] -----+------              *  RETURN    int
               [000002] J----+-N----              \--*  EQ        int   
               [000011] ------------                 +--*  OR        int   
               [000000] -----+------                 |  +--*  LCL_VAR   int    V00 arg0         
     (  3,  2) [000006] ------------                 |  \--*  LCL_VAR   int    V01 arg1         
               [000001] -----+------                 \--*  CNS_INT   int    0

Diff:

Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 531
Total bytes of diff: 483
Total bytes of delta: -48 (-9.04% of base)
    diff is an improvement.


Top file improvements (bytes):
         -16 : 113888.dasm (-7.14% of base)
         -16 : 159193.dasm (-11.35% of base)
         -16 : 109061.dasm (-9.64% of base)

3 total files with Code Size differences (3 improved, 0 regressed), 0 unchanged.

Top method improvements (bytes):
         -16 (-7.14% of base) : 113888.dasm - Microsoft.Extensions.Primitives.StringSegment:Equals(Microsoft.Extensions.Primitives.StringSegment,int):bool:this
         -16 (-11.35% of base) : 159193.dasm - System.Xml.Linq.XElement:AttributesEqual(System.Xml.Linq.XElement):bool:this
         -16 (-9.64% of base) : 109061.dasm - Microsoft.CSharp.RuntimeBinder.RuntimeBinderExtensions:IsEquivalentTo(System.Reflection.ParameterInfo,System.Reflection.ParameterInfo,System.Reflection.MethodBase,System.Reflection.MethodBase):bool

Top method improvements (percentages):
         -16 (-11.35% of base) : 159193.dasm - System.Xml.Linq.XElement:AttributesEqual(System.Xml.Linq.XElement):bool:this
         -16 (-9.64% of base) : 109061.dasm - Microsoft.CSharp.RuntimeBinder.RuntimeBinderExtensions:IsEquivalentTo(System.Reflection.ParameterInfo,System.Reflection.ParameterInfo,System.Reflection.MethodBase,System.Reflection.MethodBase):bool
         -16 (-7.14% of base) : 113888.dasm - Microsoft.Extensions.Primitives.StringSegment:Equals(Microsoft.Extensions.Primitives.StringSegment,int):bool:this

3 total methods with Code Size differences (3 improved, 0 regressed), 0 unchanged.

@JulieLeeMSFT JulieLeeMSFT added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Mar 12, 2021
@JulieLeeMSFT JulieLeeMSFT added this to the 6.0.0 milestone Mar 12, 2021
@JulieLeeMSFT JulieLeeMSFT self-assigned this Mar 12, 2021
@danmoseley
Copy link
Member

danmoseley commented Mar 12, 2021

Just curious, where is the hit in XElement.AttributesEqual? Is it somehow collapsing the distinct checks of name and value (both references)?

private bool AttributesEqual(XElement e)
{
XAttribute? a1 = lastAttr;
XAttribute? a2 = e.lastAttr;
if (a1 != null && a2 != null)
{
do
{
a1 = a1.next!;
a2 = a2.next!;
if (a1.name != a2.name || a1.value != a2.value) return false;
} while (a1 != lastAttr);
return a2 == e.lastAttr;
}
return a1 == null && a2 == null;
}

@pentp
Copy link
Contributor

pentp commented Mar 16, 2021

Transforming x!=0 || y!=0 to (x&y) !=0 is not correct. Did you mean (x|y) !=0?

I wonder if this optimization could actually cause regressions in already hand-optimized code by introducing additional register pressure and memory accesses? For example transforming some.x==0 && other.y==0 to (some.x | other.y)==0 requires an additional temporary register and unconditionally accesses another memory location that might not be in cache - this would be a regression if some.x==0 is usually true. It's not possible for the JIT to know if this is the case because this could depend on input data.

@JulieLeeMSFT
Copy link
Member Author

Just curious, where is the hit in XElement.AttributesEqual? Is it somehow collapsing the distinct checks of name and value (both references)?

Working on other failing case. Will look into this later.
There is a Return statement pattern that cannot be folded like BB03 below.

***** BB01
STMT00000 (IL 0x000...0x001)
               [000003] -----+------              *  JTRUE     void  
               [000002] J----+-N----              \--*  NE        int   
               [000000] -----+------                 +--*  LCL_VAR   ref    V00 arg0         
               [000001] -----+------                 \--*  CNS_INT   ref    null

------------ BB02 [003..008) (return), preds={BB01} succs={}

***** BB02
STMT00002 (IL 0x003...0x007)
               [000012] -----+------              *  RETURN    int   
               [000011] -----+------              \--*  EQ        int   
               [000009] -----+------                 +--*  LCL_VAR   ref    V01 arg1         
               [000010] -----+------                 \--*  CNS_INT   ref    null

------------ BB03 [008..010) (return), preds={BB01} succs={}

***** BB03
STMT00001 (IL 0x008...0x00F)
               [000008] --CXG+------              *  RETURN    int   
               [000007] --CXG+------              \--*  CAST      int <- bool <- int
               [000006] --CXG+------                 \--*  CALL r2r_ind int    hackishModuleName.hackishMethodName
               [000004] -----+------ this in rcx        +--*  LCL_VAR   ref    V00 arg0         
               [000005] -----+------ arg1 in rdx        \--*  LCL_VAR   ref    V01 arg1  

@JulieLeeMSFT
Copy link
Member Author

Transforming x!=0 || y!=0 to (x&y) !=0 is not correct. Did you mean (x|y) !=0?

Thanks for the good catch. I meant x!=0 && y!=0 to (x&y) !=0. Updated.

I wonder if this optimization could actually cause regressions in already hand-optimized code by introducing additional register pressure and memory accesses? For example transforming some.x==0 && other.y==0 to (some.x | other.y)==0 requires an additional temporary register and unconditionally accesses another memory location that might not be in cache - this would be a regression if some.x==0 is usually true. It's not possible for the JIT to know if this is the case because this could depend on input data.

We did this in order to reduce branching cost.
cc @AndyAyersMS what he thinks.

@JulieLeeMSFT
Copy link
Member Author

Just curious, where is the hit in XElement.AttributesEqual? Is it somehow collapsing the distinct checks of name and value (both references)?

@danmoseley It is the last statement that is optimized.

return a1 == null && a2 == null;

Before optimization:

*** marking local variables in block BB08 (weight=0.50)
STMT00003 (IL 0x05D...0x05E)
               [000015] -----+------              *  JTRUE     void  
               [000014] J----+-N----              \--*  NE        int   
               [000012] -----+------                 +--*  LCL_VAR   ref    V02 loc0         
               [000013] -----+------                 \--*  CNS_INT   ref    null
New refCnts for V02: refCnt =  8, refCntWtd = 22.50

*** marking local variables in block BB09 (weight=0.50)
STMT00005 (IL 0x060...0x064)
               [000021] -----+------              *  RETURN    int   
               [000020] -----+------              \--*  EQ        int   
               [000018] -----+------                 +--*  LCL_VAR   ref    V03 loc1         
               [000019] -----+------                 \--*  CNS_INT   ref    null
New refCnts for V03: refCnt =  8, refCntWtd = 18.50

*** marking local variables in block BB10 (weight=0.50)
STMT00004 (IL 0x065...0x066)
               [000017] -----+------              *  RETURN    int   
               [000016] -----+------              \--*  CNS_INT   int    0

After optimization:

*************** In optOptimizeBools()
Folded boolean conditions of BB08, BB09 and BB10 to :
STMT00003 (IL 0x05D...0x05E)
               [000015] -----+------              *  RETURN    int   
               [000014] J----+-N----              \--*  EQ        int   
               [000101] ------------                 +--*  OR        long  
               [000012] -----+------                 |  +--*  LCL_VAR   ref    V02 loc0         
     (  1,  1) [000018] ------------                 |  \--*  LCL_VAR   ref    V03 loc1         
               [000013] -----+------                 \--*  CNS_INT   long   0

@JulieLeeMSFT
Copy link
Member Author

@AndyAyersMS all pri 0 tests passed. Please review this PR.
CC @dotnet/jit-contrib

@JulieLeeMSFT JulieLeeMSFT changed the title "==0" optimization in Boolean logic #13573 [Draft] "==0" optimization in Boolean logic #13573 Mar 17, 2021
@pentp
Copy link
Contributor

pentp commented Mar 17, 2021

Transforming x!=0 && y!=0 to (x&y) !=0 is also not correct (e.g., when x=1, y=2).

@BruceForstall
Copy link
Member

Are there test cases in the CoreCLR pri-1 tests that hit all the cases you intend to optimize? If not, can you add a unit test (or multiple) that diffs will show this affecting?

@JulieLeeMSFT
Copy link
Member Author

JulieLeeMSFT commented Mar 17, 2021

is also not correct (e.g., when x=1, y=2).

It is boolean optimization, so the values that are compared to is either 0 or 1.
Edit: This will not be optimized because the code will skip boolean optimization when it involves NOT operation.

@JulieLeeMSFT
Copy link
Member Author

/azp list

@JulieLeeMSFT
Copy link
Member Author

/azp run runtime-coreclr outerloop

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@JulieLeeMSFT JulieLeeMSFT marked this pull request as draft March 17, 2021 22:23
@JulieLeeMSFT
Copy link
Member Author

/azp runtime-coreclr jitstress

@azure-pipelines

This comment has been minimized.

@JulieLeeMSFT
Copy link
Member Author

/azp run runtime-coreclr jitstress

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Member

@AndyAyersMS AndyAyersMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left you some notes....

src/coreclr/jit/optimizer.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/optimizer.cpp Show resolved Hide resolved
src/coreclr/jit/optimizer.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/optimizer.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/optimizer.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/optimizer.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/optimizer.cpp Show resolved Hide resolved
src/coreclr/jit/optimizer.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/optimizer.cpp Outdated Show resolved Hide resolved
Copy link
Member

@BruceForstall BruceForstall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left a few notes.

src/coreclr/jit/optimizer.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/optimizer.cpp Outdated Show resolved Hide resolved
@JulieLeeMSFT
Copy link
Member Author

/azp run runtime-coreclr outerloop

@JulieLeeMSFT
Copy link
Member Author

/azp run runtime-coreclr jitstress

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

1 similar comment
@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@benaadams
Copy link
Member

CI is complaining about the C/C++ formatting

@runfoapp runfoapp bot mentioned this pull request Mar 22, 2021
Copy link
Member

@BruceForstall BruceForstall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like a lot of code change for very few diffs. Why are there so few diffs?

Levi suggested handling cases like (a > 0) && (b > 0). Future work?

The optimization doesn't handle cases like bool b = a == 0 && b == 0 && c == 0 because the last block is a GT_ASG instead of GT_RETURN (for example). Future work?

src/coreclr/jit/optimizer.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/optimizer.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/optimizer.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/optimizer.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/optimizer.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/optimizer.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/optimizer.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/optimizer.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/optimizer.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/optimizer.cpp Outdated Show resolved Hide resolved
Copy link
Member

@BruceForstall BruceForstall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More comments.

Also, at some point, and after all changes are approved and final, make sure to run outerloop and jitstress pipelines.

src/coreclr/jit/optimizer.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/optimizer.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/optimizer.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/optimizer.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/optimizer.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/optimizer.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/optimizer.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/optimizer.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/optimizer.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/optimizer.cpp Outdated Show resolved Hide resolved
@JulieLeeMSFT
Copy link
Member Author

It seems like a lot of code change for very few diffs. Why are there so few diffs?

Is it because I had many commits?

Levi suggested handling cases like (a > 0) && (b > 0). Future work?

He suggested a < 0 || b < 0. It was not handled in this PR. I was planning to create another issue for that.

The optimization doesn't handle cases like bool b = a == 0 && b == 0 && c == 0 because the last block is a GT_ASG instead of GT_RETURN (for example). Future work?

This case was not handled. Will inlcude this case in the new issue together with a < 0 || b < 0.

@JulieLeeMSFT
Copy link
Member Author

@BruceForstall @dotnet/jit-contrib I addressed BruceForstall's code review feedback. PTAL.

Copy link
Member

@AndyAyersMS AndyAyersMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think this is getting close.

src/coreclr/jit/optimizer.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/compiler.h Outdated Show resolved Hide resolved
@JulieLeeMSFT
Copy link
Member Author

@AndyAyersMS @BruceForstall PTAL.

@JulieLeeMSFT
Copy link
Member Author

/azp run runtime-coreclr outerloop

@JulieLeeMSFT
Copy link
Member Author

/azp run runtime-coreclr jitstress

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

1 similar comment
@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Member

@BruceForstall BruceForstall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a couple comments, but nothing blocking.

src/coreclr/jit/optimizer.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/optimizer.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/optimizer.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/optimizer.cpp Outdated Show resolved Hide resolved
@JulieLeeMSFT
Copy link
Member Author

I left a couple comments, but nothing blocking.

@BruceForstall @dotnet/jit-contrib PTAL. I incorporated Bruce's code review feedback.

@JulieLeeMSFT
Copy link
Member Author

Test failure on OSX arm64 is not related to my change. It is infra issue. Merging the code.

@JulieLeeMSFT JulieLeeMSFT merged commit 8a34e76 into dotnet:main Jul 14, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Aug 13, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

JIT could further optimize "== 0" checks in Boolean logic
7 participants