Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improvements to loop hoisting #71504

Merged
merged 2 commits into from
Jul 1, 2022

Conversation

BruceForstall
Copy link
Member

  • optHoistLoopNest(): (1) no need to check if a pre-header has already been added;
    pre-headers are unique to a loop if they exist. (2) simplify the "append to list" code.
  • optHoistThisLoop(): (1) allow adding multiple pre-headers after every "IDom" addition,
    if there are any in the right order. (2) Simplify addition of remaining pre-headers.
  • optLoopEntry(): technically, you shouldn't look at bbJumpDest
    unless you know the branch kind is one that sets it. So change the condition.

Also, fix the output of why something wasn't hoisted. The concept of "hoistable" implies
"invariant" (but not the reverse). So print the least specific failure reason. These were
backwards before. For instance, if we couldn't hoist a top node, but it was invariant, we
would, oddly, print "not invariant" instead of "not hoistable". In the case where it isn't
hoistable, and also isn't invariant, it makes more sense to print "not invariant", which
also implies "not hoistable".

Add dumping the OpName for the Pre/PostOrderVisit output, to help avoid cross-referencing
the node number all the time.

No asm diffs.

- optHoistLoopNest(): (1) no need to check if a pre-header has already been added;
pre-headers are unique to a loop if they exist. (2) simplify the "append to list" code.
- optHoistThisLoop(): (1) allow adding multiple pre-headers after every "IDom" addition,
if there are any in the right order. (2) Simplify addition of remaining pre-headers.
- optLoopEntry(): technically, you shouldn't look at `bbJumpDest`
unless you know the branch kind is one that sets it. So change the condition.

Also, fix the output of why something wasn't hoisted. The concept of "hoistable" implies
"invariant" (but not the reverse). So print the least specific failure reason. These were
backwards before. For instance, if we couldn't hoist a top node, but it was invariant, we
would, oddly, print "not invariant" instead of "not hoistable". In the case where it isn't
hoistable, and also isn't invariant, it makes more sense to print "not invariant", which
also implies "not hoistable".

Add dumping the OpName for the Pre/PostOrderVisit output, to help avoid cross-referencing
the node number all the time.

No asm diffs.
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jun 30, 2022
@BruceForstall
Copy link
Member Author

While the change (1) to optHoistThisLoop() to allow adding multiple pre-headers after every "IDom" addition doesn't lead to any spmi asm diffs, it does lead to a different "Considering hoisting in blocks that either dominate exit block BB09 or preheaders of nested loops, if any" order in the following test case:

using System;
using System.Runtime.CompilerServices;
namespace ns
{
    class Program
    {
        public static int x, y;

        [MethodImpl(MethodImplOptions.NoInlining)]
        public static int foo(int i)
        {
            int result = 0;
            for (int j = 0; j < 100; j++)
            {
                result += (x * y);

                if (i > j)
                {
                    result += (x * y);
                }

                result += (x * y);

                if (i == j)
                {
                    for (int k = 0; k < 100; k++)
                    {
                        result += (x * y);
                    }

                    for (int k = 0; k < 100; k++)
                    {
                        result += (x * y);
                    }
                }
            }
            return result;
        }

	    public static void Main()
        {
            x = 1;
            y = 2;
            Console.WriteLine("{0}", foo(7));
        }
    }
}

@BruceForstall
Copy link
Member Author

@kunalspathak @dotnet/jit-contrib PTAL

@ghost ghost assigned BruceForstall Jun 30, 2022
@ghost
Copy link

ghost commented Jun 30, 2022

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details
  • optHoistLoopNest(): (1) no need to check if a pre-header has already been added;
    pre-headers are unique to a loop if they exist. (2) simplify the "append to list" code.
  • optHoistThisLoop(): (1) allow adding multiple pre-headers after every "IDom" addition,
    if there are any in the right order. (2) Simplify addition of remaining pre-headers.
  • optLoopEntry(): technically, you shouldn't look at bbJumpDest
    unless you know the branch kind is one that sets it. So change the condition.

Also, fix the output of why something wasn't hoisted. The concept of "hoistable" implies
"invariant" (but not the reverse). So print the least specific failure reason. These were
backwards before. For instance, if we couldn't hoist a top node, but it was invariant, we
would, oddly, print "not invariant" instead of "not hoistable". In the case where it isn't
hoistable, and also isn't invariant, it makes more sense to print "not invariant", which
also implies "not hoistable".

Add dumping the OpName for the Pre/PostOrderVisit output, to help avoid cross-referencing
the node number all the time.

No asm diffs.

Author: BruceForstall
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@BruceForstall
Copy link
Member Author

No diffs

@kunalspathak
Copy link
Member

it does lead to a different order

Do you mind sharing before vs. after Jitdump?

Copy link
Member

@kunalspathak kunalspathak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the cleanup. Add few suggestions.

src/coreclr/jit/optimizer.cpp Show resolved Hide resolved
src/coreclr/jit/optimizer.cpp Outdated Show resolved Hide resolved
src/coreclr/jit/optimizer.cpp Show resolved Hide resolved
while (cur != nullptr && pLoopDsc->lpContains(cur) && cur != pLoopDsc->lpEntry)
{
JITDUMP(" -- " FMT_BB " (dominate exit block)\n", cur->bbNum);
defExec.Push(cur);
cur = cur->bbIDom;

if (preHeadersList != nullptr)
for (; preHeadersList != nullptr; preHeadersList = preHeadersList->next)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you find out an example where we didn't add a preheader in this loop?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you find out an example where we didn't add a preheader in this loop?

Do you mean where we added more than one preheader? Zero or one would happen the same as before. (And the answer is yes)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok and the example is #71504 (comment)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct.

src/coreclr/jit/optimizer.cpp Outdated Show resolved Hide resolved
@ghost ghost added needs-author-action An issue or pull request that requires more info or actions from the author. and removed needs-author-action An issue or pull request that requires more info or actions from the author. labels Jun 30, 2022
@BruceForstall
Copy link
Member Author

Do you mind sharing before vs. after Jitdump?

Here's the relevant fragment for the test case I included above:

Base:
--  BB09 (dominate exit block)
--  BB11 (preheader of L02)
--  BB04 (dominate exit block)
--  BB12 (preheader of L01)
--  BB02 (entry block)

Diff:
--  BB09 (dominate exit block)
--  BB11 (preheader of L02)
--  BB12 (preheader of L01)
--  BB04 (dominate exit block)
--  BB02 (entry block)

Note how BB04 and BB12 are swapped. In the "diff" case, we loop and add both BB11/BB12 after BB09.

@kunalspathak
Copy link
Member

Note how BB04 and BB12 are swapped.

Can you paste the Block table just before we do the hoisting?

@BruceForstall
Copy link
Member Author

Can you paste the Block table just before we do the hoisting?

-----------------------------------------------------------------------------------------------------------------------------------------
BBnum BBid ref try hnd preds           weight    lp [IL range]     [jump]      [EH region]         [flags]
-----------------------------------------------------------------------------------------------------------------------------------------
BB01 [0000]  1                             1       [000..006)-> BB10 ( cond )                     i
BB02 [0001]  2       BB01,BB09             4     0 [006..018)-> BB04 ( cond )                     i Loop Loop0 bwd bwd-target
BB03 [0002]  1       BB02                  2     0 [018..026)                                     i bwd
BB04 [0003]  2       BB02,BB03             4     0 [026..038)-> BB09 ( cond )                     i bwd
BB05 [0004]  1       BB04                  2     0 [038..03C)-> BB07 ( cond )                     i bwd
BB12 [0017]  1       BB05                  2     0 [03C..???)                                     internal LoopPH
BB06 [0005]  2       BB06,BB12            16     1 [03C..053)-> BB06 ( cond )                     i Loop Loop0 bwd bwd-target align
BB07 [0007]  2       BB05,BB06             2     0 [053..057)-> BB09 ( cond )                     i bwd
BB11 [0016]  1       BB07                  2     0 [057..???)                                     internal LoopPH
BB08 [0008]  2       BB08,BB11            16     2 [057..06E)-> BB08 ( cond )                     i Loop Loop0 bwd bwd-target align
BB09 [0010]  3       BB04,BB07,BB08        4     0 [06E..077)-> BB02 ( cond )                     i bwd
BB10 [0012]  2       BB01,BB09             1       [077..079)        (return)                     i
-----------------------------------------------------------------------------------------------------------------------------------------

***************  Natural loop table
L00, from BB02 to BB09 (Head=BB01, Entry=BB02, Exit=BB09), child loop = L02
L01, from BB06 to BB06 (Head=BB12, Entry=BB06, Exit=BB06, parent=L00) prehead
L02, from BB08 to BB08 (Head=BB11, Entry=BB08, Exit=BB08, parent=L00), sibling loop = L01 prehead

@kunalspathak
Copy link
Member

Thanks. It is interesting to see that despite the reordering, we don't see any spmi-diff. How many methods have such cases?

@BruceForstall
Copy link
Member Author

It is interesting to see that despite the reordering, we don't see any spmi-diff. How many methods have such cases?

Here are the 15 functions in SPMI collections that have 2 or more pre-headers to be added in the inner loop (that is, the reason why the inner loop should be a while instead of an if). Presumably even though they have blocks added in a different order, nothing got subsequently hoisted in a different order (or at all).

Benchstone.BenchF.Romber:Bench():bool
BitOps:DoBitfieldIteration(System.Int32[],System.Int32[],int,byref):long
DefaultBinder:BindToMethod(int,ref,byref,ref,CultureInfo,ref,byref):MethodBase:this
Microsoft.CodeAnalysis.CSharp.Symbols.ConstantEvaluationHelpers:OrderGraph(System.Collections.Generic.Dictionary`2[Microsoft.CodeAnalysis.CSharp.Symbols.SourceFieldSymbolWithSyntaxReference, Microsoft.CodeAnalysis.CSharp.Symbols.ConstantEvaluationHelpers+Node`1[Microsoft.CodeAnalysis.CSharp.Symbols.SourceFieldSymbolWithSyntaxReference]],Microsoft.CodeAnalysis.ArrayBuilder`1[Microsoft.CodeAnalysis.CSharp.Symbols.ConstantEvaluationHelpers+FieldInfo])
Microsoft.CodeAnalysis.CSharp.Symbols.OverriddenOrHiddenMembersHelpers:MakeInterfaceOverriddenOrHiddenMembers(Microsoft.CodeAnalysis.CSharp.Symbol,bool):Microsoft.CodeAnalysis.CSharp.Symbols.OverriddenOrHiddenMembersResult
Microsoft.CodeAnalysis.VisualBasic.Binder:ReportCommonErrorsFromLambdas(Microsoft.CodeAnalysis.ArrayBuilder`1[System.Collections.Generic.KeyValuePair`2[Microsoft.CodeAnalysis.VisualBasic.Symbol, System.Collections.Immutable.ImmutableArray`1[Microsoft.CodeAnalysis.Diagnostic]]],System.Collections.Immutable.ImmutableArray`1[Microsoft.CodeAnalysis.VisualBasic.BoundExpression],Microsoft.CodeAnalysis.DiagnosticBag):bool
Microsoft.CodeAnalysis.VisualBasic.Symbols.DeclarationTreeBuilder:GetNonTypeMemberNames(Microsoft.CodeAnalysis.SyntaxList`1[Microsoft.CodeAnalysis.VisualBasic.Syntax.StatementSyntax],byref):System.String[]:this
NuGet.Frameworks.FrameworkNameProvider:AddCompatibleCandidates():this
System.Data.XmlTreeGen:HandleTable(System.Data.DataTable,System.Xml.XmlDocument,System.Xml.XmlElement,bool):System.Xml.XmlElement:this
System.DirectoryServices.Protocols.DirectoryControl:TransformControls(System.DirectoryServices.Protocols.DirectoryControl[])
System.Globalization.CalendarData:NormalizeDatePattern(System.String):System.String
System.Globalization.DateTimeFormatInfoScanner:ScanDateWord(System.String):this
System.Net.Http.Http3RequestStream:BufferHeaderCollection(System.Net.Http.Headers.HttpHeaders):this
System.Security.Principal.WindowsIdentity:AddTokenClaims(System.Collections.Generic.List`1[System.Security.Claims.Claim],int,System.String):this
System.Xml.Schema.KeySequence:GetHashCode():int:this

Copy link
Member

@kunalspathak kunalspathak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@BruceForstall BruceForstall merged commit f28904a into dotnet:main Jul 1, 2022
@BruceForstall BruceForstall deleted the HoistImprovements branch July 1, 2022 23:45
@ghost ghost locked as resolved and limited conversation to collaborators Aug 1, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants