Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: Propagate assertions into natural loops in global morph #110501

Draft
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

hez2010
Copy link
Contributor

@hez2010 hez2010 commented Dec 7, 2024

Inspired by #109190, propagate assertions into natural loops in global morph as well, but don't publish the state.
Reuse the m_loop computed before.

Also, track vector count flag in assertions becuase we can propagate a vector count constant into a loop bound to enable unrolling now.

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Dec 7, 2024
@hez2010 hez2010 changed the title Propagate assertions into natural loops in global morph JIT: Propagate assertions into natural loops in global morph Dec 7, 2024
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Dec 7, 2024
@hez2010

This comment was marked as outdated.

@hez2010
Copy link
Contributor Author

hez2010 commented Dec 7, 2024

@MihuBot

@hez2010
Copy link
Contributor Author

hez2010 commented Dec 7, 2024

Example diffs:

var config = new Config { MaxSize = 4 };
var sum = 0;
for (var i = 0; i < config.MaxSize; i++) sum += 42;
return sum;

struct Config
{
    public int MaxSize;
}

Before:

G_M24375_IG01:  ;; offset=0x0000
						;; size=0 bbWeight=0.25 PerfScore 0.00
G_M24375_IG02:  ;; offset=0x0000
       xor      eax, eax
       mov      ecx, 4
						;; size=7 bbWeight=0.25 PerfScore 0.12
G_M24375_IG03:  ;; offset=0x0007
       add      eax, 42
       dec      ecx
       jne      SHORT G_M24375_IG03
						;; size=7 bbWeight=4 PerfScore 6.00
G_M24375_IG04:  ;; offset=0x000E
       ret      
						;; size=1 bbWeight=1 PerfScore 1.00

After:

G_M24375_IG01:  ;; offset=0x0000
                                                ;; size=0 bbWeight=1 PerfScore 0.00
G_M24375_IG02:  ;; offset=0x0000
       mov      eax, 168
                                                ;; size=5 bbWeight=1 PerfScore 0.25
G_M24375_IG03:  ;; offset=0x0005
       ret
                                                ;; size=1 bbWeight=1 PerfScore 1.00

Because now icon 4 gets propagated into the loop bound, which allows unrolling to kick in.

@hez2010
Copy link
Contributor Author

hez2010 commented Dec 7, 2024

How can the library tests fail only on Linux-musl-arm while passing on all other targets? Does this ring any bells?

@EgorBo
Copy link
Member

EgorBo commented Dec 7, 2024

How can the library tests failed only on Linux-musl-arm while passed on all other targets? Does this ring any bells?

Afair, that is the only target on our CI where we build everything on 64bit host (Linux-x64) so you need to be careful with (s)size_t etc (JIT is 64bit binary prejitting for 32bit target).

@hez2010

This comment was marked as resolved.

@hez2010
Copy link
Contributor Author

hez2010 commented Dec 8, 2024

The GenTree created on x64 is:

               [000058] DACXG------                         *  STORE_LCL_VAR struct<System.Int128, 16>(P) V03 loc2
                                                            *    long   field V03._lower (fldOffset=0x0) -> V14 tmp9
                                                            *    long   field V03._upper (fldOffset=0x8) -> V15 tmp10
               [000055] --CXG------                         \--*  CALL r2r_ind struct System.Int128:op_Addition(System.Int128,System.Int128):System.Int128
               [000053] ----------- arg0                       +--*  LCL_VAR   struct<System.Int128, 16>(P) V07 tmp2
                                                               +--*    long   field V07._lower (fldOffset=0x0) -> V18 tmp13         (last use)
                                                               +--*    long   field V07._upper (fldOffset=0x8) -> V19 tmp14         (last use)
               [000054] -------N--- arg1                       \--*  LCL_VAR   struct<System.Int128, 16>(P) V06 tmp1
                                                                  *    long   field V06._lower (fldOffset=0x0) -> V16 tmp11         (last use)
                                                                  *    long   field V06._upper (fldOffset=0x8) -> V17 tmp12         (last use)

while on arm32 it's:

               [000053] S-C-G------                         *  CALL r2r_ind void   System.Int128:op_Addition(System.Int128,System.Int128):System.Int128
               [000058] D---------- retbuf                  +--*  LCL_ADDR  byref  V03 loc2         [+0]
               [000052] ----------- arg1                    +--*  LCL_VAR   struct<System.Int128, 16>(P) V06 tmp1
                                                            +--*    long   field V06._lower (fldOffset=0x0) -> V09 tmp4          (last use)
                                                            +--*    long   field V06._upper (fldOffset=0x8) -> V10 tmp5          (last use)
               [000056] ----------- arg2                    \--*  LCL_VAR   struct<System.Int128, 16>(RB) V07 tmp2          (last use)

So we end up not clearing assertions for V03 which leads to bad codegen.
It seems that we also need to clear assertions for LCL_ADDR in the pred if the user has any side effects while doing cross-block assertion propagation.

@hez2010
Copy link
Contributor Author

hez2010 commented Dec 8, 2024

Diffs

@jakobbotsch
Copy link
Member

This is the same as #101763. I would pick that PR up and work off that as it had correctness fixes.

It seems the same regressions still exist, so the real work here is to figure out why they happen and how to avoid them.

@hez2010 hez2010 force-pushed the global-morph-natural-loop branch from d84fc00 to dc8e022 Compare December 9, 2024 15:15
@hez2010
Copy link
Contributor Author

hez2010 commented Dec 9, 2024

@MihuBot

@hez2010
Copy link
Contributor Author

hez2010 commented Dec 9, 2024

New diffs after I increase the assertions limit to 2x

Now regressions are mostly caused by newly unrolled loops, despite that the diff are majorly improvements.

So the regressions we have before were caused by more assertions being created and unfortunately hit the limit, so that some assertions not being propagated anymore. We need to tune the limit if we want to make this go in.

@EgorBo
Copy link
Member

EgorBo commented Dec 9, 2024

New diffs after I increase the assertions limit to 2x

how do diffs look like if you increase the limit without your changes? As expected, the increased limit comes with a noticeable TP regression (~0.5%)

@JulieLeeMSFT
Copy link
Member

@hez2010, the new change hides the perf regression. Please make the same change on base and rerun the diff and share with us.

New diffs after I increase the assertions limit to 2x

how do diffs look like if you increase the limit without your changes? As expected, the increased limit comes with a noticeable TP regression (~0.5%)

@hez2010
Copy link
Contributor Author

hez2010 commented Jan 6, 2025

I'm converting this into draft while finding out why the regression happens.

@dotnet-policy-service dotnet-policy-service bot removed the needs-author-action An issue or pull request that requires more info or actions from the author. label Jan 6, 2025
@hez2010 hez2010 marked this pull request as draft January 6, 2025 17:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants