Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Redundant mov instructions generated #49535

Closed
jamescourtney opened this issue Mar 12, 2021 · 3 comments · Fixed by #54075
Closed

Redundant mov instructions generated #49535

jamescourtney opened this issue Mar 12, 2021 · 3 comments · Fixed by #54075
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Milestone

Comments

@jamescourtney
Copy link

jamescourtney commented Mar 12, 2021

Description

I maintain a tool that generates performance-critical code, so I spend some time looking at JIT dumps to make sure the code that I'm generating is JIT-friendly (bounds-check eliding, etc).

I am by no means an authority in assembly, but I've come across something that mystifies me and makes me think there may be a bug with some redundant instructions.

My configuration is:

  • .NET 5.0 x64 on Win10 x64, fully updated
  • AMD Ryzen 5950x (Zen 3, AVX2 supported)

I'm unable to get a succinct reproduction of the issue (it may only happen when the method uses more variables than there are registers), so I'll share high-level details and add a link to a gist later. The code is basically trying to write an aligned integer at a 4-byte boundary:

checked
{
  int offset;
   ...
   // align to 4 byte boundary
   offset += (-offset) & (sizeof(int) - 1);
   int location = offset;
   offset += sizeof(int);

   // write the value at location.
   ...
}
G_M47529_IG13:
       8BC3                 mov      eax, ebx
       F7D8                 neg      eax
       83E003               and      eax, 3
       03C3                 add      eax, ebx
       0F80A3000000         jo       G_M47529_IG25
       8BD8                 mov      ebx, eax

       // These two mov's appear entirely redundant.
       8BC3                 mov      eax, ebx
       8BD8                 mov      ebx, eax

       83C304               add      ebx, 4
       0F8094000000         jo       G_M47529_IG25
       8B5708               mov      edx, dword ptr [rdi+8]
       3BC2                 cmp      eax, edx
       0F878E000000         ja       G_M47529_IG26

Unless I'm very mistaken (always possible 🙂), My question is that after the first mov, we know for sure that the eax and ebx registers carry the same value, which should mean that the next two instructions could be eliminated with no adverse side effects, right?

The full repro is available here. I've tried to remove all of the code that isn't essential to view the issue: https://gist.github.com/jamescourtney/5000bec600493d7dc08633a07ba2dd06

@dotnet-issue-labeler dotnet-issue-labeler bot added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI untriaged New issue has not been triaged by the area owner labels Mar 12, 2021
@SingleAccretion
Copy link
Contributor

This is the result of value numbering not folding checked (aka ovfl) operations, same as in #1115 and #32776.

Here are some of the relevant trees:

***** BB03, STMT00002(before)
N005 (  6,  6) [000013] -A-X----R---              *  ASG       int   
N004 (  1,  1) [000012] D------N----              +--*  LCL_VAR   int    V04 loc1         d:2
N003 (  6,  6) [000011] ---X--------              \--*  ADD_ovfl  int   
N001 (  1,  1) [000009] ------------                 +--*  LCL_VAR   int    V03 loc0         u:2
N002 (  1,  1) [000010] ------------                 \--*  CNS_INT   int    4

N001 [000009]   LCL_VAR   V03 loc0         u:2 => $40 {IntCns 0}
N002 [000010]   CNS_INT   4 => $45 {IntCns 4}
N003 [000011]   ADD_ovfl  => $343 {norm=$342 {ADD_OVF($40, $45)}, exc=$380 {OverflowExc($342)}}
N004 [000012]   LCL_VAR   V04 loc1         d:2 => $342 {ADD_OVF($40, $45)}
N005 [000013]   ASG       => $343 {norm=$342 {ADD_OVF($40, $45)}, exc=$380 {OverflowExc($342)}}

***** BB03, STMT00002(after)
N005 (  6,  6) [000013] -A-X----R---              *  ASG       int    $343
N004 (  1,  1) [000012] D------N----              +--*  LCL_VAR   int    V04 loc1         d:2 $342
N003 (  6,  6) [000011] ---X--------              \--*  ADD_ovfl  int    $343
N001 (  1,  1) [000009] ------------                 +--*  LCL_VAR   int    V03 loc0         u:2 $40
N002 (  1,  1) [000010] ------------                 \--*  CNS_INT   int    4 $45

***** BB04, STMT00053(before)
N008 (  9,  9) [000236] -A-X----R---              *  ASG       int   
N007 (  1,  1) [000235] D------N----              +--*  LCL_VAR   int    V04 loc1         d:10
N006 (  9,  9) [000234] ---X--------              \--*  ADD_ovfl  int   
N004 (  4,  4) [000382] ------------                 +--*  AND       int   
N002 (  2,  2) [000378] ------------                 |  +--*  NEG       int   
N001 (  1,  1) [000230] ------------                 |  |  \--*  LCL_VAR   int    V04 loc1         u:2
N003 (  1,  1) [000381] ------------                 |  \--*  CNS_INT   int    3
N005 (  1,  1) [000229] ------------                 \--*  LCL_VAR   int    V04 loc1         u:2 (last use)

N001 [000230]   LCL_VAR   V04 loc1         u:2 => $342 {ADD_OVF($40, $45)}
N002 [000378]   NEG       => $480 {NEG($342)}
N003 [000381]   CNS_INT   3 => $4c {IntCns 3}
N004 [000382]   AND       => $345 {AND($4c, $480)}
N005 [000229]   LCL_VAR   V04 loc1         u:2 (last use) => $342 {ADD_OVF($40, $45)}
N006 [000234]   ADD_ovfl  => $347 {norm=$346 {ADD_OVF($342, $345)}, exc=$388 {OverflowExc($346)}}
N007 [000235]   LCL_VAR   V04 loc1         d:10 => $346 {ADD_OVF($342, $345)}
N008 [000236]   ASG       => $347 {norm=$346 {ADD_OVF($342, $345)}, exc=$388 {OverflowExc($346)}}

***** BB04, STMT00053(after)
N008 (  9,  9) [000236] -A-X----R---              *  ASG       int    $347
N007 (  1,  1) [000235] D------N----              +--*  LCL_VAR   int    V04 loc1         d:10 $346
N006 (  9,  9) [000234] ---X--------              \--*  ADD_ovfl  int    $347
N004 (  4,  4) [000382] ------------                 +--*  AND       int    $345
N002 (  2,  2) [000378] ------------                 |  +--*  NEG       int    $480
N001 (  1,  1) [000230] ------------                 |  |  \--*  LCL_VAR   int    V04 loc1         u:2 $342
N003 (  1,  1) [000381] ------------                 |  \--*  CNS_INT   int    3 $4c
N005 (  1,  1) [000229] ------------                 \--*  LCL_VAR   int    V04 loc1         u:2 (last use) $342

As can be seen, if we assigned a constant VN to V04 instead of the ADD_OVF(0, 4), we could then propagate the constants properly in STMT00053. Unfortunately, we would still have the unreachable CORINFO_HELP_OVERFLOW call in the codegen as its block is marked as keep, but that can hopefully be dealt with as well (presumably similarly to how blocks with CORINFO_HELP_RNGCHKFAIL are eliminated).

I will work on this.

@JulieLeeMSFT JulieLeeMSFT removed the untriaged New issue has not been triaged by the area owner label Mar 12, 2021
@JulieLeeMSFT JulieLeeMSFT added this to the 6.0.0 milestone Mar 12, 2021
@AndyAyersMS
Copy link
Member

@SingleAccretion you may also find this interesting dotnet/coreclr#21959.

@benaadams
Copy link
Member

Might also have become more important to remove the redundant moves for Intel https://twitter.com/uops_info/status/1367961302845386752

I just noticed that with a recent microcode update, #Intel Ice Lake CPUs don't perform 'move elimination' any more for GPR registers. This is probably related to the ICL065 erratum

https://twitter.com/uops_info/status/1368216605688205312

Yes, it also affects Tiger Lake:

ICL065 erratum says:

image

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Jun 11, 2021
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Jun 13, 2021
@SingleAccretion SingleAccretion removed their assignment Jun 13, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Jul 13, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants