
ARM64-SVE: Implement IF_SVE_ED_1A, IF_SVE_EE_1A, IF_SVE_EB_1A, IF_SVE_EC_1A #97238

Merged (7 commits into dotnet:main, Jan 23, 2024)

Conversation

amanasifkhalid (Member):

Part of #94549. These formats include some add and mov encodings, which are among the instructions we're prioritizing here.

JitDisasm output:

mov     z0.b, #-128
mov     z1.h, #0, LSL #8
mov     z2.s, #5
mov     z3.d, #127
mov     z4.b, #0
mov     z5.h, #-128, LSL #8
mov     z6.s, #5, LSL #8
mov     z7.d, #127, LSL #8
add     z0.b, z0.b, #0
sqadd   z1.h, z1.h, #0, LSL #8
sqsub   z2.s, z2.s, #1
sub     z3.d, z3.d, #128
subr    z4.b, z4.b, #255
uqadd   z5.h, z5.h, #5, LSL #8
uqsub   z6.s, z6.s, #255, LSL #8
smax    z0.b, z0.b, #-128
smax    z1.h, z1.h, #127
smin    z2.s, z2.s, #-128
smin    z3.d, z3.d, #127
umax    z4.b, z4.b, #0
umax    z5.h, z5.h, #255
umin    z6.s, z6.s, #0
umin    z7.d, z7.d, #255
mul     z0.b, z0.b, #-128
mul     z1.h, z1.h, #0
mul     z2.s, z2.s, #5
mul     z3.d, z3.d, #127

cstool output:

mov   z0.b, #-0x80
mov   z1.h, #0, LSL #8
mov   z2.s, #5
mov   z3.d, #0x7F
mov   z4.b, #0
mov   z5.h, #-0x8000
mov   z6.s, #0x500
mov   z7.d, #0x7F00
add   z0.b, z0.b, #0
sqadd z1.h, z1.h, #0, LSL #8
sqsub z2.s, z2.s, #1
sub   z3.d, z3.d, #0x80
subr  z4.b, z4.b, #0xFF
uqadd z5.h, z5.h, #0x500
uqsub z6.s, z6.s, #0xFF00
smax  z0.b, z0.b, #-128
smax  z1.h, z1.h, #127
smin  z2.s, z2.s, #-128
smin  z3.d, z3.d, #127
umax  z4.b, z4.b, #0
umax  z5.h, z5.h, #0xFF
umin  z6.s, z6.s, #0
umin  z7.d, z7.d, #0xFF
mul   z0.b, z0.b, #-128
mul   z1.h, z1.h, #0
mul   z2.s, z2.s, #5
mul   z3.d, z3.d, #127

Note that there are some diffs in the above outputs due to differences in how immediate values are printed:

  • cstool begins to print immediates in hex at smaller values than the JIT (e.g. 0xFF vs 255)
  • the JIT is stricter about printing left-shifted values as #imm, LSL #8, whereas cstool seems to do so only for immediate values of zero.

cc @dotnet/arm64-contrib.

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jan 20, 2024
@ghost ghost assigned amanasifkhalid Jan 20, 2024
@ghost commented Jan 20, 2024

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

case IF_SVE_EB_1A: // ........xx...... ..hiiiiiiiiddddd -- SVE broadcast integer immediate (unpredicated)
switch (ins)
{
// TODO-SVE: Why are these different? MOV is an alias for DUP
Member Author:

I forgot to mention this above, but should these PerfScore values be different? Currently, if the instruction is INS_sve_dup in emitIns_R_I, we change it to INS_sve_mov, since that is the preferred disassembly. That means the INS_sve_dup case is unreachable when determining PerfScores. I could change my logic to keep the instruction as INS_sve_dup in emitIns_R_I and then print it as a mov in emitDispInsHelp, but that would require introducing edge-case logic into the latter method, as we currently don't do anything special when printing aliased instructions.

If these PerfScores should be the same, then we can avoid that issue altogether.

Contributor:

The alias doesn't change the encoding at all. All the bits in the instruction are the same; it's just the way it's printed that changes.

Looking at the table here, mov and dup appear in multiple rows:

Broadcast logical bitmask immediate to vector DUPM, MOV 2 2 V -
Duplicate, immediate and indexed form DUP, MOV 2 2 V -
Duplicate, scalar form DUP, MOV 3 1 M0 -

I think you should be using the first one (2,2) for all of IF_SVE_EB_1A.

Was this incorrect in the boilerplate autogenerated code?

Member Author:

I see, thank you for clarifying. Yes, the boilerplate code has different PerfScore values for mov and dup for the IF_SVE_EB_1A format -- it looks like the generator tool mixed up the PerfScore values from different formats. I'll update this to use (2,2).

Contributor:

it looks like the generator tool mixed up the PerfScore values from different formats

@kunalspathak something to fix

@amanasifkhalid (Member Author):

Failures are unrelated.

@a74nh (Contributor) left a comment:

LGTM

case IF_SVE_EC_1A: // ........xx...... ..hiiiiiiiiddddd -- SVE integer add/subtract immediate (unpredicated)
assert(insOptsScalableStandard(id->idInsOpt()));
// Size specifier must be able to fit left-shifted immediate
assert(insOptsScalableAtLeastHalf(id->idInsOpt()) || !id->idOptionalShift());
Member:

likewise here.

case IF_SVE_EB_1A: // ........xx...... ..hiiiiiiiiddddd -- SVE broadcast integer immediate (unpredicated)
assert(insOptsScalableStandard(id->idInsOpt()));
// Size specifier must be able to fit left-shifted immediate
assert(insOptsScalableAtLeastHalf(id->idInsOpt()) || !id->idOptionalShift());
Member:

Not sure I follow why we need insOptsScalableAtLeastHalf(id->idInsOpt()) here? EB_1A is either https://docsmirror.github.io/A64/2023-06/dup_z_i.html or https://docsmirror.github.io/A64/2023-06/mov_dup_z_i.html, and they both take B.

Member Author:

You're correct that B is accepted. I believe if the immediate value is being left-shifted, the size specifier has to be at least 16 bits. The docs you linked say this: "The immediate operand is a signed value in the range -128 to +127, and for element widths of 16 bits or higher it may also be a signed multiple of 256 in the range -32768 to +32512 (excluding 0)."

cstool's behavior doesn't seem to match the documentation exactly, though. From my local testing, if the immediate is left-shifted but the specifier is B, cstool refuses to parse the instruction, even if the immediate being shifted is 0. I wrote this assert to match the behavior of cstool, such that if id->idOptionalShift is true, the size specifier cannot be B. @a74nh does this sound correct to you?

Member:

IMO, we should go with the arm specs. did you try validating it with LATE_DISASM?

Member Author:

I just tried it, and LATE_DISASM matches cstool's behavior, in that it refuses to decode the instruction if a left-shifted immediate (even 0) is used with the B size specifier. I think I'm just misinterpreting the "(excluding 0)" part of the documentation linked above, and the assert is correct in disallowing the B specifier to be used with shifted immediates.

Member:

got it. Let's wait for @a74nh to comment on this.

Contributor:

You're correct that B is accepted. I believe if the immediate value is being left-shifted, the size specifier has to be at least 16 bits. The docs you linked say this: "The immediate operand is a signed value in the range -128 to +127, and for element widths of 16 bits or higher it may also be a signed multiple of 256 in the range -32768 to +32512 (excluding 0)."

Yes. The optional shift is being used to specify the range -32768 to +32512. That range is only valid for widths of H or wider. Therefore, it's not valid to specify a shift with size B.

It also makes sense as a value shifted by 8 cannot fit into a B, and so would always be 0.

if the immediate is left-shifted but the specifier is B, cstool refuses to parse the instruction, even if the immediate being shifted is 0.

That makes sense. It's still trying to left-shift 0 by 8, and the documentation says a left shift isn't valid for B. At a microarchitecture level, it wouldn't be useful to encode a special case just to allow that.

I'm happy with the asserts.

@@ -6244,14 +6353,21 @@ void emitter::emitIns_R_I(instruction ins,
assert(canEncode);
assert(fmt != IF_NONE);

- instrDesc* id = emitNewInstrSC(attr, imm);
+ // Instructions with optional shifts need larger instrDesc to store state
+ instrDesc* id = optionalShift ? emitNewInstrCns(attr, imm) : emitNewInstrSC(attr, imm);
Member:

please double check the TP cost for this.

Member Author:

Sure thing. This is from an older run, but TP shouldn't have changed since my last push. There is a TP impact, but it's quite small, and I think it comes only from the additional branch for checking optionalShift. The larger instrDesc should only affect SVE instructions as of this writing, as optionalShift is only true for the new encodings added in this PR.

Member:

There are two compares happening for the non-SVE path (the C++ compiler probably eliminates the redundant comparison), and that is increasing the TP cost. Maybe in the future we should have emitIns_SVE_R_I().

Can you change this so the frequent branch comes first:

instrDesc* id = nullptr;

if (!optionalShift)
{
    id = ...
    ...
}
else
{
    ...
    id = emitNewInstrCns(attr, imm);
    id->idOptionalShift(hasShift);
}

Member Author:

The updated diffs look the same.

Member:

Yeah, we might eventually have to move to emitIns_sve_, but let's hold off on that for now.

@kunalspathak (Member):

Assembly diffs

Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64

Overall (+0.01%)
Collection PDIFF
benchmarks.run.linux.arm64.checked.mch +0.01%
benchmarks.run_pgo.linux.arm64.checked.mch +0.01%
benchmarks.run_tiered.linux.arm64.checked.mch +0.01%
coreclr_tests.run.linux.arm64.checked.mch +0.01%
libraries.crossgen2.linux.arm64.checked.mch +0.01%
libraries.pmi.linux.arm64.checked.mch +0.01%
libraries_tests.run.linux.arm64.Release.mch +0.01%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch +0.01%
realworld.run.linux.arm64.checked.mch +0.01%
smoke_tests.nativeaot.linux.arm64.checked.mch +0.01%
MinOpts (+0.00% to +0.04%)
Collection PDIFF
benchmarks.run.linux.arm64.checked.mch +0.03%
benchmarks.run_pgo.linux.arm64.checked.mch +0.02%
benchmarks.run_tiered.linux.arm64.checked.mch +0.02%
coreclr_tests.run.linux.arm64.checked.mch +0.02%
libraries.crossgen2.linux.arm64.checked.mch +0.02%
libraries_tests.run.linux.arm64.Release.mch +0.02%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch +0.03%
realworld.run.linux.arm64.checked.mch +0.04%
smoke_tests.nativeaot.linux.arm64.checked.mch +0.01%
FullOpts (+0.01%)
Collection PDIFF
benchmarks.run.linux.arm64.checked.mch +0.01%
benchmarks.run_pgo.linux.arm64.checked.mch +0.01%
benchmarks.run_tiered.linux.arm64.checked.mch +0.01%
coreclr_tests.run.linux.arm64.checked.mch +0.01%
libraries.crossgen2.linux.arm64.checked.mch +0.01%
libraries.pmi.linux.arm64.checked.mch +0.01%
libraries_tests.run.linux.arm64.Release.mch +0.01%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch +0.01%
realworld.run.linux.arm64.checked.mch +0.01%
smoke_tests.nativeaot.linux.arm64.checked.mch +0.01%

Throughput diffs for osx/arm64 ran on windows/x64

Overall (+0.01%)
Collection PDIFF
benchmarks.run.osx.arm64.checked.mch +0.01%
benchmarks.run_pgo.osx.arm64.checked.mch +0.01%
benchmarks.run_tiered.osx.arm64.checked.mch +0.01%
coreclr_tests.run.osx.arm64.checked.mch +0.01%
libraries.crossgen2.osx.arm64.checked.mch +0.01%
libraries.pmi.osx.arm64.checked.mch +0.01%
libraries_tests.run.osx.arm64.Release.mch +0.01%
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch +0.01%
realworld.run.osx.arm64.checked.mch +0.01%
MinOpts (+0.00% to +0.04%)
Collection PDIFF
benchmarks.run.osx.arm64.checked.mch +0.01%
benchmarks.run_pgo.osx.arm64.checked.mch +0.02%
benchmarks.run_tiered.osx.arm64.checked.mch +0.02%
coreclr_tests.run.osx.arm64.checked.mch +0.02%
libraries.crossgen2.osx.arm64.checked.mch +0.02%
libraries_tests.run.osx.arm64.Release.mch +0.02%
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch +0.03%
realworld.run.osx.arm64.checked.mch +0.04%
FullOpts (+0.01%)
Collection PDIFF
benchmarks.run.osx.arm64.checked.mch +0.01%
benchmarks.run_pgo.osx.arm64.checked.mch +0.01%
benchmarks.run_tiered.osx.arm64.checked.mch +0.01%
coreclr_tests.run.osx.arm64.checked.mch +0.01%
libraries.crossgen2.osx.arm64.checked.mch +0.01%
libraries.pmi.osx.arm64.checked.mch +0.01%
libraries_tests.run.osx.arm64.Release.mch +0.01%
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch +0.01%
realworld.run.osx.arm64.checked.mch +0.01%

Throughput diffs for windows/arm64 ran on windows/x64

Overall (+0.01%)
Collection PDIFF
benchmarks.run.windows.arm64.checked.mch +0.01%
benchmarks.run_pgo.windows.arm64.checked.mch +0.01%
benchmarks.run_tiered.windows.arm64.checked.mch +0.01%
coreclr_tests.run.windows.arm64.checked.mch +0.01%
libraries.crossgen2.windows.arm64.checked.mch +0.01%
libraries.pmi.windows.arm64.checked.mch +0.01%
libraries_tests.run.windows.arm64.Release.mch +0.01%
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch +0.01%
realworld.run.windows.arm64.checked.mch +0.01%
smoke_tests.nativeaot.windows.arm64.checked.mch +0.01%
MinOpts (+0.00% to +0.04%)
Collection PDIFF
benchmarks.run.windows.arm64.checked.mch +0.01%
benchmarks.run_pgo.windows.arm64.checked.mch +0.02%
benchmarks.run_tiered.windows.arm64.checked.mch +0.02%
coreclr_tests.run.windows.arm64.checked.mch +0.02%
libraries.crossgen2.windows.arm64.checked.mch +0.02%
libraries_tests.run.windows.arm64.Release.mch +0.02%
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch +0.03%
realworld.run.windows.arm64.checked.mch +0.04%
smoke_tests.nativeaot.windows.arm64.checked.mch +0.01%
FullOpts (+0.01%)
Collection PDIFF
benchmarks.run.windows.arm64.checked.mch +0.01%
benchmarks.run_pgo.windows.arm64.checked.mch +0.01%
benchmarks.run_tiered.windows.arm64.checked.mch +0.01%
coreclr_tests.run.windows.arm64.checked.mch +0.01%
libraries.crossgen2.windows.arm64.checked.mch +0.01%
libraries.pmi.windows.arm64.checked.mch +0.01%
libraries_tests.run.windows.arm64.Release.mch +0.01%
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch +0.01%
realworld.run.windows.arm64.checked.mch +0.01%
smoke_tests.nativeaot.windows.arm64.checked.mch +0.01%

Details here

{
code = emitInsCodeSve(ins, fmt);
code |= insEncodeReg_V_4_to_0(id->idReg1()); // ddddd
code_t imm8 = (code_t)(emitGetInsSC(id) & 0xFF); // iiiiiiii
Contributor:

Given everything else is in functions, then this should be moved into a new helper function:

code_t imm8 = (code_t)(emitGetInsSC(id) & 0xFF);
code |= (imm8 << 5);

Member Author:

Got it, fixed.

@kunalspathak (Member) left a comment:

LGTM

@amanasifkhalid amanasifkhalid merged commit bf10f73 into dotnet:main Jan 23, 2024
129 checks passed
@amanasifkhalid amanasifkhalid deleted the sve-add branch January 23, 2024 22:54
@ryujit-bot:

Diff results for #97238

Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64

Overall (+0.27% to +0.64%)
Collection PDIFF
benchmarks.run.linux.arm64.checked.mch +0.29%
benchmarks.run_pgo.linux.arm64.checked.mch +0.34%
benchmarks.run_tiered.linux.arm64.checked.mch +0.64%
coreclr_tests.run.linux.arm64.checked.mch +0.56%
libraries.crossgen2.linux.arm64.checked.mch +0.44%
libraries.pmi.linux.arm64.checked.mch +0.30%
libraries_tests.run.linux.arm64.Release.mch +0.44%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch +0.32%
realworld.run.linux.arm64.checked.mch +0.29%
smoke_tests.nativeaot.linux.arm64.checked.mch +0.27%
MinOpts (+0.70% to +1.21%)
Collection PDIFF
benchmarks.run.linux.arm64.checked.mch +0.96%
benchmarks.run_pgo.linux.arm64.checked.mch +0.96%
benchmarks.run_tiered.linux.arm64.checked.mch +0.97%
coreclr_tests.run.linux.arm64.checked.mch +0.91%
libraries.crossgen2.linux.arm64.checked.mch +0.99%
libraries.pmi.linux.arm64.checked.mch +0.70%
libraries_tests.run.linux.arm64.Release.mch +0.98%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch +0.97%
realworld.run.linux.arm64.checked.mch +1.21%
smoke_tests.nativeaot.linux.arm64.checked.mch +0.84%
FullOpts (+0.25% to +0.44%)
Collection PDIFF
benchmarks.run.linux.arm64.checked.mch +0.29%
benchmarks.run_pgo.linux.arm64.checked.mch +0.26%
benchmarks.run_tiered.linux.arm64.checked.mch +0.26%
coreclr_tests.run.linux.arm64.checked.mch +0.29%
libraries.crossgen2.linux.arm64.checked.mch +0.44%
libraries.pmi.linux.arm64.checked.mch +0.30%
libraries_tests.run.linux.arm64.Release.mch +0.25%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch +0.30%
realworld.run.linux.arm64.checked.mch +0.28%
smoke_tests.nativeaot.linux.arm64.checked.mch +0.27%

Throughput diffs for osx/arm64 ran on windows/x64

Overall (+0.28% to +0.58%)
Collection PDIFF
benchmarks.run.osx.arm64.checked.mch +0.28%
benchmarks.run_pgo.osx.arm64.checked.mch +0.40%
benchmarks.run_tiered.osx.arm64.checked.mch +0.58%
coreclr_tests.run.osx.arm64.checked.mch +0.55%
libraries.crossgen2.osx.arm64.checked.mch +0.44%
libraries.pmi.osx.arm64.checked.mch +0.30%
libraries_tests.run.osx.arm64.Release.mch +0.50%
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch +0.32%
realworld.run.osx.arm64.checked.mch +0.28%
MinOpts (+0.70% to +1.21%)
Collection PDIFF
benchmarks.run.osx.arm64.checked.mch +1.04%
benchmarks.run_pgo.osx.arm64.checked.mch +0.98%
benchmarks.run_tiered.osx.arm64.checked.mch +0.99%
coreclr_tests.run.osx.arm64.checked.mch +0.89%
libraries.crossgen2.osx.arm64.checked.mch +0.99%
libraries.pmi.osx.arm64.checked.mch +0.70%
libraries_tests.run.osx.arm64.Release.mch +0.98%
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch +0.97%
realworld.run.osx.arm64.checked.mch +1.21%
FullOpts (+0.25% to +0.44%)
Collection PDIFF
benchmarks.run.osx.arm64.checked.mch +0.28%
benchmarks.run_pgo.osx.arm64.checked.mch +0.25%
benchmarks.run_tiered.osx.arm64.checked.mch +0.26%
coreclr_tests.run.osx.arm64.checked.mch +0.29%
libraries.crossgen2.osx.arm64.checked.mch +0.44%
libraries.pmi.osx.arm64.checked.mch +0.30%
libraries_tests.run.osx.arm64.Release.mch +0.25%
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch +0.30%
realworld.run.osx.arm64.checked.mch +0.28%

Throughput diffs for windows/arm64 ran on windows/x64

Overall (+0.27% to +0.57%)
Collection PDIFF
benchmarks.run.windows.arm64.checked.mch +0.27%
benchmarks.run_pgo.windows.arm64.checked.mch +0.35%
benchmarks.run_tiered.windows.arm64.checked.mch +0.57%
coreclr_tests.run.windows.arm64.checked.mch +0.56%
libraries.crossgen2.windows.arm64.checked.mch +0.44%
libraries.pmi.windows.arm64.checked.mch +0.30%
libraries_tests.run.windows.arm64.Release.mch +0.50%
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch +0.32%
realworld.run.windows.arm64.checked.mch +0.29%
smoke_tests.nativeaot.windows.arm64.checked.mch +0.27%
MinOpts (+0.70% to +1.21%)
Collection PDIFF
benchmarks.run.windows.arm64.checked.mch +1.04%
benchmarks.run_pgo.windows.arm64.checked.mch +0.97%
benchmarks.run_tiered.windows.arm64.checked.mch +0.99%
coreclr_tests.run.windows.arm64.checked.mch +0.90%
libraries.crossgen2.windows.arm64.checked.mch +0.99%
libraries.pmi.windows.arm64.checked.mch +0.70%
libraries_tests.run.windows.arm64.Release.mch +0.98%
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch +0.97%
realworld.run.windows.arm64.checked.mch +1.21%
smoke_tests.nativeaot.windows.arm64.checked.mch +0.83%
FullOpts (+0.25% to +0.44%)
Collection PDIFF
benchmarks.run.windows.arm64.checked.mch +0.27%
benchmarks.run_pgo.windows.arm64.checked.mch +0.25%
benchmarks.run_tiered.windows.arm64.checked.mch +0.26%
coreclr_tests.run.windows.arm64.checked.mch +0.29%
libraries.crossgen2.windows.arm64.checked.mch +0.44%
libraries.pmi.windows.arm64.checked.mch +0.30%
libraries_tests.run.windows.arm64.Release.mch +0.25%
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch +0.30%
realworld.run.windows.arm64.checked.mch +0.28%
smoke_tests.nativeaot.windows.arm64.checked.mch +0.27%

Details here


@github-actions github-actions bot locked and limited conversation to collaborators Feb 24, 2024
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI arm-sve Work related to arm64 SVE/SVE2 support