[mono] Reenable some amd64 intrinsic tests, enable amd64 ISA extensions when AOTing, several intrinsics fixes #53752

imhameed · 2021-06-05T00:31:23Z

Changes:

Consolidate SSE shuffle constant unrolling

Remove OP_SSE2_SHUFFLE, which is unused.

Rename OP_SSE_SHUFFLE to OP_SSE_SHUFPS, to make this more consistent with
the naming convention used for other SSE shuffles.

Use immediate_unroll_* instead of hand-writing branch emission. These
branch tables are huge (in the simplest case, with 256 different constant
values, we can spend over 1KB of code on nothing but shufps and jmps, and
the cost gets worse if any tail duplication happens), and are currently
emitted inline. Future work ought to:
1. use a sequence of extractelement/insertelement instructions, which can be
  optimized into a constant shuffle when the shuffle control parameter is
  constant, and otherwise generates a high-latency but low-code-size fallback
  (note that this only works for shuffles); or
2. emit the fallback branch tables out of line and use llvm.is.constant to
  generate either a constant shuffle or a call to a fallback shuffle branch
  table function (the cost isn't too bad: a direct-call/ret pair would add ~4-5
  cycles and eat an RSB slot on top of the cost of the branch table).
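
For reference, option 1 amounts to expressing the shuffle as four independent element selections. The following is a minimal scalar sketch in C, assuming the usual SHUFPS semantics (result lanes 0-1 are picked from the first source, lanes 2-3 from the second, with two control bits per lane). `shufps_generic` is a hypothetical name used only for illustration; it is not a Mono or LLVM function.

```c
#include <stdint.h>

/* Hypothetical scalar model of SHUFPS with a non-constant control byte.
 * Lanes 0-1 are selected from a, lanes 2-3 from b; each 2-bit field of
 * imm picks one source lane. */
void shufps_generic(const float a[4], const float b[4], uint8_t imm, float out[4])
{
    /* Read every selected element before writing, so out may alias a or b. */
    float r0 = a[imm & 3];
    float r1 = a[(imm >> 2) & 3];
    float r2 = b[(imm >> 4) & 3];
    float r3 = b[(imm >> 6) & 3];

    out[0] = r0;
    out[1] = r1;
    out[2] = r2;
    out[3] = r3;
}
```

When `imm` is a compile-time constant, each selection has a fixed index, which is the situation in which a compiler can fold the sequence back into a single shuffle.
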

Fixes JIT/HardwareIntrinsics/X86/Regression/GitHub_21855/GitHub_21855_r.

Fix intrinsification for MathF.Round

OP_SSE41_ROUNDS takes two source registers, not one.

TODO: Investigate what happens with llvm.round and
llvm.experimental.constrained.round.

Fixes JIT/Intrinsics/MathRoundSingle_r,
JIT/Math/Functions/Functions_r, and
JIT/Performance/CodeQuality/Math/Functions/Functions.

Clean up intrinsic group lookup

Use a dummy never-supported intrinsic group as a default fallback, instead of
adding a special-case "intrinsic group not present" branch.
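
The pattern is roughly the following. This is a sketch only: `IntrinGroup`, `groups`, `unsupported_group`, `lookup_group`, and `is_supported` are hypothetical names that model the idea, not the identifiers used in the Mono sources.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of an intrinsic group table. */
typedef struct {
    const char *name;
    int supported;   /* nonzero when the ISA extension is usable */
} IntrinGroup;

static const IntrinGroup groups[] = {
    { "Sse41", 1 },
    { "Lzcnt", 1 },
    { "Avx2",  0 },
};

/* Dummy group returned when nothing matches; it is never supported. */
static const IntrinGroup unsupported_group = { "<unsupported>", 0 };

const IntrinGroup *lookup_group(const char *name)
{
    for (size_t i = 0; i < sizeof groups / sizeof groups[0]; i++)
        if (strcmp(groups[i].name, name) == 0)
            return &groups[i];
    return &unsupported_group;   /* never NULL, so callers need no special case */
}

int is_supported(const char *name)
{
    return lookup_group(name)->supported;
}
```

Because the lookup always returns a valid group, callers can read `supported` directly instead of branching on a missing entry.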

Correctly intrinsify get_IsSupported even when not using LLVM

Fixes spurious System.PlatformNotSupportedExceptions when calling
get_IsSupported when the LLVM backend isn't being used.

The "not" SSE comparisons are unordered, so use the appropriate unordered LLVM
IR comparisons

Add labeled constants for the immediate parameter we pass to CMPSS/CMPSD.
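
To illustrate why the "not" predicates must be unordered, here is a small scalar model in C of the eight SSE comparison predicates. The enum names are hypothetical (they are not the constants added by this change); the numeric values are the standard CMPSS/CMPSD immediate encodings, and the comments give the LLVM `fcmp` condition each one corresponds to. The real instructions produce an all-ones or all-zeros mask; this model returns a boolean instead.

```c
#include <stdbool.h>

/* Hypothetical names; values are the SSE comparison predicate encodings. */
enum {
    CMP_EQ    = 0,  /* fcmp oeq */
    CMP_LT    = 1,  /* fcmp olt */
    CMP_LE    = 2,  /* fcmp ole */
    CMP_UNORD = 3,  /* fcmp uno */
    CMP_NEQ   = 4,  /* fcmp une */
    CMP_NLT   = 5,  /* fcmp uge */
    CMP_NLE   = 6,  /* fcmp ugt */
    CMP_ORD   = 7   /* fcmp ord */
};

bool cmpss_model(int pred, float a, float b)
{
    bool unordered = (a != a) || (b != b);   /* true when either input is NaN */

    switch (pred) {
    case CMP_EQ:    return a == b;
    case CMP_LT:    return a < b;
    case CMP_LE:    return a <= b;
    case CMP_UNORD: return unordered;
    case CMP_NEQ:   return !(a == b);    /* true for NaN inputs */
    case CMP_NLT:   return !(a < b);     /* true for NaN inputs */
    case CMP_NLE:   return !(a <= b);    /* true for NaN inputs */
    case CMP_ORD:   return !unordered;
    default:        return false;
    }
}
```

The negated predicates are plain logical negations of the ordered ones, so they evaluate to true whenever an operand is NaN. An ordered LLVM comparison such as `oge` would return false in that case.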

Fixes Regressions.coreclr/GitHub_34094/Test34094.

Fix LoadAndDuplicateToVector128

LoadAndDuplicateToVector128 should load exactly one 8-byte value from memory
before broadcasting it into both lanes in a 128-bit result vector.
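
As a scalar illustration of the intended behavior (a sketch; `load_and_duplicate` is a hypothetical name, and the 128-bit result is modeled as an array of two doubles):

```c
/* Hypothetical scalar model of Sse3.LoadAndDuplicateToVector128:
 * read a single 8-byte value and copy it into both 64-bit lanes. */
void load_and_duplicate(const double *src, double out[2])
{
    double v = *src;   /* exactly one 8-byte load; src[1] is never touched */
    out[0] = v;
    out[1] = v;
}
```

Since only one element is read, passing a pointer to a single `double` is valid here, which would not be the case for a full 16-byte vector load.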

Fixes JIT/HardwareIntrinsics/X86/Sse3/LoadAndDuplicateToVector128_r.

Implement constant unrolling for Sse41.DotProduct

As with shuffles, the fallback jump table should probably be kept out of line
someday; vdpps uses 6 bytes of space, so any fallback jump table for the
selection control mask will be at least 1.5kb large.
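
For context, the following scalar sketch in C models how the 8-bit control mask is interpreted, assuming the documented DPPS semantics: the high four bits choose which lane products enter the sum, and the low four bits choose which result lanes receive it. `dpps_model` and `DPPS_TABLE_BYTES` are hypothetical names for illustration. The constant restates the size estimate above: 256 possible masks times 6 bytes per `vdpps`.

```c
#include <stdint.h>

/* Hypothetical scalar model of DPPS with control byte imm.
 * Bits 4-7: include the product of lane 0-3 in the sum.
 * Bits 0-3: write the sum to result lane 0-3 (other lanes become 0). */
void dpps_model(const float a[4], const float b[4], uint8_t imm, float out[4])
{
    float t[4];
    for (int i = 0; i < 4; i++)
        t[i] = (imm & (0x10 << i)) ? a[i] * b[i] : 0.0f;

    float sum = (t[0] + t[1]) + (t[2] + t[3]);

    for (int i = 0; i < 4; i++)
        out[i] = (imm & (1 << i)) ? sum : 0.0f;
}

/* Lower bound on an unrolled table: one 6-byte vdpps per mask value,
 * not counting any jumps between cases. */
enum { DPPS_TABLE_BYTES = 256 * 6 };   /* 1536 bytes, i.e. 1.5 KiB */
```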

Fixes JIT/HardwareIntrinsics/X86/Sse41/DotProduct_r.

Implement constant unrolling for Sse41.Blend

As with the shuffle and dot product cases, the large jump tables should be
emitted out of line, and it may be possible to use extractelement/insertelement
instead.
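
A scalar sketch in C of the single-precision case, assuming the BLENDPS semantics in which bit i of the control byte selects lane i from the second source. `blendps_model` is a hypothetical name for illustration. The other `Sse41.Blend` overloads use the same idea with a different number of lanes.

```c
#include <stdint.h>

/* Hypothetical scalar model of BLENDPS: bit i of imm set means take
 * lane i from b, otherwise keep lane i from a. Only bits 0-3 matter. */
void blendps_model(const float a[4], const float b[4], uint8_t imm, float out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = (imm & (1 << i)) ? b[i] : a[i];
}
```

Each lane is an independent two-way select, which is why a per-element extract/insert formulation is plausible here.
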

Zero is part of the domain of lzcnt and shouldn't yield an undef.

Use fully-defined llvm.ctlz when implementing OP_LZCNT32/64.
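
The expected results can be modeled in portable C as below. This is a sketch with a hypothetical name, `lzcnt32_model`. The key case is the zero input, where the LZCNT instruction returns the operand width (32 here). `llvm.ctlz` has a second argument that controls whether a zero input is undefined, and "fully-defined" is assumed to mean that this argument is false.

```c
#include <stdint.h>

/* Hypothetical scalar model of 32-bit LZCNT: number of leading zero bits,
 * with a well-defined result of 32 for a zero input. */
uint32_t lzcnt32_model(uint32_t x)
{
    uint32_t n = 0;

    if (x == 0)
        return 32;   /* zero is a valid input, not an undefined case */

    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}
```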

Fixes JIT/HardwareIntrinsics/X86/Regression/GitHub_21666/GitHub_21666_r

Unify amd64/arm64 vector extraction handling

Removes OP_EXTRACT_U1 and OP_EXTRACT_U2. Instead, sign/zero extension is
determined via inst_c1 for OP_EXTRACT_* and OP_XEXTRACT_* (and
OP_EXTRACTX_U2, which doesn't seem to be generated as part of intrinsic
translation), which must be set to a MonoTypeEnum.

Replaces OP_EXTRACT_VAR_* with OP_XEXTRACT_*.
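
The idea of keying the extension on an element-type tag, rather than on separate opcodes, can be sketched in C as follows. `ElemType` and `extract_lane` are hypothetical stand-ins for the type value carried in `inst_c1` and for the lowering of the extract opcodes; they are not names from the Mono sources. The sketch assumes little-endian lane layout, as on amd64 and arm64.

```c
#include <stdint.h>

/* Hypothetical stand-in for the element type tag stored with the opcode. */
typedef enum { ELEM_I1, ELEM_U1, ELEM_I2, ELEM_U2 } ElemType;

/* Extract one lane from a 16-byte vector and widen it to 32 bits.
 * Signed element types sign-extend, unsigned ones zero-extend. */
int32_t extract_lane(const uint8_t vec[16], int lane, ElemType ty)
{
    /* Assemble the 16-bit lane from its two bytes, low byte first. */
    uint16_t half = (uint16_t) (vec[2 * lane] | (vec[2 * lane + 1] << 8));

    switch (ty) {
    case ELEM_I1: return (int8_t) vec[lane];    /* sign-extend 8 bits */
    case ELEM_U1: return vec[lane];             /* zero-extend 8 bits */
    case ELEM_I2: return (int16_t) half;        /* sign-extend 16 bits */
    case ELEM_U2: return half;                  /* zero-extend 16 bits */
    }
    return 0;
}
```

Note that `lane` must be in the range 0-7 in this sketch, because the 16-bit value is assembled unconditionally.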

Fixes JIT/Regression/JitBlue/GitHub_23159/GitHub_23159 and
JIT/Regression/JitBlue/GitHub_13568/GitHub_13568.

Remove OP_DPPS; it is unused

Disable JIT/Regression/CLR-x86-JIT/V1.1-M1-Beta1/b143840 when running with mono LLVM AOT

Disable finalizearray when running with mono LLVM AOT

Disable Vector256_1/Vector128_1 tests on wasm

Enable sse4.2, popcnt, lzcnt, bmi, and bmi2 when AOT compiling the runtime
tests.

Pass the runtime variant to helixpublishwitharcade.proj, and forward this
runtime variant to testenvironment.proj.

This is used to selectively enable LLVM JIT on the LLVM AOT lanes. Removes
the hack added to CLRTest.Execute.Bash.targets that did this for arm64 (which
happens to only have an LLVM AOT lane for runtime tests right now).

Enable JIT/HardwareIntrinsics/General/Vector128_1/**,
JIT/HardwareIntrinsics/General/Vector256/**,
JIT/HardwareIntrinsics/General/Vector256_1/**, and
JIT/HardwareIntrinsics/X86/General/IsSupported*/** for LLVM AOT on amd64.

dotnet-issue-labeler · 2021-06-05T00:31:27Z

I couldn't figure out the best area label to add to this PR. If you have write-permissions please help me learn by adding exactly one area label.

ghost · 2021-06-05T00:33:34Z

Tagging subscribers to this area: @directhex
See info in area-owners.md if you want to be subscribed.

Issue Details

Author:	imhameed
Assignees:	-
Labels:	`area-Infrastructure-mono`, `runtime-mono`
Milestone:	-

fanyang-mono · 2021-06-09T13:30:19Z

@imhameed If you rebase this PR to get b13715b, you should be able to get all the logs generated by runtime tests, which could help you investigate these test failures.

imhameed added area-Infrastructure-mono runtime-mono specific to the Mono runtime labels Jun 5, 2021

vargaz approved these changes Jun 5, 2021

imhameed force-pushed the monoamd64intrintests branch 23 times, most recently from 580a326 to a90ccef Compare June 12, 2021 07:52

imhameed added 7 commits June 12, 2021 18:39

The "not" SSE comparisons are unordered

4d7b219

Add labeled constants for the immediate parameter we pass to CMPSS/CMPSD Fixes `Regressions.coreclr/GitHub_34094/Test34094`

Fix LoadAndDuplicateToVector128

b682d35

LoadAndDuplicateToVector128 should load exactly one 8-byte value from memory before broadcasting it into both lanes in a 128-bit result vector. Fixes `JIT/HardwareIntrinsics/X86/Sse3/LoadAndDuplicateToVector128_r`

Remove OP_DPPS; it is unused

9a06323

Zero is part of the domain of lzcnt and shouldn't yield an undef

c83be20

Fixes `JIT/HardwareIntrinsics/X86/Regression/GitHub_21666/GitHub_21666_r`

Implement constant unrolling for Sse41.Blend

bfafb8d

The usual: big jump blobs should be out of line, possible to use extract/insertelement

imhameed force-pushed the monoamd64intrintests branch 2 times, most recently from 7dd61e8 to 7fcab9c Compare June 13, 2021 00:01

imhameed added 4 commits June 13, 2021 11:00

Disable JIT/Regression/CLR-x86-JIT/V1.1-M1-Beta1/b143840 when running…

d94aec3

… with mono LLVM AOT

Disable finalizearray when running with mono LLVM AOT

ddc11c5

Disable Vector256_1/Vector128_1 tests on wasm

06a7164

imhameed force-pushed the monoamd64intrintests branch 2 times, most recently from b70d3bc to 48dd2d8 Compare June 13, 2021 15:04

???

f525f80

imhameed force-pushed the monoamd64intrintests branch from 48dd2d8 to f525f80 Compare June 13, 2021 15:06

imhameed changed the title ~~[mono] Reenable some amd64 intrinsic tests and enable a bunch of ISA extensions when AOTing~~ [mono] Reenable some amd64 intrinsic tests, enable amd64 ISA extensions when AOTing, several intrinsics fixes Jun 13, 2021

imhameed marked this pull request as ready for review June 13, 2021 22:49

imhameed requested review from CoffeeFlux, EgorBo, lambdageek and SamMonoRT as code owners June 13, 2021 22:49

imhameed merged commit b8b3ef1 into dotnet:main Jun 13, 2021

ghost locked as resolved and limited conversation to collaborators Jul 13, 2021
