[mono] Reenable some amd64 intrinsic tests, enable amd64 ISA extensions when AOTing, several intrinsics fixes #53752
Merged
vargaz
approved these changes
Jun 5, 2021
This was referenced Jun 13, 2021
Changes:
Consolidate SSE shuffle constant unrolling

Remove `OP_SSE2_SHUFFLE`, which is unused.

Rename `OP_SSE_SHUFFLE` to `OP_SSE_SHUFPS`, to make this more consistent with the naming convention used for other SSE shuffles.

Use `immediate_unroll_*` instead of hand-writing branch emission. These branch tables are huge (in the simplest case, with 256 different constant values, we can spend over 1KB of code on nothing but `shufps` and `jmp`s, and the cost gets worse if any tail duplication happens), and are currently emitted inline. Future work ought to:

- use a sequence of extractelement/insertelement instructions, which can be optimized into a constant shuffle when the shuffle control parameter is constant, and otherwise generates a high-latency but low-code-size fallback (note that this only works for shuffles); or
- emit the fallback branch tables out of line and use `llvm.is.constant` to generate either a constant shuffle or a call to a fallback shuffle branch table function (the cost isn't too bad: a direct-call/ret pair would add ~4-5 cycles and eat an RSB slot on top of the cost of the branch table).

Fixes `JIT/HardwareIntrinsics/X86/Regression/GitHub_21855/GitHub_21855_r`.
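
The constant unrolling described above can be pictured with a small Python model. This is only a sketch of the idea, not the Mono code: `shufps` models the instruction's lane selection, and `SHUFFLE_TABLE`, `make_constant_shuffle`, and `shuffle_variable` are hypothetical names standing in for the 256-entry fallback branch table.

```python
def shufps(a, b, imm8):
    """Model of SHUFPS: the two low result lanes come from a, the two high from b."""
    return [
        a[imm8 & 3],
        a[(imm8 >> 2) & 3],
        b[(imm8 >> 4) & 3],
        b[(imm8 >> 6) & 3],
    ]


def make_constant_shuffle(imm8):
    # Each entry has its control byte baked in, like one `shufps` with an
    # immediate operand inside the generated branch table.
    return lambda a, b: shufps(a, b, imm8)


# Hypothetical stand-in for the unrolled fallback: one specialized entry per
# possible control byte.
SHUFFLE_TABLE = [make_constant_shuffle(i) for i in range(256)]


def shuffle_variable(a, b, control):
    # Non-constant control value: dispatch through the table at run time.
    return SHUFFLE_TABLE[control & 0xFF](a, b)
```

The table needs one entry per control value, which is why the inline fallback grows so quickly.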

Fix intrinsification for MathF.Round

`OP_SSE41_ROUNDS` takes two source registers, not one.

TODO: Investigate what happens with `llvm.round` and `llvm.experimental.constrained.round`.

Fixes `JIT/Intrinsics/MathRoundSingle_r`, `JIT/Math/Functions/Functions_r`, and `JIT/Performance/CodeQuality/Math/Functions/Functions`.
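
To see why the scalar round needs two sources, here is a rough Python model of a `roundss`-style operation. It assumes round-to-nearest-even and models lanes as Python floats rather than 32-bit floats. The names `roundss`, `upper_src`, and `value_src` are illustrative only.

```python
import math


def round_half_even(x):
    # NaN and infinities pass through unchanged. Python's round() rounds
    # halfway cases to even; copysign keeps the sign of a zero result.
    if math.isnan(x) or math.isinf(x):
        return x
    return math.copysign(float(round(x)), x)


def roundss(upper_src, value_src):
    """Lane 0 is rounded from value_src; lanes 1-3 are copied from upper_src."""
    return [round_half_even(value_src[0])] + list(upper_src[1:4])
```

For a scalar `MathF.Round`, only lane 0 of the result matters, but the operation still reads both operands.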

Clean up intrinsic group lookup

Use a dummy never-supported intrinsic group as a default fallback, instead of adding a special-case "intrinsic group not present" branch.

Correctly intrinsify get_IsSupported even when not using LLVM

Fixes spurious `System.PlatformNotSupportedException`s when calling `get_IsSupported` when the LLVM backend isn't being used.

The "not" SSE comparisons are unordered, so use the appropriate unordered LLVM IR comparisons

Add labeled constants for the immediate parameter we pass to CMPSS/CMPSD.

Fixes `Regressions.coreclr/GitHub_34094/Test34094`.
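
The difference between ordered and unordered predicates only shows up with NaN operands. The sketch below models the eight legacy CMPSS/CMPSD immediate encodings in Python. The constant names are illustrative and are not the labels added in this PR. The comments give the LLVM `fcmp` predicate with the same truth table.

```python
import math

# Predicate encodings for the SSE CMPSS/CMPSD immediate (values 0-7).
CMP_EQ, CMP_LT, CMP_LE, CMP_UNORD = 0, 1, 2, 3
CMP_NEQ, CMP_NLT, CMP_NLE, CMP_ORD = 4, 5, 6, 7


def cmp_scalar(a, b, predicate):
    unordered = math.isnan(a) or math.isnan(b)
    results = {
        CMP_EQ: a == b,                       # fcmp oeq
        CMP_LT: a < b,                        # fcmp olt
        CMP_LE: a <= b,                       # fcmp ole
        CMP_UNORD: unordered,                 # fcmp uno
        CMP_NEQ: unordered or a != b,         # fcmp une
        CMP_NLT: unordered or not (a < b),    # fcmp uge
        CMP_NLE: unordered or not (a <= b),   # fcmp ugt
        CMP_ORD: not unordered,               # fcmp ord
    }
    return results[predicate]
```

With a NaN operand, every "not" predicate is true and every plain ordered predicate is false, so an ordered comparison used in place of a "not" predicate gives the wrong answer only in the NaN case.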

Fix `LoadAndDuplicateToVector128`

`LoadAndDuplicateToVector128` should load exactly one 8-byte value from memory before broadcasting it into both lanes in a 128-bit result vector.

Fixes `JIT/HardwareIntrinsics/X86/Sse3/LoadAndDuplicateToVector128_r`.
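
A minimal Python sketch of the intended behaviour, assuming a little-endian double in a byte buffer. `load_and_duplicate` is a hypothetical helper name, not part of the runtime.

```python
import struct


def load_and_duplicate(memory, offset):
    # Read exactly 8 bytes (one little-endian double). A 16-byte read here
    # could run past the end of the buffer.
    (value,) = struct.unpack_from("<d", memory, offset)
    # Broadcast the single loaded value into both 64-bit lanes.
    return (value, value)
```

The model works even when only 8 bytes remain in the buffer, which is the case a wider load would get wrong.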

Implement constant unrolling for `Sse41.DotProduct`

As with shuffles, the fallback jump table should probably be kept out of line someday; `vdpps` uses 6 bytes of space, so any fallback jump table for the selection control mask will be at least 1.5KB large.

Fixes `JIT/HardwareIntrinsics/X86/Sse41/DotProduct_r`.
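
The size estimate follows directly from the 8-bit control mask. The Python sketch below models what the mask selects and reproduces the arithmetic. It ignores 32-bit float rounding and the hardware's summation order, and the constant names are illustrative.

```python
def dpps(a, b, imm8):
    # High nibble: which lane products enter the sum.
    total = sum((a[i] * b[i] for i in range(4) if imm8 & (0x10 << i)), 0.0)
    # Low nibble: which result lanes receive the sum; the rest are zero.
    return [total if imm8 & (1 << i) else 0.0 for i in range(4)]


BYTES_PER_VDPPS = 6   # figure quoted in the text above
TABLE_ENTRIES = 256   # one entry per possible 8-bit mask
MIN_TABLE_BYTES = BYTES_PER_VDPPS * TABLE_ENTRIES  # 1536 bytes = 1.5 KB
```

The estimate counts only the `vdpps` instructions themselves, so the real table is larger once the jumps between entries are included.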

Implement constant unrolling for `Sse41.Blend`

The usual: big jump blobs should be out of line, possible to use extract/insertelement.
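
The extract/insertelement alternative mentioned above amounts to a per-lane select. A rough Python sketch of that formulation follows, assuming bit `i` of the mask picks lane `i` from the second operand. `blend` is an illustrative name.

```python
def blend(a, b, imm8):
    result = list(a)
    for lane in range(len(a)):
        if (imm8 >> lane) & 1:
            # "extractelement" from b, then "insertelement" into the result.
            result[lane] = b[lane]
    return result
```

Written this way the operation needs no jump table, at the cost of one extract and one insert per lane when the mask is not a constant.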

Zero is part of the domain of `lzcnt` and shouldn't yield an undef

Use fully-defined `llvm.ctlz` when implementing `OP_LZCNT32/64`.

Fixes `JIT/HardwareIntrinsics/X86/Regression/GitHub_21666/GitHub_21666_r`.
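
For reference, the fully-defined behaviour is that a zero input yields the operand width. A small Python model (the `lzcnt` helper name and its `bits` parameter are illustrative):

```python
def lzcnt(value, bits=32):
    # Truncate to the operand width, then count leading zero bits.
    value &= (1 << bits) - 1
    # int.bit_length() is 0 for zero, so zero maps to the full width.
    return bits - value.bit_length()
```

This matches `llvm.ctlz` when its second argument is false, where a zero input is defined to return the bit width instead of an undefined value.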
Unify amd64/arm64 vector extraction handling

Removes `OP_EXTRACT_U1` and `OP_EXTRACT_U2`. Instead, sign/zero extension is determined via `inst_c1` for `OP_EXTRACT_*` and `OP_XEXTRACT_*` (and `OP_EXTRACTX_U2`, which doesn't seem to be generated as part of intrinsic translation), which must be set to a MonoTypeEnum.

Replaces `OP_EXTRACT_VAR_*` with `OP_XEXTRACT_*`.

Fixes `JIT/Regression/JitBlue/GitHub_23159/GitHub_23159` and `JIT/Regression/JitBlue/GitHub_13568/GitHub_13568`.
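
The idea of keying sign or zero extension off a recorded element type, rather than off separate opcodes, can be sketched in Python. The type tags `"I1"`, `"U1"`, `"I2"`, and `"U2"` and the `extract_element` helper are hypothetical stand-ins for the type value carried with the instruction. This is not the Mono implementation.

```python
import struct

# Hypothetical stand-ins for the element type recorded alongside the opcode.
FORMATS = {"I1": "b", "U1": "B", "I2": "h", "U2": "H"}


def extract_element(vector_bytes, index, element_type):
    fmt = "<" + FORMATS[element_type]
    size = struct.calcsize(fmt)
    # Signed formats sign-extend into the result, unsigned ones zero-extend.
    (value,) = struct.unpack_from(fmt, vector_bytes, index * size)
    return value
```

The same lane bits produce different widened values depending only on the type tag, so one extract operation can serve both signed and unsigned element types.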

Remove `OP_DPPS`; it is unused

Disable `JIT/Regression/CLR-x86-JIT/V1.1-M1-Beta1/b143840` when running with mono LLVM AOT

Disable finalizearray when running with mono LLVM AOT

Disable Vector256_1/Vector128_1 tests on wasm

Enable sse4.2, popcnt, lzcnt, bmi, and bmi2 when AOT compiling the runtime tests.

Pass the runtime variant to `helixpublishwitharcade.proj`, and forward this runtime variant to testenvironment.proj.

This is used to selectively enable LLVM JIT on the LLVM AOT lanes. Removes the hack added to CLRTest.Execute.Bash.targets that did this for arm64 (which happens to only have an LLVM AOT lane for runtime tests right now).

Enable `JIT/HardwareIntrinsics/General/Vector128_1/**`, `JIT/HardwareIntrinsics/General/Vector256/**`, `JIT/HardwareIntrinsics/General/Vector256_1/**`, and `JIT/HardwareIntrinsics/X86/General/IsSupported*/**` for LLVM AOT on amd64.