Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[jit] Add more SSE2 opcodes. Enable SSE2. #33465

Merged
merged 8 commits into from
Mar 13, 2020
Merged

Conversation

vargaz
Copy link
Contributor

@vargaz vargaz commented Mar 11, 2020

No description provided.

@EgorBo
Copy link
Member

EgorBo commented Mar 12, 2020

Disable SSE2 support as some intrinsics are still missing.

Looks like the following are missing:

ConvertScalarToVector128Double
ConvertScalarToVector128Single
ShiftLeftLogical
ShiftRightLogical

LGTM since IsSupported is disabled

@EgorBo
Copy link
Member

EgorBo commented Mar 12, 2020

@tannergooding do you happen to know how coreclr handles intrinsics which require constant args and a variable is passed instead? e.g.

Vector128<int> Foo(Vector128<int> lhs, byte rhs)
{
    return Sse2.Shuffle(lhs, rhs);
}

@tannergooding
Copy link
Member

do you happen to know how coreclr handles intrinsics which require constant args and a variable is passed instead

If the value is constant, we just return a gtNewSimdHWIntrinsicNode immediately. Otherwise, we return nullptr which forces a standard call node to be emitted.
When it becomes time to generate code for the call, we'll see it is recursively calling itself (tracked via mustExpand) and we'll return a gtNewSimdHWIntrinsicNode node in that case anyways.
This is done to support indirect calls (delegates, reflection, debugger, etc) as well as instructions that require a constant input for their operand (such as shufps).

When it gets to codegen, we then check if the value is constant again, for example see here.
If the value is constant, codegen can happen normally and we just emit the instruction. However, if it isn't we go to genHWIntrinsicJumpTableFallback and emit a jump table containing all possible scenarios (up to 256) and hard code the constants in the table.

The fallback is slow, but it ensures things will work in various edge scenarios and devs using HWIntrinsics should be profiling so they'll catch the issue. We've previously discussed an analyzer as something that could help flag non-constant inputs to the user as well, but that hasn't been implemented yet. We also have an issue (#11062) tracking us delaying the mustExpand decision until later (such as in lowering) so we can cover more constant input cases, but that will require some more work in the JIT.

CC. @CarolEidt, @echesakovMSFT

@EgorBo
Copy link
Member

EgorBo commented Mar 12, 2020

However, if it isn't we go to genHWIntrinsicJumpTableFallback and emit a jump table containing all possible scenarios (up to 256) and hard code the constants in the table.

Thanks, so it's basically the same as what Zoltan did (not sure about the 256 limit). I see that all imm arguments are of byte type in C# API for that :) even if it's int in the C API, e.g..

        /// <summary>
        /// int _mm_extract_epi16 (__m128i a,  int immediate)
        ///   PEXTRW reg, xmm, imm8
        /// </summary>
        public static ushort Extract(Vector128<ushort> value, byte index) => Extract(value, index);

And I guess we can always emit a jump table even for a const input -- LLVM will optimize it anyway (but probably makes sense to handle const to reduce compilation time)

@tannergooding
Copy link
Member

I see that all imm arguments are of byte type in C# API for that :) even if it's int in the C API

Yes, because the underlying instruction only accepts an imm8 (hence why 256 is also the upper limit of the number of jump table scenarios).

@vargaz vargaz merged commit ed5d551 into dotnet:master Mar 13, 2020
@vargaz vargaz deleted the netcore-sse2 branch March 13, 2020 01:48
monojenkins pushed a commit to monojenkins/mono that referenced this pull request Mar 13, 2020
dotnet/runtime#33465 implemented 99% of them but a few were missing. This PR implements the missing pieces and unlocks `Sse2.IsSupported`.

@vargaz I hope I didn't step on your toe (feel free to close if you already have it locally).
vargaz pushed a commit to mono/mono that referenced this pull request Mar 14, 2020
dotnet/runtime#33465 implemented 99% of them but a few were missing. This PR implements the missing pieces and unlocks `Sse2.IsSupported`.

@vargaz I hope I didn't step on your toe (feel free to close if you already have it locally).

Co-authored-by: EgorBo <EgorBo@users.noreply.github.com>
@ghost ghost locked as resolved and limited conversation to collaborators Dec 10, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants