Move ternlog recognition to lower #86228
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch

Issue Details

Move ternlog insertion from importervectorization.cpp to lower to also handle user cases + LowerMemcmp vectorization. Handling for other cases where we can insert ternlog is left up for grabs.

As a bonus, it now handles unrollings like these:

```csharp
bool Test(int[] a, int[] b) =>
    a.AsSpan(0, 10).SequenceEqual(b);
```

```diff
  vmovups    ymm0, ymmword ptr [rcx]
  vmovups    ymm1, ymmword ptr [rax]
  vmovups    ymm2, ymmword ptr [rcx+08H]
  vmovups    ymm3, ymmword ptr [rax+08H]
  vpxor      ymm0, ymm0, ymm1
- vpxor      ymm1, ymm2, ymm3
- vpor       ymm0, ymm0, ymm1
+ vpternlogq ymm0, ymm2, ymm3, -10
  vptest     ymm0, ymm0
  sete       al
  movzx      rax, al
```
src/coreclr/jit/lowerxarch.cpp
Outdated
```cpp
case NI_SSE2_Or:
case NI_AVX2_Or:
case NI_AVX512F_Or:
case NI_SSE2_Xor:
case NI_AVX2_Xor:
case NI_AVX512F_Xor:
{
    GenTree* ternLogic = RecognizeHWTernaryLog(node);
    if (ternLogic != nullptr)
    {
        // Revisit this node again.
        return ternLogic->gtPrev;
    }
    break;
}
```
What's the primary motivation for it being in lowering rather than morph? Most of our pattern-based transforms are recognized in morph and done there instead (including ROL/ROR, AND_NOT, etc.).
@tannergooding mainly, two reasons:
- Handle LowerMemCmp which is also in lower
- Handle user patterns with explicit intrinsics (since at this point we'll lower all cross-plat helpers to direct intrinsics)
For 1, it seems like it'd be easier to just special case it. That's what we do for most of our other patterns as well. It might be even better if we had a way to share most of the necessary logic between morph/lowering, that way we can get the early ones (which should also help TP) and also avoid missing any of the late ones; but that's not something we're doing elsewhere atm.
For 2, I don't see how that's impacted. We'll be recognizing the explicit intrinsics either way whether in morph or lowering, right?
src/coreclr/jit/lowerxarch.cpp
Outdated
```cpp
if (IsHWBitwiseOp(node, &oper1))
{
    if (IsHWBitwiseOp(op2, &oper2))
    {
        if ((oper1 == GT_OR) && (oper2 == GT_XOR))
        {
            mask = static_cast<uint8_t>(0xF0 | (0xCC ^ 0xAA));
        }
        else if ((oper1 == GT_XOR) && (oper2 == GT_OR))
        {
            mask = static_cast<uint8_t>(0xF0 ^ (0xCC | 0xAA));
        }
    }
}
```
This is going to become untenable as we need to expand to the full set of operations.

It's also not accounting for things like containment or other factors that can influence which of the variants is best for a given operation and where the operands are coming from (e.g. `a | (b ^ [c])` vs `a | (c ^ [b])` vs `(c ^ b) | [a]`).
I expect instead we need to have something more like an iterative approach. That is, we first get the two bitwise operations being combined and the three operands. We then validate early that `bitwiseOper1` is containable by `bitwiseOper2` (or vice versa). Next do the more expensive work (optimized code only) to determine the optimal order (which operand should be RMW, which should be contained). Finally we pass the operands into a small helper with the operation being done:

```cpp
mask = BuildTernaryLogicMask(bitOper1, maskOp1, maskOp2);
mask = BuildTernaryLogicMask(bitOper2, mask, maskOp3);
```
That avoids the duplication and centralizes the combining logic. The total set of operations supported are:

- `true`: AllBitsSet
- `false`: Zero
- `not`: `~value`
- `and`: `left & right`
- `nand`: `~(left & right)`
- `or`: `left | right`
- `nor`: `~(left | right)`
- `xor`: `left ^ right`
- `xnor`: `~(left ^ right)`
- `cndsel`: `a ? b : c`, aka `(B & A) | (C & ~A)`
- `major`: 1 if two or more input bits are 1
- `minor`: 1 if two or more input bits are 0
@tannergooding I realized that it's more complicated than I can spend my time on this week and the next, so let's just drop it. Low diffs from the Lower part, so unlikely worth the effort. Although, we do need to remove ternlog from importervectorization.cpp where it doesn't belong and uglifies cross-plat code (I don't believe it improves TP by any visible value).