Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: Compute BB weights with higher precision, use tolerance in if-conversion #88385

Merged
merged 3 commits into from
Jul 5, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions src/coreclr/jit/block.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1665,9 +1665,9 @@ weight_t BasicBlock::getBBWeight(Compiler* comp)
{
weight_t calledCount = getCalledCount(comp);

// Normalize the bbWeights by multiplying by BB_UNITY_WEIGHT and dividing by the calledCount.
// Normalize the bbWeight.
//
weight_t fullResult = this->bbWeight * BB_UNITY_WEIGHT / calledCount;
weight_t fullResult = (this->bbWeight / calledCount) * BB_UNITY_WEIGHT;

return fullResult;
}
Expand Down
7 changes: 4 additions & 3 deletions src/coreclr/jit/ifconversion.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -656,10 +656,11 @@ bool OptIfConversionDsc::optIfConvert()

if (!m_comp->compStressCompile(Compiler::STRESS_IF_CONVERSION_INNER_LOOPS, 25))
{
// Don't optimise the block if it is inside a loop
// When inside a loop, branches are quicker than selects.
// Don't optimise the block if it is inside a loop. Loop-carried
// dependencies can cause significant stalls if if-converted.
// Detect via the block weight as that will be high when inside a loop.
if (m_startBlock->getBBWeight(m_comp) > BB_UNITY_WEIGHT)

if (m_startBlock->getBBWeight(m_comp) > BB_UNITY_WEIGHT * 1.05)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to stylize these comparisons, so maybe add a new fgProfileWeight... helper here?

In particular these sorts of checks are often parts of profitability heuristics which we might want to vary under stress and/or expose to some machine learning mechanism.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think playing around with this threshold makes much sense given its use as an "is inside loop" check. The precise check happens below when this check is false.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not really checking if the block is likely to be inside a loop, it is checking if the block is likely to be inside a frequently executed loop.

If we assume profile data is representative, then perhaps this is fine; if we think it might not be representative, then perhaps we should be checking bbNatLoopNum and if set, either bypass the conversion, or go on to compare the weight of this block to the weight of the loop entry (so either "in a loop" or "frequently executed wrt the loop"). I wonder if this would subsume the need for an optReachable check.

Do we know if the cases in #82106 did not get handled by our loop recognition? I don't recall why we didn't check this initially -- do we no longer trust the loop table? Seems like it might still be good enough.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The regression in #82106 was because we if-converted inside an unnatural loop that was not recognized by our loop recognition. The loop is this code:

do
{
// Make sure we don't go out of bounds
Debug.Assert(offset + ch1ch2Distance + Vector256<ushort>.Count <= searchSpaceLength);
Vector256<ushort> cmpCh2 = Vector256.Equals(ch2, Vector256.LoadUnsafe(ref ushortSearchSpace, (nuint)(offset + ch1ch2Distance)));
Vector256<ushort> cmpCh1 = Vector256.Equals(ch1, Vector256.LoadUnsafe(ref ushortSearchSpace, (nuint)offset));
Vector256<byte> cmpAnd = (cmpCh1 & cmpCh2).AsByte();
// Early out: cmpAnd is all zeros
if (cmpAnd != Vector256<byte>.Zero)
{
goto CANDIDATE_FOUND;
}
LOOP_FOOTER:
offset += Vector256<ushort>.Count;
if (offset == searchSpaceMinusValueTailLength)
return -1;
// Overlap with the current chunk for trailing elements
if (offset > searchSpaceMinusValueTailLengthAndVector)
offset = searchSpaceMinusValueTailLengthAndVector;
continue;
CANDIDATE_FOUND:
uint mask = cmpAnd.ExtractMostSignificantBits();
do
{
int bitPos = BitOperations.TrailingZeroCount(mask);
// div by 2 (shr) because we work with 2-byte chars
nint charPos = (nint)((uint)bitPos / 2);
if (valueLength == 2 || // we already matched two chars
SequenceEqual(
ref Unsafe.As<char, byte>(ref Unsafe.Add(ref searchSpace, offset + charPos)),
ref Unsafe.As<char, byte>(ref value), (nuint)(uint)valueLength * 2))
{
return (int)(offset + charPos);
}
// Clear two the lowest set bits
if (Bmi1.IsSupported)
mask = Bmi1.ResetLowestSetBit(Bmi1.ResetLowestSetBit(mask));
else
mask &= ~(uint)(0b11 << bitPos);
} while (mask != 0);
goto LOOP_FOOTER;
} while (true);

That's why we ended up with optReachable here.

There is an alternative to simply get rid of this block weight check.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok... let's leave this as is for now.

@BruceForstall do we have a work item for loops we ought to recognize but don't? I see #43713 so perhaps that's good enough for now.

{
JITDUMP("Skipping if-conversion inside loop (via weight)\n");
return false;
Expand Down