Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RISCV][CostModel] Make VMV_S_* and VMV_*_S cost independent of LMUL #78739

Merged
merged 2 commits into from
Jan 23, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 7 additions & 4 deletions llvm/lib/Target/RISCV/RISCVTargetTransformInfo.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -84,6 +84,10 @@ RISCVTTIImpl::getRISCVInstructionCost(ArrayRef<unsigned> OpCodes, MVT VT,
Cost += VL;
break;
}
case RISCV::VMV_X_S:
case RISCV::VMV_S_X:
Cost += 1;
break;
default:
Cost += LMULCost;
}
Expand Down Expand Up @@ -446,7 +450,7 @@ InstructionCost RISCVTTIImpl::getShuffleCost(TTI::ShuffleKind Kind,
// should be a very small constant for the constant pool load. As such,
// we may bias towards large selects slightly more than truely warranted.
return LT.first *
(2 + getRISCVInstructionCost({RISCV::VMERGE_VVM},
(1 + getRISCVInstructionCost({RISCV::VMV_S_X, RISCV::VMERGE_VVM},
LT.second, CostKind));
}
case TTI::SK_Broadcast: {
Expand Down Expand Up @@ -475,10 +479,9 @@ InstructionCost RISCVTTIImpl::getShuffleCost(TTI::ShuffleKind Kind,

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Off topic for the actual change, but I think we can improve the code sequence above.

One idea would be to use a scalar multiply to do the broadcast since we know the value will either be 0 or (int16)-1.
Another would be to use a vrgather.vi in place of the extract/splat sequence.
Another would be to extract the full16 bit value, do the extract and splat in scalar, and insert a full 16 bit value back.

Not sure if you care about this case or not, I don't much.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood! Let's consider revisiting it when it becomes more relevant!

return LT.first *
(TLI->getLMULCost(LT.second) + // FIXME: this should be 1 for andi
TLI->getLMULCost(
LT.second) + // FIXME: vmv.x.s is the same as extractelement
getRISCVInstructionCost({RISCV::VMV_V_I, RISCV::VMERGE_VIM,
RISCV::VMV_V_X, RISCV::VMSNE_VI},
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, one thing I'm noticing. I think the example code sequence above is only for short fixed vectors. For that to work on scalable vectors, I think we need a slide in there somewhere. Or, said alternatively, that remaining LMUL scaling is probably modeling the vslidedown more than the andi for scalable vectors and we need to adjust our example and comments slightly.

(Not directly relevant to your change, more the existing code structure.)

RISCV::VMV_X_S, RISCV::VMV_V_X,
RISCV::VMSNE_VI},
LT.second, CostKind));
}

Expand Down
4 changes: 2 additions & 2 deletions llvm/test/Analysis/CostModel/RISCV/rvv-shuffle.ll
Original file line number Diff line number Diff line change
Expand Up @@ -14,7 +14,7 @@ define void @vector_broadcast() {
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %5 = shufflevector <vscale x 4 x i32> undef, <vscale x 4 x i32> undef, <vscale x 4 x i32> zeroinitializer
; CHECK-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %6 = shufflevector <vscale x 1 x i64> undef, <vscale x 1 x i64> undef, <vscale x 1 x i32> zeroinitializer
; CHECK-NEXT: Cost Model: Found an estimated cost of 2 for instruction: %7 = shufflevector <vscale x 2 x i64> undef, <vscale x 2 x i64> undef, <vscale x 2 x i32> zeroinitializer
; CHECK-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %8 = shufflevector <vscale x 16 x i1> undef, <vscale x 16 x i1> undef, <vscale x 16 x i32> zeroinitializer
; CHECK-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %8 = shufflevector <vscale x 16 x i1> undef, <vscale x 16 x i1> undef, <vscale x 16 x i32> zeroinitializer
; CHECK-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %9 = shufflevector <vscale x 8 x i1> undef, <vscale x 8 x i1> undef, <vscale x 8 x i32> zeroinitializer
; CHECK-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %10 = shufflevector <vscale x 4 x i1> undef, <vscale x 4 x i1> undef, <vscale x 4 x i32> zeroinitializer
; CHECK-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %11 = shufflevector <vscale x 2 x i1> undef, <vscale x 2 x i1> undef, <vscale x 2 x i32> zeroinitializer
Expand All @@ -29,7 +29,7 @@ define void @vector_broadcast() {
; SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %5 = shufflevector <vscale x 4 x i32> undef, <vscale x 4 x i32> undef, <vscale x 4 x i32> zeroinitializer
; SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %6 = shufflevector <vscale x 1 x i64> undef, <vscale x 1 x i64> undef, <vscale x 1 x i32> zeroinitializer
; SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: %7 = shufflevector <vscale x 2 x i64> undef, <vscale x 2 x i64> undef, <vscale x 2 x i32> zeroinitializer
; SIZE-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %8 = shufflevector <vscale x 16 x i1> undef, <vscale x 16 x i1> undef, <vscale x 16 x i32> zeroinitializer
; SIZE-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %8 = shufflevector <vscale x 16 x i1> undef, <vscale x 16 x i1> undef, <vscale x 16 x i32> zeroinitializer
; SIZE-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %9 = shufflevector <vscale x 8 x i1> undef, <vscale x 8 x i1> undef, <vscale x 8 x i32> zeroinitializer
; SIZE-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %10 = shufflevector <vscale x 4 x i1> undef, <vscale x 4 x i1> undef, <vscale x 4 x i32> zeroinitializer
; SIZE-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %11 = shufflevector <vscale x 2 x i1> undef, <vscale x 2 x i1> undef, <vscale x 2 x i32> zeroinitializer
Expand Down
12 changes: 6 additions & 6 deletions llvm/test/Analysis/CostModel/RISCV/shuffle-broadcast.ll
Original file line number Diff line number Diff line change
Expand Up @@ -45,9 +45,9 @@ define void @broadcast_scalable() #0{
; CHECK-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %38 = shufflevector <vscale x 2 x i1> undef, <vscale x 2 x i1> undef, <vscale x 2 x i32> zeroinitializer
; CHECK-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %39 = shufflevector <vscale x 4 x i1> undef, <vscale x 4 x i1> undef, <vscale x 4 x i32> zeroinitializer
; CHECK-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %40 = shufflevector <vscale x 8 x i1> undef, <vscale x 8 x i1> undef, <vscale x 8 x i32> zeroinitializer
; CHECK-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %41 = shufflevector <vscale x 16 x i1> undef, <vscale x 16 x i1> undef, <vscale x 16 x i32> zeroinitializer
; CHECK-NEXT: Cost Model: Found an estimated cost of 24 for instruction: %42 = shufflevector <vscale x 32 x i1> undef, <vscale x 32 x i1> undef, <vscale x 32 x i32> zeroinitializer
; CHECK-NEXT: Cost Model: Found an estimated cost of 48 for instruction: %43 = shufflevector <vscale x 64 x i1> undef, <vscale x 64 x i1> undef, <vscale x 64 x i32> zeroinitializer
; CHECK-NEXT: Cost Model: Found an estimated cost of 11 for instruction: %41 = shufflevector <vscale x 16 x i1> undef, <vscale x 16 x i1> undef, <vscale x 16 x i32> zeroinitializer
; CHECK-NEXT: Cost Model: Found an estimated cost of 21 for instruction: %42 = shufflevector <vscale x 32 x i1> undef, <vscale x 32 x i1> undef, <vscale x 32 x i32> zeroinitializer
; CHECK-NEXT: Cost Model: Found an estimated cost of 41 for instruction: %43 = shufflevector <vscale x 64 x i1> undef, <vscale x 64 x i1> undef, <vscale x 64 x i32> zeroinitializer
; CHECK-NEXT: Cost Model: Found an estimated cost of 0 for instruction: ret void
;
; SIZE-LABEL: 'broadcast_scalable'
Expand Down Expand Up @@ -92,9 +92,9 @@ define void @broadcast_scalable() #0{
; SIZE-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %38 = shufflevector <vscale x 2 x i1> undef, <vscale x 2 x i1> undef, <vscale x 2 x i32> zeroinitializer
; SIZE-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %39 = shufflevector <vscale x 4 x i1> undef, <vscale x 4 x i1> undef, <vscale x 4 x i32> zeroinitializer
; SIZE-NEXT: Cost Model: Found an estimated cost of 6 for instruction: %40 = shufflevector <vscale x 8 x i1> undef, <vscale x 8 x i1> undef, <vscale x 8 x i32> zeroinitializer
; SIZE-NEXT: Cost Model: Found an estimated cost of 8 for instruction: %41 = shufflevector <vscale x 16 x i1> undef, <vscale x 16 x i1> undef, <vscale x 16 x i32> zeroinitializer
; SIZE-NEXT: Cost Model: Found an estimated cost of 12 for instruction: %42 = shufflevector <vscale x 32 x i1> undef, <vscale x 32 x i1> undef, <vscale x 32 x i32> zeroinitializer
; SIZE-NEXT: Cost Model: Found an estimated cost of 20 for instruction: %43 = shufflevector <vscale x 64 x i1> undef, <vscale x 64 x i1> undef, <vscale x 64 x i32> zeroinitializer
; SIZE-NEXT: Cost Model: Found an estimated cost of 7 for instruction: %41 = shufflevector <vscale x 16 x i1> undef, <vscale x 16 x i1> undef, <vscale x 16 x i32> zeroinitializer
; SIZE-NEXT: Cost Model: Found an estimated cost of 9 for instruction: %42 = shufflevector <vscale x 32 x i1> undef, <vscale x 32 x i1> undef, <vscale x 32 x i32> zeroinitializer
; SIZE-NEXT: Cost Model: Found an estimated cost of 13 for instruction: %43 = shufflevector <vscale x 64 x i1> undef, <vscale x 64 x i1> undef, <vscale x 64 x i32> zeroinitializer
; SIZE-NEXT: Cost Model: Found an estimated cost of 1 for instruction: ret void
;
%zero = shufflevector <vscale x 1 x half> undef, <vscale x 1 x half> undef, <vscale x 1 x i32> zeroinitializer
Expand Down