Skip to content

Commit

Permalink
[committed] Reasonably handle SUBREGs in risc-v cost modeling
Browse files Browse the repository at this point in the history
This patch adjusts the costs so that we treat REG and SUBREG expressions the
same for costing.

This was motivated by bt_skip_func and bt_find_func in xz and results in nearly
a 5% improvement in the dynamic instruction count for input #2 and smaller, but
definitely visible improvements pretty much across the board.  Exceptions would
be perlbench input #1 and exchange2 which showed very small regressions.

In the bt_find_func and bt_skip_func cases we have  something like this:

> (insn 10 7 11 2 (set (reg/v:DI 136 [ x ])
>         (zero_extend:DI (subreg/s/u:SI (reg/v:DI 137 [ a ]) 0))) "zz.c":6:21 387 {*zero_extendsidi2_bitmanip}
>      (nil))
> (insn 11 10 12 2 (set (reg:DI 142 [ _1 ])
>         (plus:DI (reg/v:DI 136 [ x ])
>             (reg/v:DI 139 [ b ]))) "zz.c":7:23 5 {adddi3}
>      (nil))

[ ... ]> (insn 13 12 14 2 (set (reg:DI 143 [ _2 ])
>         (plus:DI (reg/v:DI 136 [ x ])
>             (reg/v:DI 141 [ c ]))) "zz.c":8:23 5 {adddi3}
>      (nil))

Note the two uses of (reg 136). The best way to handle that in combine might be
a 3->2 split.  But there's a much better approach if we look at fwprop...

(set (reg:DI 142 [ _1 ])
    (plus:DI (zero_extend:DI (subreg/s/u:SI (reg/v:DI 137 [ a ]) 0))
        (reg/v:DI 139 [ b ])))
change not profitable (cost 4 -> cost 8)

So that should be the same cost as a regular DImode addition when the ZBA
extension is enabled.  But it ends up costing more because the clause to cost
this variant isn't prepared to handle a SUBREG.  That results in the RTL above
having too high a cost and fwprop gives up.

One approach would be to replace the REG_P  with REG_P || SUBREG_P in the
costing code.  I ultimately decided against that and instead check if the
operand in question passes register_operand.

By far the most important case to handle is the DImode PLUS.  But for the sake
of consistency, I changed the other instances in riscv_rtx_costs as well.  For
those other cases we're talking about improvements in the .000001% range.

While we are into stage4, this just hits cost modeling which we've generally
agreed is still appropriate (though we were mostly talking about vector).  So
I'm going to extend that general agreement ever so slightly and include scalar
cost modeling :-)

gcc/
	* config/riscv/riscv.cc (riscv_rtx_costs): Handle SUBREG and REG
	similarly.

gcc/testsuite/

	* gcc.target/riscv/reg_subreg_costs.c: New test.

	Co-authored-by: Jivan Hakobyan <jivanhakobyan9@gmail.com>
  • Loading branch information
JeffreyALaw committed Feb 4, 2024
1 parent aa33570 commit 777df37
Show file tree
Hide file tree
Showing 2 changed files with 26 additions and 7 deletions.
18 changes: 11 additions & 7 deletions gcc/config/riscv/riscv.cc
Original file line number Diff line number Diff line change
Expand Up @@ -3055,7 +3055,8 @@ riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN
case SET:
/* If we are called for an INSN that's a simple set of a register,
then cost based on the SET_SRC alone. */
if (outer_code == INSN && REG_P (SET_DEST (x)))
if (outer_code == INSN
&& register_operand (SET_DEST (x), GET_MODE (SET_DEST (x))))
{
riscv_rtx_costs (SET_SRC (x), mode, outer_code, opno, total, speed);
return true;
Expand Down Expand Up @@ -3172,7 +3173,7 @@ riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN
rtx and_rhs = XEXP (x, 1);
rtx ashift_lhs = XEXP (XEXP (x, 0), 0);
rtx ashift_rhs = XEXP (XEXP (x, 0), 1);
if (REG_P (ashift_lhs)
if (register_operand (ashift_lhs, GET_MODE (ashift_lhs))
&& CONST_INT_P (ashift_rhs)
&& CONST_INT_P (and_rhs)
&& ((INTVAL (and_rhs) >> INTVAL (ashift_rhs)) == 0xffffffff))
Expand All @@ -3188,7 +3189,7 @@ riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN
}
/* bclr pattern for zbs. */
if (TARGET_ZBS
&& REG_P (XEXP (x, 1))
&& register_operand (XEXP (x, 1), GET_MODE (XEXP (x, 1)))
&& GET_CODE (XEXP (x, 0)) == ROTATE
&& CONST_INT_P (XEXP ((XEXP (x, 0)), 0))
&& INTVAL (XEXP ((XEXP (x, 0)), 0)) == -2)
Expand Down Expand Up @@ -3344,7 +3345,8 @@ riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN
if (TARGET_ZBA
&& (TARGET_64BIT && (mode == DImode))
&& GET_CODE (XEXP (x, 0)) == ZERO_EXTEND
&& REG_P (XEXP (XEXP (x, 0), 0))
&& register_operand (XEXP (XEXP (x, 0), 0),
GET_MODE (XEXP (XEXP (x, 0), 0)))
&& GET_MODE (XEXP (XEXP (x, 0), 0)) == SImode)
{
*total = COSTS_N_INSNS (1);
Expand All @@ -3355,7 +3357,8 @@ riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN
&& ((!TARGET_64BIT && (mode == SImode)) ||
(TARGET_64BIT && (mode == DImode)))
&& (GET_CODE (XEXP (x, 0)) == ASHIFT)
&& REG_P (XEXP (XEXP (x, 0), 0))
&& register_operand (XEXP (XEXP (x, 0), 0),
GET_MODE (XEXP (XEXP (x, 0), 0)))
&& CONST_INT_P (XEXP (XEXP (x, 0), 1))
&& IN_RANGE (INTVAL (XEXP (XEXP (x, 0), 1)), 1, 3))
{
Expand All @@ -3368,7 +3371,8 @@ riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN
if (TARGET_ZBA
&& mode == word_mode
&& GET_CODE (XEXP (x, 0)) == MULT
&& REG_P (XEXP (XEXP (x, 0), 0))
&& register_operand (XEXP (XEXP (x, 0), 0),
GET_MODE (XEXP (XEXP (x, 0), 0)))
&& CONST_INT_P (XEXP (XEXP (x, 0), 1))
&& pow2p_hwi (INTVAL (XEXP (XEXP (x, 0), 1)))
&& IN_RANGE (exact_log2 (INTVAL (XEXP (XEXP (x, 0), 1))), 1, 3))
Expand All @@ -3390,7 +3394,7 @@ riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN
if (TARGET_ZBA
&& (TARGET_64BIT && (mode == DImode))
&& (GET_CODE (XEXP (x, 0)) == AND)
&& (REG_P (XEXP (x, 1))))
&& register_operand (XEXP (x, 1), GET_MODE (XEXP (x, 1))))
{
do {
rtx and_lhs = XEXP (XEXP (x, 0), 0);
Expand Down
15 changes: 15 additions & 0 deletions gcc/testsuite/gcc.target/riscv/reg_subreg_costs.c
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
/* { dg-do compile } */
/* { dg-require-effective-target rv64 } */
/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-Os" "-Oz" } } */
/* { dg-options "-march=rv64gc_zba" } */

#include <stdint-gcc.h>
void foo(uint32_t a, uint64_t *b_ptr, uint64_t b, uint64_t *c_ptr, uint64_t c)
{
uint64_t x = a;
*b_ptr = b + x;
*c_ptr = c + x;
}

/* { dg-final { scan-assembler-not "\\szext.w\\s" } } */

0 comments on commit 777df37

Please sign in to comment.