PowerPC: Rust test fails when optimized for power9 #83362

Closed
cuviper opened this issue Feb 29, 2024 · 6 comments · Fixed by #84892

Comments

@cuviper
Member

cuviper commented Feb 29, 2024

Here is my test case extracted from a Rust LTO build: reduced.bc.gz (https://github.com/llvm/llvm-project/files/14441197/reduced.bc.gz), originally from the Rust `image` crate at version 0.23.14.

We ran into a test failure in an EPEL build for CentOS Stream 9 ppc64le, which has LLVM 17.0.6. For the purpose of this report, I am using LLVM main as of commit d1f0444.

The test works fine (prints "ok") with the default CPU:

$ clang -lm reduced.bc -o test && ./test
ok
$ clang -lm reduced.bc -o test -O1 && ./test
ok

However, with -mcpu=power9, it fails at -O1:

$ clang -lm reduced.bc -o test -mcpu=power9 && ./test
ok
$ clang -lm reduced.bc -o test -mcpu=power9 -O1 && ./test
thread 'main' panicked at examples/test_bgra16.rs:16:5:
bad red channel in [0, 255, 0]
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

At @nikic's suggestion, I tried the optimized build with -debug-counter=dagcombine-count=0, and that ran fine. Then I used bisect-skip-count to narrow the failure to -debug-counter=dagcombine-skip=1463229,dagcombine-count=201. I hope that's a small enough range that someone who knows dagcombine and/or powerpc better can inspect the problem!

I also tried to reduce the testcase further with llvm-reduce, but I think it quickly introduced UB, because even when I verified against an -O0 build, it got into the weeds where the same binary would sometimes print "ok" and sometimes crash.

@llvmbot
Collaborator

llvmbot commented Feb 29, 2024

@llvm/issue-subscribers-backend-powerpc

Author: Josh Stone (cuviper)


@ecnelises ecnelises self-assigned this Feb 29, 2024
@ecnelises
Member

SDValue FirstOperand(Op.getOperand(0));
bool SubWordLoad = FirstOperand.getOpcode() == ISD::LOAD &&
                   (FirstOperand.getValueType() == MVT::i8 ||
                    FirstOperand.getValueType() == MVT::i16);
if (Subtarget.hasP9Vector() && Subtarget.hasP9Altivec() && SubWordLoad) {
  bool Signed = N->getOpcode() == ISD::SINT_TO_FP;
  bool DstDouble = Op.getValueType() == MVT::f64;
  unsigned ConvOp = Signed ?
    (DstDouble ? PPCISD::FCFID  : PPCISD::FCFIDS) :
    (DstDouble ? PPCISD::FCFIDU : PPCISD::FCFIDUS);
  SDValue WidthConst =
    DAG.getIntPtrConstant(FirstOperand.getValueType() == MVT::i8 ? 1 : 2,
                          dl, false);
  LoadSDNode *LDN = cast<LoadSDNode>(FirstOperand.getNode());
  SDValue Ops[] = { LDN->getChain(), LDN->getBasePtr(), WidthConst };
  SDValue Ld = DAG.getMemIntrinsicNode(PPCISD::LXSIZX, dl,
                                       DAG.getVTList(MVT::f64, MVT::Other),
                                       Ops, MVT::i8, LDN->getMemOperand());
  // For signed conversion, we need to sign-extend the value in the VSR
  if (Signed) {
    SDValue ExtOps[] = { Ld, WidthConst };
    SDValue Ext = DAG.getNode(PPCISD::VEXTS, dl, MVT::f64, ExtOps);
    return DAG.getNode(ConvOp, dl, DstDouble ? MVT::f64 : MVT::f32, Ext);
  } else
    return DAG.getNode(ConvOp, dl, DstDouble ? MVT::f64 : MVT::f32, Ld);
}

lxsibzx is suspicious. Deleting the block above makes this case pass.

@cuviper
Member Author

cuviper commented Feb 29, 2024

I can confirm that deleting that block lets the original Rust crate pass its tests as well.

@ecnelises
Member

Created #84892 to fix this.

The reason for the bug: when combining a load with an int-to-fp conversion, we missed replacing uses of the previous load's chain with the new load's chain, so subsequent memory operations may become unordered.

// Compile with -mcpu=power9
void foo(unsigned char *a, long b) {
  double x = (double)a[0] - 1.28e2;
  double y = (double)a[8] - 1.28e2;
  *((double*)a) = y;
  *((double*)(a+8)) = x;
}

For the above C code, the compiler (wrongly) schedules the first store before the second load:

lxsibzx 0, 0, 3
mr	4, 3
stfdu 0, 8(4)
lxsibzx 0, 0, 4
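
To illustrate the nature of the fix (a minimal sketch only, not the exact patch; see #84892 for the real change), the combine quoted earlier would need to rewire users of the original load's chain to the chain produced by the new LXSIZX node, so that later memory operations stay ordered after it:

// After building the LXSIZX memory-intrinsic node, as in the snippet above...
SDValue Ld = DAG.getMemIntrinsicNode(PPCISD::LXSIZX, dl,
                                     DAG.getVTList(MVT::f64, MVT::Other),
                                     Ops, MVT::i8, LDN->getMemOperand());
// ...redirect all uses of the original load's chain result (value #1 of LDN)
// to the chain result of the new node, keeping subsequent stores ordered.
DAG.ReplaceAllUsesOfValueWith(SDValue(LDN, 1), Ld.getValue(1));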

@nikic nikic added this to the LLVM 18.X Release milestone Mar 15, 2024
@nikic
Contributor

nikic commented Mar 18, 2024

/cherry-pick e5b20c8

@llvmbot
Collaborator

llvmbot commented Mar 18, 2024

/pull-request #85648

bors added a commit to rust-lang-ci/rust that referenced this issue Mar 21, 2024