PowerPC: Rust test fails when optimized for power9 #83362

Closed
cuviper opened this issue Feb 29, 2024 · 6 comments · Fixed by #84892

Comments

@cuviper
Member

cuviper commented Feb 29, 2024

Here is my test case extracted from a Rust LTO build: reduced.bc.gz (https://github.com/llvm/llvm-project/files/14441197/reduced.bc.gz), originally from the Rust `image` crate at version 0.23.14.

We ran into a test failure in an EPEL build for CentOS Stream 9 ppc64le, which has LLVM 17.0.6. For the purpose of this report, I am using LLVM main as of commit d1f0444.

The test works fine (prints "ok") with the default CPU:

$ clang -lm reduced.bc -o test && ./test
ok
$ clang -lm reduced.bc -o test -O1 && ./test
ok

However, with -mcpu=power9, it fails at -O1:

$ clang -lm reduced.bc -o test -mcpu=power9 && ./test
ok
$ clang -lm reduced.bc -o test -mcpu=power9 -O1 && ./test
thread 'main' panicked at examples/test_bgra16.rs:16:5:
bad red channel in [0, 255, 0]
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

At @nikic's suggestion, I tried the optimized build with -debug-counter=dagcombine-count=0, and that ran fine. Then I used bisect-skip-count to narrow the failure to -debug-counter=dagcombine-skip=1463229,dagcombine-count=201. I hope that's a small enough range that someone who knows dagcombine and/or powerpc better can inspect the problem!

I also tried to reduce the testcase further with llvm-reduce, but I think it quickly introduced UB, because even when I verified against an -O0 build, it got into the weeds where the same binary would sometimes print "ok" and sometimes crash.

@llvmbot
Collaborator

llvmbot commented Feb 29, 2024

@llvm/issue-subscribers-backend-powerpc

Author: Josh Stone (cuviper)


@ecnelises ecnelises self-assigned this Feb 29, 2024
@ecnelises
Member

SDValue FirstOperand(Op.getOperand(0));
bool SubWordLoad = FirstOperand.getOpcode() == ISD::LOAD &&
                   (FirstOperand.getValueType() == MVT::i8 ||
                    FirstOperand.getValueType() == MVT::i16);
if (Subtarget.hasP9Vector() && Subtarget.hasP9Altivec() && SubWordLoad) {
  bool Signed = N->getOpcode() == ISD::SINT_TO_FP;
  bool DstDouble = Op.getValueType() == MVT::f64;
  unsigned ConvOp = Signed ?
    (DstDouble ? PPCISD::FCFID  : PPCISD::FCFIDS) :
    (DstDouble ? PPCISD::FCFIDU : PPCISD::FCFIDUS);
  SDValue WidthConst =
    DAG.getIntPtrConstant(FirstOperand.getValueType() == MVT::i8 ? 1 : 2,
                          dl, false);
  LoadSDNode *LDN = cast<LoadSDNode>(FirstOperand.getNode());
  SDValue Ops[] = { LDN->getChain(), LDN->getBasePtr(), WidthConst };
  SDValue Ld = DAG.getMemIntrinsicNode(PPCISD::LXSIZX, dl,
                                       DAG.getVTList(MVT::f64, MVT::Other),
                                       Ops, MVT::i8, LDN->getMemOperand());
  // For signed conversion, we need to sign-extend the value in the VSR
  if (Signed) {
    SDValue ExtOps[] = { Ld, WidthConst };
    SDValue Ext = DAG.getNode(PPCISD::VEXTS, dl, MVT::f64, ExtOps);
    return DAG.getNode(ConvOp, dl, DstDouble ? MVT::f64 : MVT::f32, Ext);
  } else
    return DAG.getNode(ConvOp, dl, DstDouble ? MVT::f64 : MVT::f32, Ld);
}

lxsibzx is suspicious. Deleting the block above makes this case pass.

@cuviper
Member Author

cuviper commented Feb 29, 2024

I can confirm that deleting that block lets the original Rust crate pass its tests as well.

@ecnelises
Member

Created #84892 to fix this.

The reason for the bug: when combining a load with an int-to-fp conversion, we missed replacing uses of the previous load's chain with the new load's chain, so subsequent memory operations may become unordered.

// Compile with -mcpu=power9
void foo(unsigned char *a, long b) {
  double x = (double)a[0] - 1.28e2;
  double y = (double)a[8] - 1.28e2;
  *((double*)a) = y;
  *((double*)(a+8)) = x;
}

For the above C code, the compiler (wrongly) schedules the first store before the second load:

lxsibzx 0, 0, 3
mr	4, 3
stfdu 0, 8(4)
lxsibzx 0, 0, 4
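
To illustrate the nature of the fix (a minimal sketch only, not the exact patch; see #84892 for the real change), the combine quoted earlier would need to rewire users of the original load's chain to the chain produced by the new LXSIZX node, so that later memory operations stay ordered after it:

// After building the LXSIZX memory-intrinsic node, as in the snippet above...
SDValue Ld = DAG.getMemIntrinsicNode(PPCISD::LXSIZX, dl,
                                     DAG.getVTList(MVT::f64, MVT::Other),
                                     Ops, MVT::i8, LDN->getMemOperand());
// ...redirect all uses of the original load's chain result (value #1 of LDN)
// to the chain result of the new node, keeping subsequent stores ordered.
DAG.ReplaceAllUsesOfValueWith(SDValue(LDN, 1), Ld.getValue(1));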

@nikic nikic added this to the LLVM 18.X Release milestone Mar 15, 2024
@nikic
Contributor

nikic commented Mar 18, 2024

/cherry-pick e5b20c8

@llvmbot
Collaborator

llvmbot commented Mar 18, 2024

/pull-request #85648

bors added a commit to rust-lang-ci/rust that referenced this issue Mar 21, 2024