Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

f32::powi(_,0) broken on raspberry pi (beta and nightly) #37559

Closed
kali opened this issue Nov 3, 2016 · 37 comments
Closed

f32::powi(_,0) broken on raspberry pi (beta and nightly) #37559

kali opened this issue Nov 3, 2016 · 37 comments
Labels
O-Arm Target: 32-bit Arm processors (armv6, armv7, thumb...), including 64-bit Arm in AArch32 state P-high High priority regression-from-stable-to-beta Performance or correctness regression from stable to beta. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@kali
Copy link
Contributor

kali commented Nov 3, 2016

On a rapsberry pi, rust installed through with rustup:

pi@raspberrypi:~ $ echo 'fn main() { println!("{}", 12345.0f32.powi(0)); }' | rustc +stable - ; ./rust_out
1
pi@raspberrypi:~ $ echo 'fn main() { println!("{}", 12345.0f32.powi(0)); }' | rustc +beta - ; ./rust_out
12345

Stable is correct, but beta and nightly are terribly wrong, so with have a regression. It specifically breaks num-bigint tests on raspberry pi.

@kali kali closed this as completed Nov 3, 2016
@kali kali reopened this Nov 3, 2016
@TimNN TimNN added I-wrong O-Arm Target: 32-bit Arm processors (armv6, armv7, thumb...), including 64-bit Arm in AArch32 state T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. regression-from-stable-to-beta Performance or correctness regression from stable to beta. labels Nov 3, 2016
@TimNN
Copy link
Contributor

TimNN commented Nov 3, 2016

I tried to find the nightlies between which this was introduced, however the best I can do is some time between nightly-2016-08-03 and nightly-2016-10-03, since arm nightlies have been broken in between (the SIGILL + cross compile symbols bugs).

I can get nightly-2016-08-18 to work correctly by specifying -Z orbit=off, although I don't think this is a mir issue, since nightly-2016-08-03 works with -Z orbit=on.

@TimNN
Copy link
Contributor

TimNN commented Nov 3, 2016

In case it helps anyone, assembly and ir of the following demo are here: https://gist.github.com/anonymous/a2a13985180e9960579f5f4a5372596a

#[inline(never)]
fn foo() -> f32 {
    12345.0f32.powi(0)
}

fn main() {
    println!("{}", foo());
}

@kali
Copy link
Contributor Author

kali commented Nov 3, 2016

Forgot to mention that as far as I can tell, it only impacts the debug builds. -O builds look fine.

@TimNN
Copy link
Contributor

TimNN commented Nov 3, 2016

@kali: At least for my example, when compiling with -O (asm / ir) the expression get's const folded by llvm.

Edit: Indeed, when preventing llvm from const folding the powi like below, the error occurs as well with -O.

#[inline(never)]
fn foo(i: i32) -> f32 {
    12345.0f32.powi(i)
}

fn main() {
    let s = String::new();

    println!("{}", foo(s.len() as i32));
}

@kali
Copy link
Contributor Author

kali commented Nov 3, 2016

For what it's worth, I can not reproduce it with armv7-apple-ios on current nightly.

@nikomatsakis
Copy link
Contributor

cc @japaric -- maybe related to the builtins stuff?

@brson
Copy link
Contributor

brson commented Nov 3, 2016

@japaric if you know the fix, can it be backported to beta in the next day or two?

@brson
Copy link
Contributor

brson commented Nov 3, 2016

Oh, it's related to optimization, so maybe not intrinsics.

@nagisa
Copy link
Member

nagisa commented Nov 3, 2016

Investigating. Can’t run rustc on my rπ due to glibc version dependency woes :/

@japaric
Copy link
Member

japaric commented Nov 3, 2016

Isn't this a Raspberry Pi? Why is everyone using the armv7 triple? That Pi has an ARMv6 processor in it. (Pi 2+ is ARMv7).

@kali Can you post the output of running rustc -Vv on the Pi? I want to see if you are using the soft float or the hard float target. If it's the hard float target then that powi snippet doesn't lower to an intrinsic.

@TimNN
Copy link
Contributor

TimNN commented Nov 3, 2016

@japaric: At least I have host: armv7-unknown-linux-gnueabihf on the pi (which I think is a V2).

@kali
Copy link
Contributor Author

kali commented Nov 3, 2016

It's a Pi 2.

Envoyé de mon iPhone

Le 3 nov. 2016 à 19:44, Jorge Aparicio notifications@github.com a écrit :

Isn't this a Raspberry Pi? Why is everyone using the armv7 triple? That Pi has an ARMv6 processor in it. (Pi 2+ is ARMv7).

@kali Can you post the output of running rustc -Vv on the Pi? I want to see if you are using the soft float or the hard float target. If it's the hard float target then that powi snippet doesn't lower to an intrinsic.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.

@kali
Copy link
Contributor Author

kali commented Nov 3, 2016

I don't have access to it right now to check the abi. It's a fresh raspbian install, and rustup.

Envoyé de mon iPhone

Le 3 nov. 2016 à 19:53, Mathieu Poumeyrol kalizoy@fastmail.fm a écrit :

It's a Pi 2.

Envoyé de mon iPhone

Le 3 nov. 2016 à 19:44, Jorge Aparicio notifications@github.com a écrit :

Isn't this a Raspberry Pi? Why is everyone using the armv7 triple? That Pi has an ARMv6 processor in it. (Pi 2+ is ARMv7).

@kali Can you post the output of running rustc -Vv on the Pi? I want to see if you are using the soft float or the hard float target. If it's the hard float target then that powi snippet doesn't lower to an intrinsic.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.

@TimNN
Copy link
Contributor

TimNN commented Nov 3, 2016

Alright, I can reproduce this now via cross-compile + qemu, even inside the range that previously failed due to SIGILL, I'll try to track down the nightlies between which this was introduced.

@TimNN
Copy link
Contributor

TimNN commented Nov 3, 2016

This was introduced between nightly-2016-09-13 and nightly-2016-09-15 (Changes).

This includes #35021, the crate-ifycation of compiler-rt.

@nikomatsakis
Copy link
Contributor

triage: P-high

Given that ARM/RasbPi are not tier 1, it's not entirely clear how to prioritize this, but it seems like something we should fix sooner rather than later. We haven't assigned anyone from compiler team yet though since it's not clear who would be best to tackle it. @japaric, do you think you can investigate whether it is indeed related to #35021?

@rust-highfive rust-highfive added P-high High priority and removed I-nominated labels Nov 3, 2016
@TimNN
Copy link
Contributor

TimNN commented Nov 3, 2016

Alright, so the generated llvm-ir and asm are exactly the same (at least when using the --emit flag). Gist, including modified test case: https://gist.github.com/TimNN/a9cc563fa8dffd40a5192b510509af1b

@TimNN
Copy link
Contributor

TimNN commented Nov 3, 2016

I have disassembled the generated executables and actually found a difference now:

In the "good" case, the calculation is deferred to the external __powisf2 symbol.

In the "bad" case, __powisf2 is implemented in the executable as follows:

             __powisf2:
0002af00         vmov.f32   s14, #0x1                                           
0002af04         mov        r3, r1
0002af06         vmov       s15, r0
0002af0a         b          0x2af10

0002af0c         vmul.f32   s15, s15, s15                                       

0002af10         add.w      r2, r3, r3, lsr #31                                 
0002af14         lsls       r3, r3, #0x1f
0002af16         it         mi
0002af18         vmulmi.f32 s14, s14, s15
0002af1c         asrs       r3, r2, #0x1
0002af1e         bne        0x2af0c

0002af20         cmp        r1, #0x0
0002af22         itt        lt
0002af24         vmovlt.f32 s15, #0x1
0002af28         vdivlt.f32 s14, s15, s14
0002af2c         vmov       r0, s14
0002af30         bx         lr

Both executables require the same shared libraries:

 0x00000001 (NEEDED)                     Shared library: [libdl.so.2]
 0x00000001 (NEEDED)                     Shared library: [librt.so.1]
 0x00000001 (NEEDED)                     Shared library: [libpthread.so.0]
 0x00000001 (NEEDED)                     Shared library: [libgcc_s.so.1]
 0x00000001 (NEEDED)                     Shared library: [libc.so.6]
 0x00000001 (NEEDED)                     Shared library: [ld-linux-armhf.so.3]
 0x00000001 (NEEDED)                     Shared library: [libm.so.6]

My current hypothesis (from what I picked up over time, I'm not really an expert here) is that the compiler-rt powisf2 intrinsic is simply faulty and that before the crate-ifycation, we weren't actually using compiler-rt but rather linking agains gcc_s (I think I remember something along those lines).

@TimNN
Copy link
Contributor

TimNN commented Nov 3, 2016

(The following is 70% guesswork)

I've been trying to debug this a bit and think that problem is a calling convention mismatch. The __powisf2 function expects arguments in r0 and r1 (Edit1) and returns it's result in r0. The caller however tries to pass the arguments via the stack, as far as I can tell (see below excerpt from a gdb session).

   │0x54aad71c <core::f32::{{impl}}::powi+4>        mov    r11, sp                                                                                                                  │
   │0x54aad720 <core::f32::{{impl}}::powi+8>        sub    sp, sp, #24                                                                                                              │
   │0x54aad724 <core::f32::{{impl}}::powi+12>       vstr   s0, [r11, #-4]                                                                                                           │
   │0x54aad728 <core::f32::{{impl}}::powi+16>       str    r0, [r11, #-8]                                                                                                           │
   │0x54aad72c <core::f32::{{impl}}::powi+20>       vldr   s0, [r11, #-4]                                                                                                           │
   │0x54aad730 <core::f32::{{impl}}::powi+24>       vstr   s0, [sp, #12]                                                                                                            │
   │0x54aad734 <core::f32::{{impl}}::powi+28>       ldr    r0, [r11, #-8]                                                                                                           │
   │0x54aad738 <core::f32::{{impl}}::powi+32>       str    r0, [sp, #8]                                                                                                             │
   │0x54aad73c <core::f32::{{impl}}::powi+36>       vldr   s0, [sp, #12]                                                                                                            │
   │0x54aad740 <core::f32::{{impl}}::powi+40>       ldr    r0, [sp, #8]                                                                                                             │
  >│0x54aad744 <core::f32::{{impl}}::powi+44>       blx    0x54ad5480 <__powisf2>                                                                                                   │
   │0x54aad748 <core::f32::{{impl}}::powi+48>       vstr   s0, [sp, #4]                                                                                                             │
   │0x54aad74c <core::f32::{{impl}}::powi+52>       vldr   s0, [sp, #4]                                                                                                             │
   │0x54aad750 <core::f32::{{impl}}::powi+56>       vstr   s0, [sp]                                                                                                                 │
   │0x54aad754 <core::f32::{{impl}}::powi+60>       vldr   s0, [sp]                                                                                                                 │
   │0x54aad758 <core::f32::{{impl}}::powi+64>       mov    sp, r11

1 Edit: was previously r1, r2, but that seemed incorrect.

@japaric
Copy link
Member

japaric commented Nov 4, 2016

My current hypothesis ...

That seems likely. It wouldn't be the first time either. We had a compiler-rt implementation that triggered UB and we had to fix its C implementation. And you are right that before the "compiler-rt crate-ification" PR this symbol came from gcc_s rather than from compiler-rt.

Ideas to fix this:

  • Simplest: Stop compiling the compiler-rt powisf2 intrinsic. The corresponding symbol won't be available in the libcompiler-builtins rlib. This would make std programs pick up the symbol from gcc_s instead, just like they did before.
  • Really fix this. Audit and fix the compiler-rt implementation, assuming it has some bug; or perhaps, the fix is to tweak the gcc flags we are using to compile compiler-rt if this comment "think that problem is a calling convention mismatch" is true.
  • Long term: Move to the Rust port of compiler-rt. Hopefully, that would get rid of any UB. And it also eliminates the possibility of compiling C code and Rust code with different codegen options / calling conventions.

@alexcrichton
Copy link
Member

Holy cow, nice investigation @TimNN! That's only mildly disturbing that making our linking to compiler-rt "more correct" ended up introducing a bug...

Audit and fix the compiler-rt implementation

The implementation is pretty small and seems like for the arguments in the program above it should trivially return the right value, unfortunately.

Compiling with the wrong flags though seems quite possible! If that is indeed the problem then in theory this can wreak havoc with jemalloc and/or any other C code in the world as well, so I'd prefer to dig into if that's the problem before we stop compiling powisf2. Presumably many other float intrinsics are also incorrect? (if we're miscompiling)

@kali
Copy link
Contributor Author

kali commented Nov 4, 2016

Wow. That was impressive. Thanks a lot @TimNN and the gang.

brson pushed a commit to brson/rust that referenced this issue Nov 4, 2016
alexcrichton added a commit that referenced this issue Nov 4, 2016
alexcrichton added a commit to alexcrichton/rust that referenced this issue Nov 4, 2016
sophiajt pushed a commit to sophiajt/rust that referenced this issue Nov 5, 2016
alexcrichton added a commit to alexcrichton/rust that referenced this issue Nov 5, 2016
@brson
Copy link
Contributor

brson commented Nov 7, 2016

This should be fixed on beta and in 1.13.

@brson
Copy link
Contributor

brson commented Nov 7, 2016

Can anybody test on the latest beta?

@TimNN
Copy link
Contributor

TimNN commented Nov 7, 2016

Seems to be working in rustc 1.13.0-beta.3 (106d18793 2016-11-04) (tested on RPi 2).

@brson
Copy link
Contributor

brson commented Nov 7, 2016

@TimNN Woo! Thank you for testing!

@kali
Copy link
Contributor Author

kali commented Nov 8, 2016

@TimNN. FYI, we may have another regression. num bigint's unit test are still failing with 106d187, still somewhere around f32. I'm having a look to see if/how it relates to #37630 .

@kali
Copy link
Contributor Author

kali commented Nov 8, 2016

ho, boy.

$ echo 'fn main() {println!("{}", 1.0f32); }' | rustc +stable - ; ./rust_out
1
$ echo 'fn main() {println!("{}", 1.0f32); }' | rustc +beta - ; ./rust_out
0.000000000000000000000000000000000000000000001
$ rustc +beta --version
rustc 1.13.0-beta.3 (106d18793 2016-11-04)

@kali
Copy link
Contributor Author

kali commented Nov 8, 2016

float litterals got impacted somehow...

$ echo 'fn main() {println!("{:032b}", unsafe { std::mem::transmute::<f32, u32>(1.0f32)} ); }' | rustc +beta - ; ./rust_out
00000000000000000000000000000001
$ echo 'fn main() {println!("{:032b}", unsafe { std::mem::transmute::<f32, u32>(1.0f32)} ); }' | rustc +stable - ; ./rust_out
00111111100000000000000000000000

@kali
Copy link
Contributor Author

kali commented Nov 8, 2016

Should we re-open this ? or is it better to create a new one ?

@japaric
Copy link
Member

japaric commented Nov 8, 2016

@kali That's the same as #37630. The problem is the parsing of floats.

@kali
Copy link
Contributor Author

kali commented Nov 8, 2016

ok.

alexcrichton added a commit to alexcrichton/rust that referenced this issue Nov 10, 2016
This update of compiler-rt includes rust-lang/compiler-rt#26 which provides a
targeted fix to the powisf2 intrinsics to keep rust-lang#37559 fixed but also address
the new issue of rust-lang#37630. I've also [written up my thoughts][1] on why it appears
that this is the correct fix for now (hoepfully at least).

Closes rust-lang#37630

[1]: rust-lang/compiler-rt#26 (comment)
bors added a commit that referenced this issue Nov 11, 2016
std: Update compiler-rt for more ABI fixes

This update of compiler-rt includes rust-lang/compiler-rt#26 which provides a
targeted fix to the powisf2 intrinsics to keep #37559 fixed but also address
the new issue of #37630. I've also [written up my thoughts][1] on why it appears
that this is the correct fix for now (hoepfully at least).

Closes #37630

[1]: rust-lang/compiler-rt#26 (comment)
japaric pushed a commit to japaric/compiler-builtins that referenced this issue Nov 16, 2016
bors added a commit to rust-lang/compiler-builtins that referenced this issue Nov 16, 2016
@TimNN TimNN mentioned this issue Nov 16, 2016
23 tasks
alexcrichton added a commit to alexcrichton/rust that referenced this issue Nov 16, 2016
This update of compiler-rt includes rust-lang/compiler-rt#26 which provides a
targeted fix to the powisf2 intrinsics to keep rust-lang#37559 fixed but also address
the new issue of rust-lang#37630. I've also [written up my thoughts][1] on why it appears
that this is the correct fix for now (hoepfully at least).

Closes rust-lang#37630

[1]: rust-lang/compiler-rt#26 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
O-Arm Target: 32-bit Arm processors (armv6, armv7, thumb...), including 64-bit Arm in AArch32 state P-high High priority regression-from-stable-to-beta Performance or correctness regression from stable to beta. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

8 participants