Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Combining inline and target_feature attributes with recursive functions causing incorrect results. #53117

Closed
jackmott opened this issue Aug 6, 2018 · 11 comments
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-SIMD Area: SIMD (Single Instruction Multiple Data) T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@jackmott
Copy link

jackmott commented Aug 6, 2018

add_stuff is a function with AVX2 simd intrinsics, set to inline always.
add_stuff_helper is a function with the target_feature set to enable AVX2 instructions.

If you run this in a release build, you will get an incorrect result, the return values from the recursive function do not add up. I believe this is because there is code being generated around the recursion that is not getting the avx2 target_feature applied.

If you make add_stuff not recursive, this all works fine.

The code below obviously does not make sense to use by itself, it works for instance if you put the target_feature on the add_stuff function directly, but this technique is useful for doing some nice SIMD metaprogramming with traits, and this bug makes that not work.

Is this a bug that can be fixed? Or an innate limitation of the inlining, target_features, and recursion? Is there a workaround?

#[cfg(target_arch = "x87")]
use std::arch::x86::*;
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;
use std::fmt::Debug;

#[inline(always)]
unsafe fn add_stuff(a: f32, count: i32) -> __m256 {
    let b = _mm256_set1_ps(2.0);
    let a2 = _mm256_set1_ps(a);
    if count < 4 {
        println!("count:{}",count);
        let r = _mm256_add_ps(_mm256_add_ps(a2, b), add_stuff(a, count + 1));
        println!("r:{:?}",r);
        r
    } else {
        _mm256_add_ps(a2, b)
    }
}

#[target_feature(enable = "avx2")]
unsafe fn add_stuff_helper() {
    let r = add_stuff(2.0,1);
    println!("raw avx {:?}",r);
}

fn main() {
    unsafe {
        add_stuff_helper();
    }
}

Environment:

This happens with rustc 1.27 stable through 1.31.0 nightly (at least)
All tested on linux, on a cpu that supports AVX2 instructions. Intel core i7 6700

@jackmott jackmott changed the title Inconsistent results in Debug vs Release builds with recursive generic trait function Combining inline and target_feature attributes with recursive functions causing incorrect results. Aug 6, 2018
@jackmott
Copy link
Author

jackmott commented Aug 6, 2018

@alexcrichton could this be related to llvm bug here: https://bugs.llvm.org/show_bug.cgi?id=37358

@alexcrichton
Copy link
Member

Glancing at the IR, yeah, it looks like that bug unfortunately

@jackmott
Copy link
Author

jackmott commented Aug 6, 2018

any tricks one can use to work around that, until LLVM fixes it?

@alexcrichton
Copy link
Member

If add_stuff is annotated with #[target_feature(enable = "avx2")] it fixes the issue, but beyond that there's not a great workaround for this unfortunately. I only saw this at opt-level=3 so you can try opt-level=2, but that's not necessarily guaranteed to work.

@jackmott
Copy link
Author

jackmott commented Aug 6, 2018

Yeah for my use case I could just copypasta whenever I have a recursive simd function, which isn't often!
How quick does llvm tend to fix issues like this? days, months, years? Any way for me to indicate interset in that thread?

@alexcrichton
Copy link
Member

In my experience so far LLVM bugs tend to fall in one of two buckets: fixed within a week or never fixed. Unfortunately that bug's been open for more than a week :(

In that sense my historical experience is that it'll be awhile before the bug is fixed, but we haven't done much work on our part to try to bug those who may know how to fix it (or know who may be able to fix it, and we could always more aggressively do that!)

@jackmott
Copy link
Author

jackmott commented Aug 6, 2018

I'll see if I can get an account on the bugzilla and chime in.

@jackmott
Copy link
Author

Previous comments, now deleted, were using nightly 1 day too soon! #55073 fixes this issue, thanks @alexcrichton

@jackmott
Copy link
Author

reopening as the workaround was reverted. LLVM fix seems to be in the works so I will test when that lands.

I think the fix in 55073 was causing some bugs anyway so glad to see LLVM fix is near.

@jackmott jackmott reopened this Oct 23, 2018
@jonas-schievink jonas-schievink added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. A-SIMD Area: SIMD (Single Instruction Multiple Data) labels Jan 27, 2019
@steveklabnik
Copy link
Member

@jackmott did you ever get around to testing this again? :)

@jackmott
Copy link
Author

@steveklabnik yes, this is fixed now!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-SIMD Area: SIMD (Single Instruction Multiple Data) T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

4 participants