WIP: Enable LLVM loop vectorizer #3929

Closed
wants to merge 3 commits

Conversation

simonster
Member

While there's been some discussion of adding SIMD types in #2299, I thought it might be fun to see how well the LLVM loop vectorizer can do with Julia code. This PR compiles (although it can't build sysimg.jl), runs, and vectorizes things (sample), but there are two big problems with it.
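For concreteness, here's the kind of loop I have in mind (a minimal example of my own, not code from this PR or the linked sample):

# A unit-stride elementwise update over dense vectors. For a loop like this,
# the vectorizer can insert a runtime check that x and y don't alias and then
# use SIMD loads/stores for the body.
function axpy_loop!(a::Float64, x::Vector{Float64}, y::Vector{Float64})
    for i = 1:length(x)
        y[i] += a*x[i]
    end
    return y
end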

The first issue is that, to get this proof of concept to work, this PR turns all integer add operations into add nsw. Scalar evolution analysis requires that operations on the loop index have undefined behavior on overflow, but making that change globally might be too unsafe for a high-level language. Since we should be able to guarantee that next(::Range{Int})/next(::Range1{Int}) doesn't overflow, one option is to add nsw intrinsics and use them there.
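Concretely, that option might look something like this sketch (add_int_nsw is a hypothetical name for such an intrinsic, and the next method is simplified, not the actual Base definition):

# add_int_nsw stands in for a new intrinsic that would lower to LLVM's
# `add nsw`, i.e. an integer add that is allowed to assume no signed
# overflow. It's stubbed out as a plain + here so the sketch is well-formed.
add_int_nsw(x::Int, y::Int) = x + y

# Simplified iteration method for Range1{Int} that uses the nsw add for the
# state increment, which is the arithmetic scalar evolution needs to analyze.
next(r::Range1{Int}, i::Int) = (r.start + i, add_int_nsw(i, 1))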

The second issue is that this PR turns jl_value_t into int8*. Not only does this seem very wrong, it also breaks building sysimg.jl, although Julia seems to run fine with a sysimg.jl built without this change. Unfortunately, I haven't been able to get the loop vectorizer to work with jl_value_t as a structure type. With this change, the IR going into the loop vectorization pass looks like this, whereas without it, the IR looks like this. Notice that, with jl_value_t as i8*, the bitcast is outside of the loop, whereas with jl_value_t as a structure type, it is inside the loop. This seems to bother the loop vectorizer, which tells me:

LV: Found a loop: if
LV: Found an induction variable.
LV: Found a runtime check ptr:  %7 = bitcast %jl_value_t* %6 to double*, !dbg !3370
LV: Found a runtime check ptr:  %7 = bitcast %jl_value_t* %6 to double*, !dbg !3370
LV: We need to compare 1 ptrs.
LV: We can perform a memory runtime check if needed.
LV: Found an unidentified write ptr:  %4 = load %jl_value_t** %3, align 8, !dbg !3369
LV: Adding Underlying value:  %4 = load %jl_value_t** %3, align 8, !dbg !3369
LV: Found an unidentified read ptr:  %4 = load %jl_value_t** %3, align 8, !dbg !3369
LV: Found a possible write-write reorder:  %4 = load %jl_value_t** %3, align 8, !dbg !3369
LV: Can't vectorize due to memory conflicts
LV: Not vectorizing.

If I move the bitcast out of the loop and compile the IR manually with opt, it seems to work, but I'm a little confused about what makes these cases different.

If you have a debug build of LLVM, this can be used to produce debug output for specific passes using JULIA_LLVM_ARGS="-debug-only=passname". This information is required by the loop vectorizer, but may benefit other analysis passes as well.

With these changes, Julia can now vectorize a simple loop. Unfortunately, these changes are unacceptable. I've changed addition to produce undefined behavior on signed integer overflow, and I've changed jl_value_t from an LLVM structure type to a pointer to an Int8.
@ViralBShah
Member

This is really exciting, and I am anxious to see how well this works out.

@simonster
Member Author

I can get this to work without changing jl_value_t to i8* if I comment out the GEP optimizations in LLVM's instruction combining pass, so the problem seems to be an interaction between what the instruction combining pass does with the jl_value_t bitcasts and the loop vectorizer's inability to recognize those bitcasts as no-ops.

@simonster
Member Author

Superseded by #5355.

simonster closed this Jan 13, 2014