Large performance loss due to lacking LTO #39
Comments
I think your C compiler in this case would have to be clang, and you may have to tweak the compilation flags a bit. This may be of use: https://doc.rust-lang.org/rustc/linker-plugin-lto.html
I tried to change the C compiler (as I also reported here), but the code crashes for some reason. I tried to get
CPATH=/usr/lib/gcc/x86_64-pc-linux-gnu/12.2.0/include/ clang f128.c -flto=thin -c -o ./f128clang.o -O2
ar crus libf128.a f128clang.o
CPATH=/usr/lib/gcc/x86_64-pc-linux-gnu/12.2.0/include/ clang test.c libf128.a -flto=thin -O2 -o test_clang
this works for my C test script
Changing the f128 crate build script to:
println!(r"clang src/f128.c -flto=full -lquadmath -c -o ./f128clang.o -O2");
println!(r"ar crus libf128.a f128clang.o");
and running the small project
[package]
name = "f128perf"
version = "0.1.0"
edition = "2021"
[profile.dev]
opt-level = 2
lto = "fat"
[profile.release]
lto = "fat"
[dependencies]
f128 = {path="../f128"}
num-traits = "*"
and
use num_traits::cast::FromPrimitive;

fn main() {
    let mut a = f128::f128::from_f64(2.).unwrap();
    let b = f128::f128::from_f64(3.).unwrap();
    for _ in 0..10000000 {
        a = a + b - b;
        a *= b;
        a = a / b;
    }
    println!("a={}", a);
}
with
it does compile but it gives a runtime crash:
#0 0x0000555555595c49 in f64_to_f128 ()
#1 0x00005555555646b9 in f128::f128_t::{impl#7}::from_f64 (n=2) at /home/ben/Sync/Research/f128/src/f128_t.rs:420
#2 f128perf::main () at /home/ben/Sync/Research/f128perf/src/main.rs:4
with rustc 1.61.0 and clang version 14.0.6 on x86_64-pc-linux-gnu. This crash doesn't go away when I tried
Do you have any idea what may cause this crash?
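One thing that may be worth checking in the build script above: println! in a build.rs only writes text to stdout. Cargo acts on lines it recognizes as directives (such as cargo:rustc-link-lib=...) and does not execute anything that is printed. If the intent is for the build script to invoke clang itself, a minimal sketch using std::process::Command could look like the following. The helper names (clang_compile_args, ar_args, run, build_static_lib) are hypothetical and not part of the f128 crate; the flags are the ones from the commands in this comment, and linking quadmath is an assumption carried over from those commands. Per the linker-plugin-lto page linked earlier, the Rust side would additionally need -Clinker-plugin-lto in RUSTFLAGS for cross-language LTO to take effect.

```rust
use std::process::Command;

/// Arguments for compiling one C file to an object file with ThinLTO enabled.
fn clang_compile_args(src: &str, obj: &str) -> Vec<String> {
    ["-O2", "-flto=thin", "-c", src, "-o", obj]
        .iter()
        .map(|s| s.to_string())
        .collect()
}

/// Arguments for archiving the object into a static library (same `crus` as above).
fn ar_args(lib: &str, obj: &str) -> Vec<String> {
    ["crus", lib, obj].iter().map(|s| s.to_string()).collect()
}

/// Run a tool and report an error if it cannot be started or exits non-zero.
fn run(tool: &str, args: &[String]) -> Result<(), String> {
    let status = Command::new(tool)
        .args(args)
        .status()
        .map_err(|e| format!("failed to start {tool}: {e}"))?;
    if status.success() {
        Ok(())
    } else {
        Err(format!("{tool} exited with {status}"))
    }
}

/// Compile and archive the C source, then return the `cargo:` directives
/// that the build script should print so Cargo links the result.
fn build_static_lib(src: &str, out_dir: &str, name: &str) -> Result<Vec<String>, String> {
    let obj = format!("{out_dir}/{name}.o");
    let lib = format!("{out_dir}/lib{name}.a");
    run("clang", &clang_compile_args(src, &obj))?;
    run("ar", &ar_args(&lib, &obj))?;
    Ok(vec![
        format!("cargo:rustc-link-search=native={out_dir}"),
        format!("cargo:rustc-link-lib=static={name}"),
        // Assumption: libquadmath is still needed at link time, as in the commands above.
        "cargo:rustc-link-lib=quadmath".to_string(),
    ])
}
```

In a real build.rs, main would call build_static_lib("src/f128.c", &std::env::var("OUT_DIR").unwrap(), "f128") and println! each returned directive. This does not by itself explain the crash in f64_to_f128; it only makes sure the clang-built archive is the one that actually gets linked.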
Tried taking a stab at this tonight and I ended up getting a different segfault. I'll dig into it this Saturday.
Looking at this again - I suspect you may have to compile the crate yourself, modifying f128_internal's Cargo.toml with the appropriate LTO flags. Have you tried this already?
@benruijl I recently compiled f128 crate without any changes and dug into the disassembly. The generated assembly for
Where the call to
This makes me think that the Rust compiler is doing a competent job at LTO. I think the issue is the way some things are called, and the way data is passed to FFI functions. Looking more into the generated assembly though, I think that both the C and Rust compilers are hesitant to use 128-bit registers to pass function values just because they're 128 bits in size:
Something like this occurs for every argument of every function call, whereas if our interfaces were completely transparent it would simply call the function without tweaking the registers (I think?). However, rustc and any competent C compiler will use the SSE registers to pass 128-bit primitives (i128 / __int128 in gcc). This leads me to believe that we can simply replace the wrapper type entirely. This solution is heavily biased towards x86 ISAs, however. I'm not familiar enough with other popular ISAs (e.g. Apple M1) to say whether this would work, and it would likely depend on compilers doing the same thing. I guess the conclusion is: LTO works, and the compilers are doing exactly what we tell them to do; it just so happens that we're telling them to do something sort of silly. If you still have a use for this library, I will attempt to make this happen.
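To make the "replace the wrapper type" idea a bit more concrete, here is a rough sketch of what a transparent 128-bit representation could look like on the Rust side. F128Bits is a hypothetical name, not the crate's current type, and the sketch only covers layout plus an exact f64 widening done in pure Rust; arithmetic would still go through the C routines. Which registers are actually used to pass such a value is decided by the target ABI and the compiler, so that part would need to be confirmed in the disassembly.

```rust
/// Hypothetical replacement for the wrapper: the IEEE 754 binary128 bit
/// pattern stored directly in a 128-bit integer, with no extra struct layout.
#[repr(transparent)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct F128Bits(pub u128);

impl F128Bits {
    /// Widen an f64 to binary128. This is exact, since every f64 value
    /// is representable in binary128.
    pub fn from_f64(x: f64) -> Self {
        let bits = x.to_bits();
        let sign = ((bits >> 63) as u128) << 127;
        let exp = ((bits >> 52) & 0x7ff) as u128; // 11-bit biased exponent
        let frac = (bits & ((1u64 << 52) - 1)) as u128; // 52-bit fraction

        let magnitude = if exp == 0x7ff {
            // Infinity or NaN: all-ones exponent, payload left-aligned.
            (0x7fffu128 << 112) | (frac << 60)
        } else if exp != 0 {
            // Normal number: rebias from 1023 to 16383, left-align the fraction.
            ((exp + 16383 - 1023) << 112) | (frac << 60)
        } else if frac == 0 {
            // Zero (the sign is added below).
            0
        } else {
            // Subnormal f64: value is frac * 2^-1074, which is a normal binary128.
            let p = 127 - frac.leading_zeros(); // index of the highest set bit, 0..=51
            let e = (p as u128) + 16383 - 1074; // biased binary128 exponent
            let f = (frac ^ (1u128 << p)) << (112 - p); // drop the implicit leading 1
            (e << 112) | f
        };
        F128Bits(sign | magnitude)
    }
}
```

With something like this, from_f64 would no longer need an FFI call at all, and the remaining extern functions could take and return the 16-byte value directly. Whether that removes the register shuffling seen above is an open question until it is measured.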
The low-level operations on f128 numbers in C, __addtf3 etc., are wrapped using the Wrapper type in f128.c. This causes an overhead of about a factor of 1.5 to 2, as can be seen from this flamegraph: [flamegraph image]
In C/C++, this substantial loss can be mitigated by compiling the C library with lto:
gcc -O3 -flto -lgfortran -lquadmath -Bstatic -c f128.c
gcc-ar crf libf128.a f128.o
g++ -O3 test.c libf128.a -flto -lquadmath -o test
where test.c is a benchmark script:
I am trying to achieve a similar performance boost in Rust, but I am struggling and wondering if it's even possible, since Rust compiles with LLVM and we need to use g++ instead of clang for the quadmath extension.
I tried adding .flag("-flto") to the build script, but that causes linking errors (presumably because the LLVM linker cannot read g++ LTO info). Adding .flag("-ffat-lto-objects") does restore compilation, but only because LLVM can now opt to not use LTO.
Does anyone here know a solution?