Add inline(never) to bench systems #9824
Why? Because then it becomes easier to inspect generated ASM using a tool like `cargo-asm`.
This makes a lot of sense and is well-justified
LGTM. Do we need `#[no_mangle]` to make it easier to find the symbols?
# Objective

It is difficult to inspect the generated assembly of benchmark systems using a tool such as `cargo-asm`.

## Solution

Mark the related functions as `#[inline(never)]`. This way, you can pass the module name as an argument to `cargo-asm` to get the generated assembly for the given function.

As a side effect, this may also make the benchmarks a bit more predictable and useful, because it prevents inlining in places where no inlining could take place in Bevy.

### Measurements

Following the recommendations in <https://easyperf.net/blog/2019/08/02/Perf-measurement-environment-on-Linux>, I:

1. Put my CPU in "AMD ECO" mode, which surprisingly is the equivalent of disabling turbo boost, giving more consistent performance
2. Disabled all hyperthreading cores using `echo 0 > /sys/devices/system/cpu/cpu{11,12…}/online`
3. Set the scaling governor to `performance`
4. Manually disabled AMD boost with `echo 0 > /sys/devices/system/cpu/cpufreq/boost`
5. Set the nice level of the criterion benchmark using `cargo bench … & sudo renice -n -5 -p $! ; fg`
6. Ran no programs other than the benchmarks (apart from system daemons and the X11 server)

With this setup, running the same benchmarks multiple times on `main` gives me a lot of "regression" and "improvement" messages, which is absurd given that no code changed.

On this branch, there are still some spurious performance change detections, but they are much less frequent.

Of course, this only covers the `iter_simple` and `iter_frag` benchmarks.
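To illustrate the solution described above, here is a minimal sketch of what marking a benchmarked function `#[inline(never)]` looks like. The function `sum_positions` and the loop in `run_bench` are hypothetical stand-ins, not Bevy's actual benchmark systems or the criterion harness. The point is only that the attribute asks the compiler to keep the function as a separate, non-inlined unit, which is what makes it possible to look it up by path in the generated assembly.

```rust
use std::hint::black_box;

/// Hypothetical stand-in for a benchmark system (not Bevy's real code).
/// `#[inline(never)]` asks the compiler not to inline this function into
/// its callers, so it stays a standalone function in the generated assembly.
#[inline(never)]
fn sum_positions(positions: &[f32], velocities: &[f32]) -> f32 {
    positions
        .iter()
        .zip(velocities)
        .map(|(p, v)| p + v)
        .sum()
}

/// Hypothetical loop standing in for the body of a criterion benchmark.
/// `black_box` keeps the optimizer from discarding the inputs and the result.
fn run_bench(iterations: usize) -> f32 {
    let positions = vec![1.0_f32; 1024];
    let velocities = vec![0.5_f32; 1024];
    let mut last = 0.0;
    for _ in 0..iterations {
        last = black_box(sum_positions(
            black_box(positions.as_slice()),
            black_box(velocities.as_slice()),
        ));
    }
    last
}

fn main() {
    println!("{}", run_bench(10));
}
```

Without the attribute, the optimizer would be free to fold a small function like this into the calling loop, in which case there may be no separate function body left to inspect.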