[benchmark] Driver Improvements #26303
Conversation
Support for gathering a minimal number of samples per benchmark, using the optional `--min-samples` argument, which overrides the automatically computed number of samples per `sample-time` if that is lower.
Added support for running benchmarks using substring filters. Positional arguments prefixed with a single + or - sign are interpreted as benchmark name filters: the driver executes all benchmarks whose names include any of the strings prefixed with a plus sign but none of the strings prefixed with a minus sign.
Added `--meta` option to log measurement metadata:
* PAGES – number of memory pages used
* ICS – number of involuntary context switches
* YIELD – number of voluntary yields

(Pages and ICS were previously available only in `--verbose` mode.)
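A rough sketch of how these options could be combined on the command line (the `Benchmark_O` binary name, the `=` flag spelling and the concrete values are assumptions, not taken from this PR):

```
# Hypothetical invocation: run all benchmarks whose names contain "Array" but
# not "Append", gather at least 10 samples each, and log the PAGES/ICS/YIELD
# metadata in the summary.
./Benchmark_O +Array -Append --min-samples=10 --meta
```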
@swift-ci please benchmark
@swift-ci please test
[Benchmark results: performance for -O, -Osize, -Onone and code size for -O, -Osize, -swiftlibs; result tables not included.]
@swift-ci please benchmark
I’m re-running the benchmarks to see if the reported false changes are stable across the runs or random (I hope). I suspect the instability is caused by the small sample size, since the …
[Benchmark results: performance for -O, -Osize, -Onone and code size for -O, -Osize, -swiftlibs; result tables not included.]
Hrm… stable change. @eeckstein, do you have some theory that could explain this?
Some benchmarks are just unstable. We should consider disabling them.
I have #20552 open that would address the …
Looking at the stability of benchmark results during measurement, I would not call … @lorentey, maybe you have some idea to explain the 23–26% jumps in …
This PR adds 3 features to the benchmark driver:
* `--min-samples` attribute to specify the minimal number of samples to gather.
* Support for filtering benchmarks by name substrings, via positional arguments prefixed with a + or - sign.
* `--meta` option to log measurement metadata in the benchmark summary (along with corresponding support for parsing such log format). Example:
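A minimal sketch of such an invocation, assuming the standalone benchmark binary is called `Benchmark_O` (the binary name and the chosen benchmark are illustrative only):

```
# Hypothetical invocation: log the PAGES, ICS and YIELD metadata columns
# alongside the usual measurement summary for a single benchmark.
./Benchmark_O --meta SomeBenchmark
```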
Motivation
The support for running benchmarks using substring filters is meant as an alternative to running benchmarks using `tags` (and `skip-tags`). Maybe if we consistently applied the benchmark naming convention to the whole Swift Benchmark Suite, the tags could be retired? Let's see how we'll like that in practice…

I'd like to further experiment with the measurement process to find a better balance between runtime and robustness of measurement by doing parallel benchmarking. The `--meta` option surfaces the key data about the quality of the measurement environment, without the need to resort to the much more expensive `--verbose` mode. The `--min-samples` option controls the minimal sample size required for a proper statistical analysis, so that we could rely more on `--sample-time` (when measuring with `--num-samples=1`), without needing to set it too high in order to get enough data from the worst cases (slow benchmarks).

Best reviewed by individual commits.
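For instance, the kind of setup this is aiming for might look roughly like the following sketch (the binary name, flag spellings and values are assumptions; only the options themselves come from the text above):

```
# Hypothetical invocation: the number of samples per benchmark is computed
# automatically from --sample-time; --min-samples overrides that number when
# the automatically computed one is lower (e.g. for slow benchmarks).
./Benchmark_O --sample-time=0.5 --min-samples=4
```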