Stagger benchmarks to smooth background noise #595
See here: https://github.com/pv/asv/commits/many-proc. Full staggering may not be so sensible, because you need to uninstall/install the project in between. The general issue is also probably not so much background processes as hardware causes for CPU performance fluctuations. These occur on laptops, but I guess desktop CPUs have similar behavior with thermal throttling etc.
The hardware issue sounds tough, but it also sounds like you've put a lot of time and thought into this, which is reassuring. What am I looking at with many-proc? Is the idea to save and aggregate results across multiple runs? That seems like a reasonable approach (and possibly easier to implement than staggering?). Is the "not so sensible" an indication that I should give up on this? The status quo of running …
Staggering can be implemented (in the same way as multiple runs + aggregating results), but I expect it will be slower, because between each benchmark you need to uninstall/reinstall the project into the environment.
Can you expand on that a bit? Could the parent process spawn two env-specific processes which in turn spawn processes for each benchmark, without switching in between? Even if not, slower-but-more-accurate is a tradeoff I'd be happy with. I implemented asv for dateutil, but the PR stalled because ATM the results are just way too noisy.
@jbrockmendel: you can perhaps try the asv master branch now that the multi-process benchmarking is in there, e.g. with the number of processes adjusted on the command line.
I'll give it a try. How do I enable the feature where it stores results to calculate statistics over multiple runs? If we can get that working then stability becomes a problem we can throw hardware-hours at. |
It's automatically on (by default processes=2), adjustable on the command line as above. There's no option to combine results from multiple runs.

Ultimately, if you want good benchmarking accuracy, you probably need to at least disable CPU frequency tuning for one CPU and then use taskset to pin the benchmark run to it.

None of this is particularly necessary if you just accumulate historical data, as done here and by most projects using asv: https://pv.github.io/numpy-bench/
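For concreteness, here is a minimal sketch of a `benchmarks.py` suite that overrides the per-benchmark sampling attributes; the attribute names (`processes`, `repeat`) follow the asv 0.3.x documentation and may differ in other versions:

```python
# Hypothetical benchmarks.py sketch (asv 0.3.x attribute names assumed).
# Raising these values trades a longer run for tighter statistics.

class TimeSuite:
    processes = 4   # fresh interpreter processes to sample from (default 2)
    repeat = 10     # timing repeats collected within each process

    def setup(self):
        self.data = list(range(10_000))

    def time_sum(self):
        sum(self.data)
```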
Is this the expected behavior? I thought it would go the other way. |
Darn. Would a PR implementing this be accepted? (or feasible?)
I have taken to using taskset; I am not familiar with the tuning bit. It sounds like our [pandas, prospectively dateutil] use case is not exactly the intended one, but I'm optimistic/hopeful we can make it work. Generally when a PR is made in a perf-sensitive part of the code, the maintainer asks for an asv comparison. A single run of `asv continuous` is too noisy for that to be reliable. Cranking the sample size up to 11 seems like the least labor-intensive way to address this. Am I wrong?
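For readers unfamiliar with the trick: `taskset -c 3 <command>` restricts a process to CPU core 3, and the same pinning can be done from inside Python on Linux via `os.sched_setaffinity` (the core number here is arbitrary):

```python
# Linux-only sketch: pin the current process (and its children) to one
# CPU core, the in-process equivalent of launching under `taskset -c 3`.
import os

os.sched_setaffinity(0, {3})    # 0 = this process; {3} = allowed core set
print(os.sched_getaffinity(0))  # -> {3}
```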
I'm not sure what you mean --- the default is processes=2, and increasing the number to 4 makes it take longer --- as expected?
It's feasible, just needs some plumbing in the right places. But I'm not sure if it will ultimately solve the problem with the accuracy of the results --- you may be able to somewhat reduce the number of false positives, but not fully... E.g., if there is performance variation on a time scale of ~10 seconds (such as with laptop CPU thermal control), then you still need some luck to have all your benchmarks sample the full variation.
Sure, it should be possible to make it work by measuring longer (i.e. adjusting the benchmarks' timing parameters).

This is not specific to asv, except in the sense that asv runs many timing benchmarks, and the chance of false positives is multiplied by the number of benchmarks run. However, I expect the situation is already better with asv 0.3.x, which does the statistics properly, than with asv 0.2.x.
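To put a number on the "multiplied by the number of benchmarks" effect, a back-of-the-envelope sketch; the per-benchmark rate `p` is an assumed illustrative figure, not a measured one:

```python
# If each benchmark comparison independently has false-positive rate p,
# the chance of at least one spurious "regression" across n benchmarks
# is 1 - (1 - p)**n, which grows quickly with suite size.
p = 0.01  # assumed per-benchmark false-positive rate (illustrative)
for n in (10, 100, 500):
    print(f"n={n:4d}: P(at least one false positive) = {1 - (1 - p) ** n:.2f}")
# n=100 already gives ~0.63, so a large suite almost guarantees noise hits.
```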
cf. gh-689
#689 looks like it could be a big help, thanks.
I'm back to being confused. I expected more processes to mean faster execution; why is that the wrong intuition? |
The processes are run sequentially, not at the same time. If they were run at the same time, that would change the load on the machine and affect the results.
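A rough sketch of that shape, using the stdlib `timeit` CLI rather than asv's actual internals: each added process is another full timing pass, run strictly after the previous one.

```python
# Illustrative sketch, not asv internals: samples come from several
# interpreter processes run one after another, so more processes means
# a longer run, not a parallel one.
import subprocess
import sys

def sample(n_processes, stmt="sum(range(10000))"):
    results = []
    for _ in range(n_processes):  # sequential on purpose
        out = subprocess.run(
            [sys.executable, "-m", "timeit", stmt],
            capture_output=True, text=True, check=True,
        )
        results.append(out.stdout.strip())
    return results

for line in sample(2):
    print(line)
```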
You can try it out in principle. Also, it's unclear to me whether the issues you mention were with asv 0.2.x, or whether they persist on the current master branch.
When running `asv continuous` to compare commits A and B, all the benchmarks from A run, followed by all the benchmarks from B, e.g. "A.foo, A.bar, A.baz, B.foo, B.bar, B.baz". Would it be feasible to instead run "A.foo, B.foo, A.bar, B.bar, A.baz, B.baz"? (In fact, because each benchmark is run multiple times, ideally I would like the staggering to be even finer-grained.)
The thought here is that background processes make the noise auto-correlated, so running the comparisons back-to-back may give more informative ratios. (Based on amateur speculation.)
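A sketch of the proposed interleaved schedule (illustrative only, not asv code): alternating between the two commits means any slow drift in machine performance hits both sides of each comparison roughly equally.

```python
# Interleave the two commits' benchmarks, per the ordering proposed above.
from itertools import chain

benchmarks = ["foo", "bar", "baz"]
current = [("A", name) for name in benchmarks]
candidate = [("B", name) for name in benchmarks]

schedule = list(chain.from_iterable(zip(current, candidate)))
print(schedule)
# [('A', 'foo'), ('B', 'foo'), ('A', 'bar'), ('B', 'bar'), ('A', 'baz'), ('B', 'baz')]
```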