Use stddev from benchmark to do a statistical test #52

fonsp · 2023-06-08T13:25:57Z

We currently get some random test failures, because the benchmarks are hardcoded to be "within 1.2 times the time from Distributed". I found that:

- `@belapsed` is the minimum time, not mean time. That's still an interesting measure (which gets more accurate with more samples) but I believe it has a higher spread? In any case, my intention is to test mean time.
- `seconds=1` makes our tests fast, but that is not worth the high random failure rate.
- By doing actual stats, we can control the "admissible false positive" rate, which I set to 2.5% right now. It should be pretty low because we have so many CI runners.
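
To make the CI-runner point concrete: if every job independently has a spurious-failure probability of α, the chance that at least one of n jobs fails is 1 - (1 - α)^n. A minimal Python sketch of that arithmetic follows. The job counts are made up for illustration, the function name is hypothetical, and independence between jobs is an assumption.

```python
def any_false_failure(alpha: float, n_jobs: int) -> float:
    """Probability that at least one of n_jobs independent jobs fails spuriously."""
    return 1.0 - (1.0 - alpha) ** n_jobs


# Hypothetical CI matrix sizes; the real number of runners is not stated here.
for n_jobs in (1, 6, 12):
    print(n_jobs, round(any_false_failure(0.025, n_jobs), 3))
```

The per-job rate compounds quickly as the matrix grows, which is why it needs to stay low.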

Right now the PR changes our goal to be "97% sure that we are not slower".
This should probably be: "97% sure that we are not more than 20% slower". EDIT: no, it's fine.
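
One way to read the two goals above is as a one-sided test on the difference of mean times, using the sample standard deviations. The sketch below is a Python illustration under stated assumptions, not the code in this PR: it uses a normal approximation for the test statistic, and `not_slower`, `alpha`, and `slack` are hypothetical names. With `slack=1.0` it checks "not slower"; with `slack=1.2` it checks "not more than 20% slower".

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def not_slower(ours, baseline, alpha=0.025, slack=1.0):
    """Pass unless the samples show, at level alpha, that
    mean(ours) > slack * mean(baseline).

    Assumes a normal approximation for the difference of means.
    """
    diff = mean(ours) - slack * mean(baseline)
    # Standard error of the difference (Welch-style, unequal variances).
    se = sqrt(
        stdev(ours) ** 2 / len(ours)
        + (slack * stdev(baseline)) ** 2 / len(baseline)
    )
    if se == 0.0:
        # No spread at all: fall back to a plain comparison.
        return diff <= 0.0
    z = diff / se
    # Fail only when z exceeds the one-sided critical value.
    return z <= NormalDist().inv_cdf(1.0 - alpha)


baseline = [1.00, 1.02, 0.98, 1.01, 0.99]  # made-up timings in seconds
ours = [1.10, 1.12, 1.08, 1.11, 1.09]      # made-up timings, about 10% slower

print(not_slower(ours, baseline))             # strict "not slower" goal
print(not_slower(ours, baseline, slack=1.2))  # "not more than 20% slower" goal
```

Under this reading, `alpha` is the rate at which the test fails spuriously when the two means are in fact equal.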

fonsp · 2023-06-08T13:57:34Z

Hm, in the BenchmarkTools docs there is a section about comparing results, but they just use a fixed tolerance of 1.05 (customizable), instead of taking the stddev into account...

https://juliaci.github.io/BenchmarkTools.jl/stable/manual/#Handling-benchmark-results

They also recommend median instead of mean to be less sensitive to outliers. That makes sense but I forgot how to do statistics with medians instead of means...
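
For comparison, here is a Python sketch of the two ideas side by side: a fixed-tolerance check on the ratio of medians, and one possible way to attach uncertainty to a median, a percentile bootstrap. Neither function is BenchmarkTools API; the names, the 1.05 tolerance default, the resample count, and the choice of bootstrap are all assumptions made for illustration.

```python
import random
from statistics import median


def fixed_tolerance_ok(ours, baseline, tolerance=1.05):
    """Pass if the median of ours is within tolerance times the baseline median."""
    return median(ours) <= tolerance * median(baseline)


def median_ratio_lower_bound(ours, baseline, alpha=0.025, n_boot=2000, seed=0):
    """Percentile-bootstrap lower bound for median(ours) / median(baseline)."""
    rng = random.Random(seed)  # seeded so repeated runs give the same answer
    ratios = []
    for _ in range(n_boot):
        # Resample each group with replacement and recompute the ratio.
        a = rng.choices(ours, k=len(ours))
        b = rng.choices(baseline, k=len(baseline))
        ratios.append(median(a) / median(b))
    ratios.sort()
    return ratios[int(alpha * n_boot)]


baseline = [1.00, 1.02, 0.98, 1.01, 0.99, 1.03, 0.97, 1.00, 1.01]  # made-up timings
ours = [2.0 * t for t in baseline]                                 # twice as slow

print(fixed_tolerance_ok(ours, baseline))
print(median_ratio_lower_bound(ours, baseline) > 1.0)
```

A test built on the bootstrap bound would fail only when even the lower bound of the ratio is above 1, so noisy runs with a wide spread are less likely to trip it than with a fixed tolerance.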

fonsp · 2023-06-08T14:35:26Z

Leaving the Windows failure for another day :)

Use stddev from benchmark to

e4ac803

fonsp changed the title ~~Use stddev from benchmark to~~ Use stddev from benchmark to do a statistical test Jun 8, 2023

fonsp added 2 commits June 8, 2023 15:38

asdf

1cd0f74

tune the benchmark first

8935e80

fonsp added 2 commits June 8, 2023 16:15

tweak shutdown

77ae7b2

log tweaks

9473ac6

fonsp merged commit 0d88ca0 into main Jun 8, 2023

fonsp deleted the stats branch June 8, 2023 14:35

fonsp mentioned this pull request Jun 8, 2023

benchmarks without stats because the samples are not normally distributed #54

Open
