
Benchmarking tests #274

Closed · jagerber48 opened this issue Dec 16, 2024 · 0 comments · Fixed by #275

@jagerber48 (Contributor) commented:

The issue concerns adding benchmarking for the performance optimization that was introduced in release 3.0.1. See especially #30 for a thorough discussion. The short summary is that a naive linear error propagation algorithm runs the following snippet

sum(ufloat(1, 0.1) for _ in range(n)).std_dev

in O(N^2) time, whereas the lazy weight evaluation algorithm introduced in 3.0.1 runs it in O(N) time. I think this is one of the main technical innovations of this package, and it separates the package from anything anyone would try to "whip up".
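
To make the asymptotics concrete, here is a minimal toy sketch contrasting the two strategies for a running sum. This is not the actual uncertainties implementation (the EagerAffine/LazyAffine names and structure are hypothetical); it only illustrates why eager merging is quadratic while lazy expansion is linear:

class EagerAffine:
    """Eagerly stores the full {atom: derivative} map at every node."""
    def __init__(self, linear_part):
        self.linear_part = linear_part  # dict mapping atom -> derivative

    def __add__(self, other):
        combined = dict(self.linear_part)  # copies a dict that keeps growing...
        for atom, deriv in other.linear_part.items():
            combined[atom] = combined.get(atom, 0.0) + deriv
        return EagerAffine(combined)  # ...so summing n terms costs O(n^2) total


class LazyAffine:
    """Lazily stores only (coefficient, operand) pairs; expands on demand."""
    def __init__(self, terms):
        self.terms = terms  # list of (coefficient, LazyAffine-or-atom) pairs

    def __add__(self, other):
        return LazyAffine([(1.0, self), (1.0, other)])  # O(1) per addition

    def expanded(self):
        # Iterative expansion visits each node once: O(n) for the whole sum.
        derivatives = {}
        stack = [(1.0, self)]
        while stack:
            coeff, node = stack.pop()
            if isinstance(node, LazyAffine):
                stack.extend((coeff * c, sub) for c, sub in node.terms)
            else:  # node is an atom (an independent variable)
                derivatives[node] = derivatives.get(node, 0.0) + coeff
        return derivatives


# Example: sum five independent atoms; each __add__ is O(1), expanded() is O(n).
atoms = [object() for _ in range(5)]
total = LazyAffine([(1.0, atoms[0])])
for atom in atoms[1:]:
    total = total + LazyAffine([(1.0, atom)])
assert all(coeff == 1.0 for coeff in total.expanded().values())

The real algorithm has more to handle (for example, subexpressions that appear more than once, and general derivatives rather than coefficients of 1), but the toy version captures the asymptotic difference.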

PR #262 refactors the code that implements this lazy weight evaluation algorithm. For this reason I think it is necessary that we set up benchmarking tests and confirm that they pass both before and after merging #262. In fact, I've re-implemented the error propagation multiple times while drafting #262 and have repeatedly failed to realize the O(N) execution time. This demonstrates the importance of benchmarking tests to help ensure we don't accidentally introduce a change that ruins the performance (as I already did during my drafts with just two lines in the wrong place).


I have the following simple benchmarking code (it uses the third-party psutil package to report system info):

import platform
import psutil
import timeit

from uncertainties import ufloat


def get_system_info():
    """Collect basic hardware/OS details so results can be compared across machines."""
    return {
        'platform': platform.system(),
        'platform-release': platform.release(),
        'platform-version': platform.version(),
        'architecture': platform.machine(),
        'processor': platform.processor(),
        'ram': str(round(psutil.virtual_memory().total / (1024.0 ** 3))) + " GB",
    }


def ufloat_sum_benchmark(num):
    # str() forces evaluation of the lazily computed uncertainty.
    str(sum(ufloat(1, 1) for _ in range(num)))


if __name__ == "__main__":
    for key, value in get_system_info().items():
        print(f'{key:17}: {value}')

    for n in (10, 100, 1000, 10000, 100000):
        print(f'### {n=} ###')
        # Scale repetitions inversely with n so each size gets a similar time budget.
        reps = int(100000 / n)
        t = timeit.timeit(lambda: ufloat_sum_benchmark(n), number=reps)
        print(f'    Test duration: {t:.2f} s, Repetitions: {reps}')
        print(f'    Average execution time: {t/reps:.4f} s')

On my system, on the master branch, I get:

platform         : Windows
platform-release : 10
platform-version : 10.0.19045
architecture     : AMD64
processor        : Intel64 Family 6 Model 154 Stepping 4, GenuineIntel
ram              : 16 GB
### n=10 ###
    Test duration: 0.90 s, Repetitions: 10000
    Average execution time: 0.0001 s
### n=100 ###
    Test duration: 0.79 s, Repetitions: 1000
    Average execution time: 0.0008 s
### n=1000 ###
    Test duration: 0.95 s, Repetitions: 100
    Average execution time: 0.0095 s
### n=10000 ###
    Test duration: 0.99 s, Repetitions: 10
    Average execution time: 0.0995 s
### n=100000 ###
    Test duration: 0.71 s, Repetitions: 1
    Average execution time: 0.7065 s

On the feature/linear_combo_refactor branch I get:

platform         : Windows
platform-release : 10
platform-version : 10.0.19045
architecture     : AMD64
processor        : Intel64 Family 6 Model 154 Stepping 4, GenuineIntel
ram              : 16 GB
### n=10 ###
    Test duration: 2.40 s, Repetitions: 10000
    Average execution time: 0.0002 s
### n=100 ###
    Test duration: 2.12 s, Repetitions: 1000
    Average execution time: 0.0021 s
### n=1000 ###
    Test duration: 1.98 s, Repetitions: 100
    Average execution time: 0.0198 s
### n=10000 ###
    Test duration: 1.86 s, Repetitions: 10
    Average execution time: 0.1860 s
### n=100000 ###
    Test duration: 1.77 s, Repetitions: 1
    Average execution time: 1.7692 s

So we see that the new code is still linear in time, but 2x-3x slower than master. My guess is that the new code could win back that factor of 2x-3x with some careful profiling, but I haven't done that yet.
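
If someone wants to start that profiling, a quick standard-library approach is to profile one large run and sort by cumulative time (the stats file name here is arbitrary; run this as a standalone script so the import is visible to cProfile.run):

import cProfile
import pstats

from uncertainties import ufloat

# Profile a single large sum and report the 15 most expensive calls.
cProfile.run(
    "sum(ufloat(1, 1) for _ in range(100000)).std_dev",
    "sum_profile.stats",
)
pstats.Stats("sum_profile.stats").sort_stats("cumulative").print_stats(15)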

Anyway, the point of this issue is to help me answer the question: what should a benchmarking regression test look like? Perhaps I could run this code at these five or so sizes and simply make sure the runtimes don't exceed thresholds that are, say, 3x the current master branch performance? I could also check that the runtime is O(N) to within a factor of 0.5-2 or 0.25-4. I'm not sure of the right way to set thresholds so that (1) we catch regressions but (2) code doesn't unluckily fail just because the benchmark ran slowly once. Maybe there's a way to make the benchmarking tests "information only" so they don't cause CI to fail, but they do alert code reviewers? Thoughts? I've also asked this question elsewhere to get even more help, since this is new to me.
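
As a concrete starting point for discussion, here is a rough pytest sketch of the "check the scaling rather than the absolute time" idea. The sizes and tolerance are illustrative guesses, not proposed project values:

import math
import timeit

import pytest

from uncertainties import ufloat


def ufloat_sum_time(n):
    # Time one construction-plus-expansion of a sum of n independent ufloats.
    return timeit.timeit(
        lambda: sum(ufloat(1, 1) for _ in range(n)).std_dev, number=1
    )


def test_ufloat_sum_scales_linearly():
    sizes = [1_000, 10_000, 100_000]
    times = [ufloat_sum_time(n) for n in sizes]
    # Least-squares slope of log(time) vs log(n) estimates the scaling exponent.
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    # O(N) gives a slope near 1; O(N^2) gives a slope near 2. The wide
    # tolerance is meant to absorb timing noise on shared CI runners.
    assert slope == pytest.approx(1.0, abs=0.5)

For the "information only" idea, one common pattern is to run benchmark tests in a separate, non-required CI job (or give them a dedicated pytest marker that is excluded from the required run), so failures surface to reviewers without blocking the merge.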

jagerber48 mentioned this issue Dec 16, 2024
jagerber48 added a commit that referenced this issue Dec 28, 2024
- [x] Closes #274 
- [x] Executed `pre-commit run --all-files` with no errors
- [x] The change is fully covered by automated unit tests
- [x] Documented in docs/ as appropriate
- [x] Added an entry to the CHANGES file

Add a performance benchmark test. This test is especially important to
ensure #262 doesn't introduce a performance regression.

---------

Co-authored-by: andrewgsavage <andrewgsavage@gmail.com>