Perf: optimize postal_parse_address implementation #49

Draft · wants to merge 3 commits into main
Conversation

@lmores (Contributor) commented Jul 19, 2024

Intro

In this PR:

  • function mismo.lib.geo.postal_parse_address has been rewritten using my implementation (see the sketch after this list)
  • function mismo.lib.geo.tests.test_postal_benchmark.postal_parse_address__direct_import is the same as the above, but avoids the optional import of postal inside its body (@NickCrews: don't worry, I was just curious whether that import and the inner function declaration added any relevant overhead... it seems irrelevant, and I will remove this variant before merging)
  • function mismo.lib.geo.tests.test_postal_benchmark.postal_parse_address__initial_impl is the original implementation by @jstammers
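For context, here is a hedged sketch (not the exact PR code) of the pattern under discussion: an elementwise parsing function that lazily imports the optional postal dependency inside its body and pivots libpostal's (value, label) tuples into a dict keyed by label. The function name parse_one is mine, not the PR's.

```python
def parse_one(address: str) -> dict[str, str]:
    # Optional dependency, imported lazily so mismo works without pypostal
    # installed (this inner import is what the __direct_import variant avoids).
    from postal.parser import parse_address

    # parse_address returns (value, label) tuples, e.g.
    #   [("123", "house_number"), ("main st", "road"), ...]
    return {label: value for value, label in parse_address(address)}
```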

In this branch I have also temporarily restored the top-level benchmarks directory to manually gather execution times, along the lines of the sketch below.
All benchmarks were run on the full data set of 1M rows using duckdb 0.10.0, ibis-framework 9.1.0 and Python 3.12.3, on the same laptop with no other programs running (except GNOME, etc.).
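A minimal sketch of such a manual timing harness; the function name and the use of to_pandas() to force execution are assumptions, not the actual contents of benchmarks/parse_address.py:

```python
import time

def time_impl(label: str, fn, table) -> None:
    start = time.perf_counter()
    fn(table).to_pandas()  # force execution of the otherwise-lazy Ibis expression
    elapsed = time.perf_counter() - start
    print(f"{label:<35} took {elapsed:8.4f} seconds")
```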

OS: Ubuntu 21.04 x86_64 
Host: Latitude 5500 
Kernel: 5.11.0-49-generic 
CPU: Intel i7-8665U (8) @ 4.800GHz 
GPU: Intel WhiskeyLake-U GT2 [UHD Graphics 620] 
Memory: 8067MiB / 15813MiB 

Results

Results obtained with manual timings are as follows.

$ python -m benchmarks.parse_address
noop                                took  19.6631 seconds
python_only                         took  20.6381 seconds
postal_only                         took  99.4631 seconds
postal_parse_address                took 109.7075 seconds
postal_parse_address__direct_import took 110.1048 seconds
postal_parse_address__initial_impl  took 118.7811 seconds

On the other hand, pytest-benchmark reports the following.

$ just bench --benchmark-sort=mean --benchmark-group-by=param:nrows -k 1m mismo/lib/geo/tests/test_postal_benchmark.py
pytest --benchmark-only --benchmark-enable --benchmark-autosave --benchmark-sort=mean --benchmark-group-by=param:nrows -k 1m mismo/lib/geo/tests/test_postal_benchmark.py
======================================================================================================== test session starts =========================================================================================================
platform linux -- Python 3.12.3, pytest-8.2.2, pluggy-1.5.0
benchmark: 4.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/lorenzo/devel/mismo
configfile: pyproject.toml
plugins: cov-5.0.0, benchmark-4.0.0, anyio-4.4.0
collected 24 items / 18 deselected / 6 selected                                                                                                                                                                                      

mismo/lib/geo/tests/test_postal_benchmark.py ......                                                                                                                                                                            [100%]
Saved benchmark data in: /home/lorenzo/devel/mismo/.benchmarks/Linux-CPython-3.12-64bit/0016_efaf478c0a66e11853fcf27704b1061ccff1fdff_20240719_124717.json



---------------------------------------------------------------------------------------------- benchmark 'nrows=1000000': 6 tests ---------------------------------------------------------------------------------------------
Name (time in s)                                                     Min                Max               Mean            StdDev             Median               IQR            Outliers     OPS            Rounds  Iterations
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_parse[1m-noop]                                    14.4026 (1.0)      14.7042 (1.0)      14.5671 (1.0)      0.1196 (1.0)      14.5971 (1.0)      0.1840 (1.0)           2;0  0.0686 (1.0)           5           1
test_benchmark_parse[1m-python_only]                             15.7207 (1.09)     16.1826 (1.10)     15.9631 (1.10)     0.2066 (1.73)     15.9745 (1.09)     0.3831 (2.08)          2;0  0.0626 (0.91)          5           1
test_benchmark_parse[1m-postal_only]                             67.2535 (4.67)     69.3878 (4.72)     68.0714 (4.67)     0.8035 (6.72)     67.7990 (4.64)     0.8358 (4.54)          2;0  0.0147 (0.21)          5           1
test_benchmark_parse[1m-postal_parse_address__direct_import]     73.5724 (5.11)     74.3592 (5.06)     73.9136 (5.07)     0.3081 (2.58)     73.9762 (5.07)     0.4267 (2.32)          2;0  0.0135 (0.20)          5           1
test_benchmark_parse[1m-postal_parse_address]                    73.3805 (5.09)     75.1065 (5.11)     73.9273 (5.07)     0.7074 (5.91)     73.7897 (5.06)     0.8699 (4.73)          1;0  0.0135 (0.20)          5           1
test_benchmark_parse[1m-postal_parse_address__initial_impl]      84.0357 (5.83)     87.4314 (5.95)     85.2971 (5.86)     1.5099 (12.62)    84.3496 (5.78)     2.3757 (12.91)         1;0  0.0117 (0.17)          5           1
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Legend:
  Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.
  OPS: Operations Per Second, computed as 1 / Mean
=========================================================================================== 6 passed, 18 deselected in 2350.17s (0:39:10) ============================================================================================

The values in the Mean column are remarkably lower than those measured manually (which lie well outside the StdDev and the IQR).
Isn't that rather strange? As far as I can tell pytest-benchmark runs each function 5 times: are we caching something that we should not, so that subsequent executions run faster than a single cold run (as in the manual timings)? A sketch of how pytest-benchmark drives each function follows below.
Also, in the pytest benchmarks the difference between postal_parse_address and postal_only is more than 5 s, which is much bigger than the difference in the manual timings (< 1 s).
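A hedged sketch of what one of these pytest-benchmark tests presumably looks like; the fixture name table_1m and the call shape are assumptions:

```python
from mismo.lib.geo import postal_parse_address

def test_benchmark_parse(benchmark, table_1m):
    # The `benchmark` fixture re-runs the callable for several rounds
    # (5 rounds in the tables above) and reports per-round statistics, so
    # any state cached by DuckDB or libpostal after the first round would
    # make later rounds, and hence the Mean, faster than a single cold run.
    benchmark(lambda: postal_parse_address(table_1m.address).execute())
```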

Additionally, before running the benchmarks above I recreated my virtualenv from scratch to match mismo's new requirements (python>=3.10, ibis-framework>=9.1.0), and the results I get are also quite different from those I reported in PR #48, where I was using ibis-framework 9.0.0 and duckdb 1.0.0 (I am not sure which Python version, though).

Questions

  1. Before going forward, @NickCrews and @jstammers could you run both benchmarks on your systems and share the results?
  2. @NickCrews: in light of the new results are you still willing to make the implementation change?

@jstammers (Contributor) commented:

@lmores here's what I see when I run those benchmarks, on a system with the following specs:

OS: Manjaro Linux x86_64 
Kernel: 6.9.9-1-MANJARO 
CPU: AMD Ryzen 5 3600 (12) @ 3.600GHz 
GPU: NVIDIA GeForce GTX 1080 Ti 
Memory: 8515MiB / 32026MiB 
$ python -m benchmarks.parse_address
noop                                took  29.1380 seconds
python_only                         took  28.7530 seconds
postal_only                         took 113.7731 seconds
postal_parse_address                took 119.3624 seconds
postal_parse_address__direct_import took 119.5952 seconds
postal_parse_address__initial_impl  took 121.1622 seconds
============================= test session starts ==============================
platform linux -- Python 3.12.4, pytest-8.2.2, pluggy-1.5.0
benchmark: 4.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/jimmy/Work/mismo
configfile: pyproject.toml
plugins: anyio-4.4.0, cov-5.0.0, benchmark-4.0.0
collected 24 items / 18 deselected / 6 selected

mismo/lib/geo/tests/test_postal_benchmark.py ......                      [100%]


---------------------------------------------------------------------------------------------- benchmark 'nrows=1000000': 6 tests ---------------------------------------------------------------------------------------------
Name (time in s)                                                     Min                Max               Mean            StdDev             Median               IQR            Outliers     OPS            Rounds  Iterations
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_parse[1m-noop]                                    12.8445 (1.0)      12.9714 (1.0)      12.9009 (1.0)      0.0570 (1.0)      12.9015 (1.0)      0.1046 (1.44)          1;0  0.0775 (1.0)           5           1
test_benchmark_parse[1m-python_only]                             13.5212 (1.05)     13.7194 (1.06)     13.5844 (1.05)     0.0782 (1.37)     13.5659 (1.05)     0.0728 (1.0)           1;1  0.0736 (0.95)          5           1
test_benchmark_parse[1m-postal_only]                             51.1747 (3.98)     51.4688 (3.97)     51.2672 (3.97)     0.1179 (2.07)     51.2128 (3.97)     0.1199 (1.65)          1;0  0.0195 (0.25)          5           1
test_benchmark_parse[1m-postal_parse_address]                    56.0817 (4.37)     56.8488 (4.38)     56.2741 (4.36)     0.3264 (5.73)     56.1196 (4.35)     0.2928 (4.02)          1;1  0.0178 (0.23)          5           1
test_benchmark_parse[1m-postal_parse_address__direct_import]     56.2151 (4.38)     56.4120 (4.35)     56.3279 (4.37)     0.0975 (1.71)     56.3729 (4.37)     0.1837 (2.52)          2;0  0.0178 (0.23)          5           1
test_benchmark_parse[1m-postal_parse_address__initial_impl]      58.3636 (4.54)     58.5571 (4.51)     58.4671 (4.53)     0.0819 (1.44)     58.4389 (4.53)     0.1328 (1.82)          2;0  0.0171 (0.22)          5           1
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Legend:
  Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.
  OPS: Operations Per Second, computed as 1 / Mean
================ 6 passed, 18 deselected in 1769.89s (0:29:29) =================

Like you, I also see significantly lower runtimes reported when using pytest-benchmark.

@NickCrews (Owner) commented:

Hmm, I'm really curious about this one; I'll dig in once I'm at a computer. Thanks for the effort. My first thought is that all the timings seem cut almost exactly in half: maybe we're accidentally doing the computation twice in the non-pytest harness?

@lmores (Contributor, Author) commented Jul 26, 2024

> maybe we're accidentally doing the computation twice in the non-pytest harness?

I'm quite sure this is not the case (also, my timings are far from being exactly halved).

The differences may be due to the fact that in the non-pytest benchmarks I was using a DuckDB database that writes to disk, rather than an in-memory one as in pytest-benchmark... however, this is what I get when running the non-pytest benchmarks on a memtable (sketched after the numbers below), and it doesn't change much; the timings are still far from the pytest ones:

noop                                took  19.4047 seconds
python_only                         took  20.3601 seconds
postal_only                         took 104.8952 seconds
postal_parse_address                took 113.1792 seconds
postal_parse_address__direct_import took 119.0952 seconds
postal_parse_address__initial_impl  took 121.0536 seconds
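For reference, switching to an in-memory input is roughly this one-line change; the column name and the stand-in data are assumptions:

```python
import ibis

addresses = ["123 Main St Anchorage AK"] * 1_000_000  # stand-in for the real 1M-row dataset

# An in-memory table instead of a table persisted in a DuckDB database file.
t = ibis.memtable({"address": addresses})
```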

At first I thought that the pytest-benchmark library did something as simple as endTimestamp - startTimestamp to measure execution time, but as briefly explained here, the measurement technique it employs seems to follow a more "probabilistic" approach (something like "wake up every X ms and see what is running").
Could that be the root cause of the differences?
