Perf: optimize postal_parse_address implementation #49

Draft · wants to merge 3 commits into main
Conversation

@lmores (Contributor) commented Jul 19, 2024

Intro

In this PR:

  • function mismo.lib.geo.postal_parse_address has been rewritten using my implementation (see the sketch after this list)
  • function mismo.lib.geo.tests.test_postal_benchmark.postal_parse_address__direct_import is the same as the above, but avoids the optional import of postal inside its body (@NickCrews: don't worry, I was just curious whether that import and the inner function declaration added any relevant overhead... it seems irrelevant, and I will remove this variant before merging)
  • function mismo.lib.geo.tests.test_postal_benchmark.postal_parse_address__initial_impl is the original implementation by @jstammers
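For context, here is a hedged sketch (not the exact PR code) of the pattern under discussion: an elementwise parsing function that lazily imports the optional postal dependency inside its body and pivots libpostal's (value, label) tuples into a dict keyed by label. The function name parse_one is mine, not the PR's.

```python
def parse_one(address: str) -> dict[str, str]:
    # Optional dependency, imported lazily so mismo works without pypostal
    # installed (this inner import is what the __direct_import variant avoids).
    from postal.parser import parse_address

    # parse_address returns (value, label) tuples, e.g.
    #   [("123", "house_number"), ("main st", "road"), ...]
    return {label: value for value, label in parse_address(address)}
```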

In this branch I have also temporarily restored the top-level benchmarks directory to manually gather execution times, along the lines of the sketch below.
All benchmarks were run on the full data set of 1M rows using duckdb 0.10.0, ibis-framework 9.1.0 and Python 3.12.3, on the same laptop with no other programs running (except GNOME, etc.).
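A minimal sketch of such a manual timing harness; the function name and the use of to_pandas() to force execution are assumptions, not the actual contents of benchmarks/parse_address.py:

```python
import time

def time_impl(label: str, fn, table) -> None:
    start = time.perf_counter()
    fn(table).to_pandas()  # force execution of the otherwise-lazy Ibis expression
    elapsed = time.perf_counter() - start
    print(f"{label:<35} took {elapsed:8.4f} seconds")
```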

OS: Ubuntu 21.04 x86_64 
Host: Latitude 5500 
Kernel: 5.11.0-49-generic 
CPU: Intel i7-8665U (8) @ 4.800GHz 
GPU: Intel WhiskeyLake-U GT2 [UHD Graphics 620] 
Memory: 8067MiB / 15813MiB 

Results

Results obtained with manual timings are as follows.

$ python -m benchmarks.parse_address
noop                                took  19.6631 seconds
python_only                         took  20.6381 seconds
postal_only                         took  99.4631 seconds
postal_parse_address                took 109.7075 seconds
postal_parse_address__direct_import took 110.1048 seconds
postal_parse_address__initial_impl  took 118.7811 seconds

On the other hand, pytest-benchmark reports the following.

$ just bench --benchmark-sort=mean --benchmark-group-by=param:nrows -k 1m mismo/lib/geo/tests/test_postal_benchmark.py
pytest --benchmark-only --benchmark-enable --benchmark-autosave --benchmark-sort=mean --benchmark-group-by=param:nrows -k 1m mismo/lib/geo/tests/test_postal_benchmark.py
======================================================================================================== test session starts =========================================================================================================
platform linux -- Python 3.12.3, pytest-8.2.2, pluggy-1.5.0
benchmark: 4.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/lorenzo/devel/mismo
configfile: pyproject.toml
plugins: cov-5.0.0, benchmark-4.0.0, anyio-4.4.0
collected 24 items / 18 deselected / 6 selected                                                                                                                                                                                      

mismo/lib/geo/tests/test_postal_benchmark.py ......                                                                                                                                                                            [100%]
Saved benchmark data in: /home/lorenzo/devel/mismo/.benchmarks/Linux-CPython-3.12-64bit/0016_efaf478c0a66e11853fcf27704b1061ccff1fdff_20240719_124717.json



---------------------------------------------------------------------------------------------- benchmark 'nrows=1000000': 6 tests ---------------------------------------------------------------------------------------------
Name (time in s)                                                     Min                Max               Mean            StdDev             Median               IQR            Outliers     OPS            Rounds  Iterations
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_parse[1m-noop]                                    14.4026 (1.0)      14.7042 (1.0)      14.5671 (1.0)      0.1196 (1.0)      14.5971 (1.0)      0.1840 (1.0)           2;0  0.0686 (1.0)           5           1
test_benchmark_parse[1m-python_only]                             15.7207 (1.09)     16.1826 (1.10)     15.9631 (1.10)     0.2066 (1.73)     15.9745 (1.09)     0.3831 (2.08)          2;0  0.0626 (0.91)          5           1
test_benchmark_parse[1m-postal_only]                             67.2535 (4.67)     69.3878 (4.72)     68.0714 (4.67)     0.8035 (6.72)     67.7990 (4.64)     0.8358 (4.54)          2;0  0.0147 (0.21)          5           1
test_benchmark_parse[1m-postal_parse_address__direct_import]     73.5724 (5.11)     74.3592 (5.06)     73.9136 (5.07)     0.3081 (2.58)     73.9762 (5.07)     0.4267 (2.32)          2;0  0.0135 (0.20)          5           1
test_benchmark_parse[1m-postal_parse_address]                    73.3805 (5.09)     75.1065 (5.11)     73.9273 (5.07)     0.7074 (5.91)     73.7897 (5.06)     0.8699 (4.73)          1;0  0.0135 (0.20)          5           1
test_benchmark_parse[1m-postal_parse_address__initial_impl]      84.0357 (5.83)     87.4314 (5.95)     85.2971 (5.86)     1.5099 (12.62)    84.3496 (5.78)     2.3757 (12.91)         1;0  0.0117 (0.17)          5           1
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Legend:
  Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.
  OPS: Operations Per Second, computed as 1 / Mean
=========================================================================================== 6 passed, 18 deselected in 2350.17s (0:39:10) ============================================================================================

The values in the Mean column are remarkably lower than those measured manually (which lie well outside the StdDev and the IQR).
Isn't that rather strange? As far as I can tell pytest-benchmark runs each function 5 times: are we caching something that we should not, so that subsequent executions run faster than a single cold run (as in the manual timings)? A sketch of how pytest-benchmark drives each function follows below.
Also, in the pytest benchmarks the difference between postal_parse_address and postal_only is more than 5 s, which is much bigger than the difference in the manual timings (< 1 s).
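A hedged sketch of what one of these pytest-benchmark tests presumably looks like; the fixture name table_1m and the call shape are assumptions:

```python
from mismo.lib.geo import postal_parse_address

def test_benchmark_parse(benchmark, table_1m):
    # The `benchmark` fixture re-runs the callable for several rounds
    # (5 rounds in the tables above) and reports per-round statistics, so
    # any state cached by DuckDB or libpostal after the first round would
    # make later rounds, and hence the Mean, faster than a single cold run.
    benchmark(lambda: postal_parse_address(table_1m.address).execute())
```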

Additionally, before running the benchmarks above I recreated my virtualenv from scratch to match mismo's new requirements (python>=3.10, ibis-framework>=9.1.0), and the results I get are also quite different from those I reported in PR #48, where I was using ibis-framework 9.0.0 and duckdb 1.0.0 (I am not sure which Python version, though).

Questions

  1. Before going forward, @NickCrews and @jstammers could you run both benchmarks on your systems and share the results?
  2. @NickCrews: in light of the new results are you still willing to make the implementation change?

@jstammers (Contributor) commented:

@lmores here's what I see when I run those benchmarks, on a system with the following specs:

OS: Manjaro Linux x86_64 
Kernel: 6.9.9-1-MANJARO 
CPU: AMD Ryzen 5 3600 (12) @ 3.600GHz 
GPU: NVIDIA GeForce GTX 1080 Ti 
Memory: 8515MiB / 32026MiB 
$ python -m benchmarks.parse_address
noop                                took  29.1380 seconds
python_only                         took  28.7530 seconds
postal_only                         took 113.7731 seconds
postal_parse_address                took 119.3624 seconds
postal_parse_address__direct_import took 119.5952 seconds
postal_parse_address__initial_impl  took 121.1622 seconds
============================= test session starts ==============================
platform linux -- Python 3.12.4, pytest-8.2.2, pluggy-1.5.0
benchmark: 4.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/jimmy/Work/mismo
configfile: pyproject.toml
plugins: anyio-4.4.0, cov-5.0.0, benchmark-4.0.0
collected 24 items / 18 deselected / 6 selected

mismo/lib/geo/tests/test_postal_benchmark.py ......                      [100%]


---------------------------------------------------------------------------------------------- benchmark 'nrows=1000000': 6 tests ---------------------------------------------------------------------------------------------
Name (time in s)                                                     Min                Max               Mean            StdDev             Median               IQR            Outliers     OPS            Rounds  Iterations
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_parse[1m-noop]                                    12.8445 (1.0)      12.9714 (1.0)      12.9009 (1.0)      0.0570 (1.0)      12.9015 (1.0)      0.1046 (1.44)          1;0  0.0775 (1.0)           5           1
test_benchmark_parse[1m-python_only]                             13.5212 (1.05)     13.7194 (1.06)     13.5844 (1.05)     0.0782 (1.37)     13.5659 (1.05)     0.0728 (1.0)           1;1  0.0736 (0.95)          5           1
test_benchmark_parse[1m-postal_only]                             51.1747 (3.98)     51.4688 (3.97)     51.2672 (3.97)     0.1179 (2.07)     51.2128 (3.97)     0.1199 (1.65)          1;0  0.0195 (0.25)          5           1
test_benchmark_parse[1m-postal_parse_address]                    56.0817 (4.37)     56.8488 (4.38)     56.2741 (4.36)     0.3264 (5.73)     56.1196 (4.35)     0.2928 (4.02)          1;1  0.0178 (0.23)          5           1
test_benchmark_parse[1m-postal_parse_address__direct_import]     56.2151 (4.38)     56.4120 (4.35)     56.3279 (4.37)     0.0975 (1.71)     56.3729 (4.37)     0.1837 (2.52)          2;0  0.0178 (0.23)          5           1
test_benchmark_parse[1m-postal_parse_address__initial_impl]      58.3636 (4.54)     58.5571 (4.51)     58.4671 (4.53)     0.0819 (1.44)     58.4389 (4.53)     0.1328 (1.82)          2;0  0.0171 (0.22)          5           1
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Legend:
  Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.
  OPS: Operations Per Second, computed as 1 / Mean
================ 6 passed, 18 deselected in 1769.89s (0:29:29) =================

Like you, I also see significantly lower runtimes reported when using pytest-benchmark.

@NickCrews (Owner) commented:

Hmm, I'm really curious about this one; I'll dig in once I'm at a computer. Thanks for the effort. My first thought is that all the timings seem cut almost exactly in half: maybe we're accidentally doing the computation twice in the non-pytest harness?

@lmores (Contributor, Author) commented Jul 26, 2024

> maybe we're accidentally doing the computation twice in the non-pytest harness?

I'm quite sure this is not the case (also, my timings are far from being exactly halved).

The differences may be due to the fact that in the non-pytest benchmarks I was using a DuckDB database that writes to disk, rather than an in-memory one as in pytest-benchmark... however, this is what I get when running the non-pytest benchmarks on a memtable (sketched after the numbers below), and it doesn't change much; the timings are still far from the pytest ones:

noop                                took  19.4047 seconds
python_only                         took  20.3601 seconds
postal_only                         took 104.8952 seconds
postal_parse_address                took 113.1792 seconds
postal_parse_address__direct_import took 119.0952 seconds
postal_parse_address__initial_impl  took 121.0536 seconds
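For reference, switching to an in-memory input is roughly this one-line change; the column name and the stand-in data are assumptions:

```python
import ibis

addresses = ["123 Main St Anchorage AK"] * 1_000_000  # stand-in for the real 1M-row dataset

# An in-memory table instead of a table persisted in a DuckDB database file.
t = ibis.memtable({"address": addresses})
```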

At first I thought that the pytest-benchmark library did something as simple as endTimestamp - startTimestamp to measure execution time, but as briefly explained here, the measurement technique it employs seems to follow a more "probabilistic" approach (something like "wake up every X ms and see what is running").
Could that be the root cause of the differences?
