gh-91404: Improve `re` performance #91495

brandtbucher · 2022-04-13T03:57:00Z

This makes a few performance improvements in sre_lib.h:

Like the main VM, the regex engine now uses computed gotos on supported platforms.
Two read- and write-heavy members of the ctx pointer (ctx->pattern and ctx->ptr) are lifted into local variables.

(The diff looks gnarly, but everything inside the main switch is just mechanical replacements to support these two changes. It's not really that bad.)

It yields nice improvements on all of the expected benchmarks, and a 1% improvement overall:

Slower (6):
- scimark_monte_carlo: 398 ms +- 14 ms -> 406 ms +- 15 ms: 1.02x slower
- chaos: 407 ms +- 13 ms -> 413 ms +- 11 ms: 1.01x slower
- scimark_fft: 1.95 sec +- 0.02 sec -> 1.97 sec +- 0.02 sec: 1.01x slower
- tornado_http: 570 ms +- 17 ms -> 577 ms +- 17 ms: 1.01x slower
- sympy_expand: 2.77 sec +- 0.02 sec -> 2.79 sec +- 0.02 sec: 1.01x slower
- meteor_contest: 618 ms +- 11 ms -> 622 ms +- 12 ms: 1.01x slower

Faster (14):
- regex_v8: 158 ms +- 9 ms -> 132 ms +- 11 ms: 1.20x faster
- regex_effbot: 19.2 ms +- 1.0 ms -> 17.5 ms +- 0.9 ms: 1.10x faster
- regex_dna: 1.36 sec +- 0.01 sec -> 1.24 sec +- 0.01 sec: 1.09x faster
- pycparser: 7.07 sec +- 0.10 sec -> 6.73 sec +- 0.10 sec: 1.05x faster
- pidigits: 1.20 sec +- 0.01 sec -> 1.16 sec +- 0.01 sec: 1.04x faster
- pickle_pure_python: 1.86 ms +- 0.13 ms -> 1.80 ms +- 0.13 ms: 1.03x faster
- scimark_sparse_mat_mult: 28.8 ms +- 1.5 ms -> 27.9 ms +- 1.5 ms: 1.03x faster
- fannkuch: 2.35 sec +- 0.03 sec -> 2.28 sec +- 0.02 sec: 1.03x faster
- float: 449 ms +- 13 ms -> 439 ms +- 11 ms: 1.02x faster
- xml_etree_iterparse: 625 ms +- 15 ms -> 613 ms +- 13 ms: 1.02x faster
- spectral_norm: 594 ms +- 11 ms -> 587 ms +- 13 ms: 1.01x faster
- xml_etree_parse: 943 ms +- 14 ms -> 934 ms +- 17 ms: 1.01x faster
- 2to3: 1.56 sec +- 0.01 sec -> 1.55 sec +- 0.01 sec: 1.01x faster
- sympy_sum: 946 ms +- 14 ms -> 938 ms +- 13 ms: 1.01x faster

Benchmark hidden because not significant (42): chameleon, crypto_pyaes, deltablue, django_template, dulwich_log, go, hexiom, html5lib, json, json_dumps, json_loads, logging_format, logging_silent, logging_simple, mako, nbody, nqueens, pathlib, pickle, pickle_dict, pickle_list, pyflate, python_startup, python_startup_no_site, raytrace, regex_compile, richards, scimark_lu, scimark_sor, sqlalchemy_declarative, sqlalchemy_imperative, sqlite_synth, sympy_integrate, sympy_str, telco, thrift, unpack_sequence, unpickle, unpickle_list, unpickle_pure_python, xml_etree_generate, xml_etree_process

Geometric mean: 1.01x faster

Maybe re won't be slower in 3.11 after all! 🙃

markshannon · 2022-04-13T09:11:20Z

Nice speedup.

I wonder how much of the speedup would be achieved by just moving ctx->pattern and ctx->ptr into local variables and leaving the dispatch alone. I'm wondering this for a couple of reasons:

This paper suggests that it is the memory read, not the branch prediction, that has the larger impact on performance
We are concerned about dispatching on Windows, and it might be good to apply the results of that here as well.

How easy would it be to apply the changes that move ctx->pattern and ctx->ptr, without the dispatching changes?

serhiy-storchaka · 2022-04-13T11:09:33Z

How easy would it be to apply the changes that move ctx->pattern and ctx->ptr, without the dispatching changes?

@brandtbucher, could you show results with disabled computed gotos? Only regex-related benchmarks are interesting.

brandtbucher · 2022-04-13T17:16:25Z

With only computed gotos:

- regex_v8: 158 ms +- 9 ms -> 144 ms +- 11 ms: 1.09x faster
- regex_effbot: 19.2 ms +- 1.0 ms -> 18.2 ms +- 0.9 ms: 1.05x faster
- regex_dna: 1.36 sec +- 0.01 sec -> 1.30 sec +- 0.01 sec: 1.05x faster

With only new locals:

- regex_v8: 158 ms +- 9 ms -> 142 ms +- 12 ms: 1.11x faster
- regex_effbot: 19.2 ms +- 1.0 ms -> 18.2 ms +- 1.0 ms: 1.05x faster
- regex_dna: 1.36 sec +- 0.01 sec -> 1.22 sec +- 0.01 sec: 1.11x faster

Combined:

- regex_v8: 158 ms +- 9 ms -> 132 ms +- 11 ms: 1.20x faster
- regex_effbot: 19.2 ms +- 1.0 ms -> 17.5 ms +- 0.9 ms: 1.10x faster
- regex_dna: 1.36 sec +- 0.01 sec -> 1.24 sec +- 0.01 sec: 1.09x faster

For regex_v8 and regex_effbot, it looks like each optimization contributes about 50% of the speedup. regex_dna seems to benefit mostly from the new locals, for some reason (perhaps because the switch dispatch for that one is already highly stable/predicable?).

I think we should keep both.

serhiy-storchaka · 2022-04-14T07:55:41Z

Excellent! Before merging, could you compare it with 3.10? If there is the same difference, it would be worth to add a note about speed up 10-20% in the NEWS and What's New files.

serhiy-storchaka · 2022-04-14T08:08:33Z

Misc/NEWS.d/next/Library/2022-04-12-19-08-13.gh-issue-91404.zjqYHo.rst

@@ -0,0 +1,3 @@
+Improve the performance of :mod:`re` matching by using computed gotos (or


Add that it speeds up matching by 10-20%.

Please don't. These claimed speedups are highly misleading.

The release notes for 3.9 and 3.10 have various claimed large speedups, yet 3.9 is no faster than 3.8 and 3.10 only a little bit faster.

The benchmark results are convincing to me.

Without this it is not clear why bother with making this change at all.

You'll only see a 10-20% speedup for these specific benchmarks, which are contrived.
Real programs, even those that spend a lot regexes will spend a much lower proportion of their time in the regex library.

Any number we give will be misleading.

This branch vs 3.10:

- regex_v8: 147 ms +- 12 ms -> 132 ms +- 11 ms: 1.11x faster - regex_effbot: 19.1 ms +- 1.2 ms -> 17.5 ms +- 0.9 ms: 1.10x faster - regex_dna: 1.30 sec +- 0.01 sec -> 1.24 sec +- 0.01 sec: 1.04x faster

(Those numbers are very close to what you get by combining the latest re perf regressions with the numbers from this PR.)

I'll just leave the NEWS entry as-is (very few people read it anyways). I don't have a strong opinion on what should go in What's New (but @pablogsal might).

Perhaps the wording "up to 10% faster on the pyperformance regular expression benchmarks" (or similar) might suffice.

I think this should indeed go in the What's new. Could you add a small sentence in the Performance section, maybe? Specifying that the improvement is in the re engine and that it only affects re operations? If we need to remove it later is easier than the risk of forgetting to add something.

This is a backport of the upstream 3.11 improvement: python/cpython#91495 I only backported the ctx->pattern -> pattern and ctx->ptr -> ptr part because using computed goto actually decreased perf slightly on the opt build.

brandtbucher added 9 commits April 12, 2022 10:16

Try out computed gotos in the regex engine

89e4e5e

Reduce indirection

bc378d8

More of that

fa98784

Rearrange some stuff

48a7f1c

Whitespace

6f8a9a0

Generate sre_targets.h

03d3194

Catch up with main

24fa5a1

blurb add

0d30328

fixup

fdcde9f

brandtbucher added performance Performance or resource usage stdlib Python modules in the Lib dir topic-regex labels Apr 13, 2022

brandtbucher requested a review from serhiy-storchaka April 13, 2022 03:57

bedevere-bot added the awaiting core review label Apr 13, 2022

brandtbucher mentioned this pull request Apr 13, 2022

Possible slowdown of regex searching in 3.11 #91404

Closed

brandtbucher self-assigned this Apr 13, 2022

serhiy-storchaka reviewed Apr 14, 2022

View reviewed changes

What's New

d38743f

ghost mentioned this pull request Apr 15, 2022

gh-91524: Speed up the regular expression substitution #91525

Merged

serhiy-storchaka approved these changes Apr 15, 2022

View reviewed changes

bedevere-bot added awaiting merge and removed awaiting core review labels Apr 15, 2022

brandtbucher merged commit 1b34b56 into python:main Apr 15, 2022

bedevere-bot removed the awaiting merge label Apr 15, 2022

undingen mentioned this pull request Apr 26, 2022

re: speedup matching by storing some state in local variables pyston/pyston#205

Merged

serhiy-storchaka added the 3.11 only security fixes label May 19, 2022

brandtbucher deleted the sre-goto branch July 21, 2022 19:54

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

gh-91404: Improve `re` performance #91495

gh-91404: Improve `re` performance #91495

brandtbucher commented Apr 13, 2022 •

edited

Loading

markshannon commented Apr 13, 2022 •

edited

Loading

serhiy-storchaka commented Apr 13, 2022

brandtbucher commented Apr 13, 2022 •

edited

Loading

serhiy-storchaka commented Apr 14, 2022

serhiy-storchaka Apr 14, 2022

markshannon Apr 14, 2022

serhiy-storchaka Apr 14, 2022

markshannon Apr 14, 2022

brandtbucher Apr 14, 2022

brandtbucher Apr 14, 2022

pablogsal Apr 14, 2022 •

edited

Loading

		@@ -0,0 +1,3 @@
		Improve the performance of :mod:`re` matching by using computed gotos (or

gh-91404: Improve re performance #91495

gh-91404: Improve re performance #91495

Conversation

brandtbucher commented Apr 13, 2022 • edited Loading

markshannon commented Apr 13, 2022 • edited Loading

serhiy-storchaka commented Apr 13, 2022

brandtbucher commented Apr 13, 2022 • edited Loading

serhiy-storchaka commented Apr 14, 2022

serhiy-storchaka Apr 14, 2022

Choose a reason for hiding this comment

markshannon Apr 14, 2022

Choose a reason for hiding this comment

serhiy-storchaka Apr 14, 2022

Choose a reason for hiding this comment

markshannon Apr 14, 2022

Choose a reason for hiding this comment

brandtbucher Apr 14, 2022

Choose a reason for hiding this comment

brandtbucher Apr 14, 2022

Choose a reason for hiding this comment

pablogsal Apr 14, 2022 • edited Loading

Choose a reason for hiding this comment

gh-91404: Improve `re` performance #91495

gh-91404: Improve `re` performance #91495

brandtbucher commented Apr 13, 2022 •

edited

Loading

markshannon commented Apr 13, 2022 •

edited

Loading

brandtbucher commented Apr 13, 2022 •

edited

Loading

pablogsal Apr 14, 2022 •

edited

Loading