Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-91404: Improve re performance #91495

Merged
merged 10 commits into from
Apr 15, 2022
Merged

Conversation

brandtbucher
Copy link
Member

@brandtbucher brandtbucher commented Apr 13, 2022

This makes a few performance improvements in sre_lib.h:

  • Like the main VM, the regex engine now uses computed gotos on supported platforms.
  • Two read- and write-heavy members of the ctx pointer (ctx->pattern and ctx->ptr) are lifted into local variables.

(The diff looks gnarly, but everything inside the main switch is just mechanical replacements to support these two changes. It's not really that bad.)

It yields nice improvements on all of the expected benchmarks, and a 1% improvement overall:

Slower (6):
- scimark_monte_carlo: 398 ms +- 14 ms -> 406 ms +- 15 ms: 1.02x slower
- chaos: 407 ms +- 13 ms -> 413 ms +- 11 ms: 1.01x slower
- scimark_fft: 1.95 sec +- 0.02 sec -> 1.97 sec +- 0.02 sec: 1.01x slower
- tornado_http: 570 ms +- 17 ms -> 577 ms +- 17 ms: 1.01x slower
- sympy_expand: 2.77 sec +- 0.02 sec -> 2.79 sec +- 0.02 sec: 1.01x slower
- meteor_contest: 618 ms +- 11 ms -> 622 ms +- 12 ms: 1.01x slower

Faster (14):
- regex_v8: 158 ms +- 9 ms -> 132 ms +- 11 ms: 1.20x faster
- regex_effbot: 19.2 ms +- 1.0 ms -> 17.5 ms +- 0.9 ms: 1.10x faster
- regex_dna: 1.36 sec +- 0.01 sec -> 1.24 sec +- 0.01 sec: 1.09x faster
- pycparser: 7.07 sec +- 0.10 sec -> 6.73 sec +- 0.10 sec: 1.05x faster
- pidigits: 1.20 sec +- 0.01 sec -> 1.16 sec +- 0.01 sec: 1.04x faster
- pickle_pure_python: 1.86 ms +- 0.13 ms -> 1.80 ms +- 0.13 ms: 1.03x faster
- scimark_sparse_mat_mult: 28.8 ms +- 1.5 ms -> 27.9 ms +- 1.5 ms: 1.03x faster
- fannkuch: 2.35 sec +- 0.03 sec -> 2.28 sec +- 0.02 sec: 1.03x faster
- float: 449 ms +- 13 ms -> 439 ms +- 11 ms: 1.02x faster
- xml_etree_iterparse: 625 ms +- 15 ms -> 613 ms +- 13 ms: 1.02x faster
- spectral_norm: 594 ms +- 11 ms -> 587 ms +- 13 ms: 1.01x faster
- xml_etree_parse: 943 ms +- 14 ms -> 934 ms +- 17 ms: 1.01x faster
- 2to3: 1.56 sec +- 0.01 sec -> 1.55 sec +- 0.01 sec: 1.01x faster
- sympy_sum: 946 ms +- 14 ms -> 938 ms +- 13 ms: 1.01x faster

Benchmark hidden because not significant (42): chameleon, crypto_pyaes, deltablue, django_template, dulwich_log, go, hexiom, html5lib, json, json_dumps, json_loads, logging_format, logging_silent, logging_simple, mako, nbody, nqueens, pathlib, pickle, pickle_dict, pickle_list, pyflate, python_startup, python_startup_no_site, raytrace, regex_compile, richards, scimark_lu, scimark_sor, sqlalchemy_declarative, sqlalchemy_imperative, sqlite_synth, sympy_integrate, sympy_str, telco, thrift, unpack_sequence, unpickle, unpickle_list, unpickle_pure_python, xml_etree_generate, xml_etree_process

Geometric mean: 1.01x faster

Maybe re won't be slower in 3.11 after all! 🙃

@markshannon
Copy link
Member

markshannon commented Apr 13, 2022

Nice speedup.

I wonder how much of the speedup would be achieved by just moving ctx->pattern and ctx->ptr into local variables and leaving the dispatch alone. I'm wondering this for a couple of reasons:

  1. This paper suggests that it is the memory read, not the branch prediction, that has the larger impact on performance
  2. We are concerned about dispatching on Windows, and it might be good to apply the results of that here as well.

How easy would it be to apply the changes that move ctx->pattern and ctx->ptr, without the dispatching changes?

@serhiy-storchaka
Copy link
Member

How easy would it be to apply the changes that move ctx->pattern and ctx->ptr, without the dispatching changes?

@brandtbucher, could you show results with disabled computed gotos? Only regex-related benchmarks are interesting.

@brandtbucher brandtbucher self-assigned this Apr 13, 2022
@brandtbucher
Copy link
Member Author

brandtbucher commented Apr 13, 2022

With only computed gotos:

- regex_v8: 158 ms +- 9 ms -> 144 ms +- 11 ms: 1.09x faster
- regex_effbot: 19.2 ms +- 1.0 ms -> 18.2 ms +- 0.9 ms: 1.05x faster
- regex_dna: 1.36 sec +- 0.01 sec -> 1.30 sec +- 0.01 sec: 1.05x faster

With only new locals:

- regex_v8: 158 ms +- 9 ms -> 142 ms +- 12 ms: 1.11x faster
- regex_effbot: 19.2 ms +- 1.0 ms -> 18.2 ms +- 1.0 ms: 1.05x faster
- regex_dna: 1.36 sec +- 0.01 sec -> 1.22 sec +- 0.01 sec: 1.11x faster

Combined:

- regex_v8: 158 ms +- 9 ms -> 132 ms +- 11 ms: 1.20x faster
- regex_effbot: 19.2 ms +- 1.0 ms -> 17.5 ms +- 0.9 ms: 1.10x faster
- regex_dna: 1.36 sec +- 0.01 sec -> 1.24 sec +- 0.01 sec: 1.09x faster

For regex_v8 and regex_effbot, it looks like each optimization contributes about 50% of the speedup. regex_dna seems to benefit mostly from the new locals, for some reason (perhaps because the switch dispatch for that one is already highly stable/predicable?).

I think we should keep both.

@serhiy-storchaka
Copy link
Member

Excellent! Before merging, could you compare it with 3.10? If there is the same difference, it would be worth to add a note about speed up 10-20% in the NEWS and What's New files.

@@ -0,0 +1,3 @@
Improve the performance of :mod:`re` matching by using computed gotos (or
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add that it speeds up matching by 10-20%.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't. These claimed speedups are highly misleading.

The release notes for 3.9 and 3.10 have various claimed large speedups, yet 3.9 is no faster than 3.8 and 3.10 only a little bit faster.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The benchmark results are convincing to me.

Without this it is not clear why bother with making this change at all.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll only see a 10-20% speedup for these specific benchmarks, which are contrived.
Real programs, even those that spend a lot regexes will spend a much lower proportion of their time in the regex library.

Any number we give will be misleading.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This branch vs 3.10:

- regex_v8: 147 ms +- 12 ms -> 132 ms +- 11 ms: 1.11x faster
- regex_effbot: 19.1 ms +- 1.2 ms -> 17.5 ms +- 0.9 ms: 1.10x faster
- regex_dna: 1.30 sec +- 0.01 sec -> 1.24 sec +- 0.01 sec: 1.04x faster

(Those numbers are very close to what you get by combining the latest re perf regressions with the numbers from this PR.)

I'll just leave the NEWS entry as-is (very few people read it anyways). I don't have a strong opinion on what should go in What's New (but @pablogsal might).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps the wording "up to 10% faster on the pyperformance regular expression benchmarks" (or similar) might suffice.

Copy link
Member

@pablogsal pablogsal Apr 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should indeed go in the What's new. Could you add a small sentence in the Performance section, maybe? Specifying that the improvement is in the re engine and that it only affects re operations? If we need to remove it later is easier than the risk of forgetting to add something.

@brandtbucher brandtbucher merged commit 1b34b56 into python:main Apr 15, 2022
undingen added a commit to undingen/pyston_v2 that referenced this pull request Apr 26, 2022
This is a backport of the upstream 3.11 improvement:
python/cpython#91495

I only backported the ctx->pattern -> pattern and ctx->ptr -> ptr
part because using computed goto actually decreased perf slightly on the opt build.
@serhiy-storchaka serhiy-storchaka added the 3.11 only security fixes label May 19, 2022
@brandtbucher brandtbucher deleted the sre-goto branch July 21, 2022 19:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.11 only security fixes performance Performance or resource usage stdlib Python modules in the Lib dir topic-regex
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants