Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

base64 module: Link against SIMD library for 10x performance. #124951

Open
gpshead opened this issue Oct 3, 2024 · 6 comments
Open

base64 module: Link against SIMD library for 10x performance. #124951

gpshead opened this issue Oct 3, 2024 · 6 comments
Labels
performance Performance or resource usage stdlib Python modules in the Lib dir

Comments

@gpshead
Copy link
Member

gpshead commented Oct 3, 2024

Performance enhancement

Proposal:

https://pypi.org/project/pybase64/ aka https://github.com/mayeut/pybase64 (BSD licensed) exists. On top of some of its own SIMD code for base64 module extra features (character translation)^, it links against https://github.com/aklomp/base64, a BSD licensed C99 library with SIMD acceleration giving 5-20x performance on base64 encoding and decoding operations vs our existing generic byte based base64 C code.

We could adopt a bunch of the pybase64 code to make the default base64 module experience better - it is relatively straight forward extension module code (as one would expect). On the other hand, I expect pybase64 to still be where new development and further improvements in this space continue to happen as people who care strongly about performance need the latest and greatest from PyPI regardless of their current CPython version. (looping in @mayeut for thoughts on that)

Practicalities: Library availability? we'd vendor a libbase64 build for use on our binary distributions. I don't think it is currently widely available (? I only did a quick search on Ubuntu) as a package on Linux distributions though so we'd currently need to vendor our own copy in tree to be fair and match the good performance there (yuck, but ideally only temporary until distros pick it up as a package of its own, consider it similar to a Modules/_decimal/libmpdec/ situation - our configure.ac finds an installed one & distros link against that)

Risks: It is a new C library dependency. Security concerns within it thus become our own. As base64 is frequently used to process untrusted input. But its surface of possible problems is limited (very simple data format). We should ensure the library gets proper oss-fuzz test coverage before adoption (@aklomp for visibility).


^ bytes.translate, bytearray.translate, or str.translate might benefit from similar SIMD treatment - which would be better from a CPython perspective than only doing that within this module? If so, lets file a new issue just for that bit.


❯ python -m pybase64 benchmark `which python`
pybase64 1.4.0 (C extension active - NEON)  # running on my Apple M3
bench: altchars=None, validate=False
pybase64._pybase64.encodebytes:   4776.815 MB/s (5,936,128 bytes -> 8,018,983 bytes)
pybase64._pybase64.b64encode:    11989.872 MB/s (5,936,128 bytes -> 7,914,840 bytes)
pybase64._pybase64.b64decode:     3039.329 MB/s (7,914,840 bytes -> 5,936,128 bytes)
base64.encodebytes:                292.876 MB/s (5,936,128 bytes -> 8,018,983 bytes)
base64.b64encode:                  601.307 MB/s (5,936,128 bytes -> 7,914,840 bytes)
base64.b64decode:                  492.088 MB/s (7,914,840 bytes -> 5,936,128 bytes)
bench: altchars=None, validate=True
pybase64._pybase64.b64encode:    12327.286 MB/s (5,936,128 bytes -> 7,914,840 bytes)
pybase64._pybase64.b64decode:     8611.733 MB/s (7,914,840 bytes -> 5,936,128 bytes)
base64.b64encode:                  597.389 MB/s (5,936,128 bytes -> 7,914,840 bytes)
base64.b64decode:                  472.430 MB/s (7,914,840 bytes -> 5,936,128 bytes)
bench: altchars=b'-_', validate=False
pybase64._pybase64.b64encode:     1287.615 MB/s (5,936,128 bytes -> 7,914,840 bytes)
pybase64._pybase64.b64decode:     2524.966 MB/s (7,914,840 bytes -> 5,936,128 bytes)
base64.b64encode:                  473.320 MB/s (5,936,128 bytes -> 7,914,840 bytes)
base64.b64decode:                  406.411 MB/s (7,914,840 bytes -> 5,936,128 bytes)
bench: altchars=b'-_', validate=True
pybase64._pybase64.b64encode:     1283.111 MB/s (5,936,128 bytes -> 7,914,840 bytes)
pybase64._pybase64.b64decode:     6745.809 MB/s (7,914,840 bytes -> 5,936,128 bytes)
base64.b64encode:                  464.526 MB/s (5,936,128 bytes -> 7,914,840 bytes)
base64.b64decode:                  391.959 MB/s (7,914,840 bytes -> 5,936,128 bytes)

Has this already been discussed elsewhere?

No response given

Links to previous discussion of this feature:

If we spawn Discuss threads around this, lets edit and drop links here.

@gpshead gpshead added type-feature A feature request or enhancement performance Performance or resource usage stdlib Python modules in the Lib dir labels Oct 3, 2024
@rruuaanng
Copy link
Contributor

I think this is a good idea, but it seems that relying too much on external C libraries may cause instability (such as many linking and compilation problems). Do you have any better solution to this problem?

@gpshead
Copy link
Member Author

gpshead commented Oct 4, 2024

We're not really concerned about that. We aleady link against a bunch of libraries and have vendored a few (as much as we try to avoid that), including some involving SIMD code, in the past.

@picnixz
Copy link
Contributor

picnixz commented Oct 4, 2024

For the performances themselves, they are worth the inclusion IMO (base64 is very common in web applications)

I don't think it is currently widely available

The default base64 lib comes with coreutils IIRC (at least on most Linux distributions). But for SIMD support, I don't know of any that is well-known (there is an Alpine Linux package for that though which directly uses the lib you linked but well.. that's the only one).

yuck, but ideally only temporary until distros pick it up as a package of its own

Yes, but that could give distros more confidence and reasons in pushing it into a proper libbase64-devel package. We could also cut off some parts that are not needed and platforms that are not supported

bytes.translate, bytearray.translate, or str.translate might benefit from similar SIMD treatment

I think it's a good idea to make built-in functions faster when possible if the cost of introducing the overall machinary is not too much (but yes, let's open a separate issue if needs arise).

@corona10
Copy link
Member

corona10 commented Oct 6, 2024

From a general approach, I am not sure it's worth maintaining SIMD-related logic in CPython.
(I know that we already have such logic from some modules, but most of them are vendored libraries..)
There have been several attempts to accelerate the JSON module by using SIMD too (and they are actively managed by 3rd parties). Are we going to do the same thing? or do we have any criteria for what to accelerate from CPython or what to accelerate from 3rd party side?

@gpshead
Copy link
Member Author

gpshead commented Oct 6, 2024

FWIW, we already have SIMD and/or accelerated instruction use in hashlib code (and I expect to add more, as HACL* can provide it to us - our goal is to eliminate the need to link hashlib against openssl).

base64 is a much simpler obvious use for SIMD as the algorithm is dead simple and clearly amenable to doing a lot of well patterned simple math at once. And is frequently done on bulk data due to how common standards embed binary in text using it.

It feels like the kind of thing that a C version of, if written in the appropriate manner, might get a compiler to auto-vectorize? But tricking compilers into doing that for you instead of explicitly just doing what you wanted to happen can be less maintainable.

In general though I wouldn't reach for SIMD on things like our main data types unless it demonstrated a meaningful measurable performance suite win. (bytes.translate is rather niche in that regard)

JSON is a similarly important widely used bulk data format. A lot more complicated to use SIMD on. There are a variety of libraries out there doing it. If anyone wants to consider a behavior compatible version doing that as the json stdlib module, that belongs in its own issue.


A more interesting library to consider linking against than what I opened with may be https://github.com/simdutf/simdutf which covers UTF8 & friends as well as base64. (though it is C++ and I believe we've so far avoided requiring C++ linkage within CPython itself?)

Unrelated other than the SIMD theme - https://pypi.org/project/stringzilla/. It does things that even adopting simplest conservative versions of them for use by our CPython internals in a compatible manner might be enough to show up on the performance suite. Someone would need to take on trying it and testing to find out.

@gpshead
Copy link
Member Author

gpshead commented Oct 6, 2024

do we have any criteria for what to accelerate from CPython or what to accelerate from 3rd party side?

My #.1 criteria is maintainability. #.2 criteria is "is it widely used on enough data to matter?" with a #.3 feeding into that of "Would doing this reduce the need to train people to always use a special PyPI package instead of the stdlib for their common task that either can do because of a performance concern?"

@gpshead gpshead removed the type-feature A feature request or enhancement label Oct 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Performance or resource usage stdlib Python modules in the Lib dir
Projects
None yet
Development

No branches or pull requests

4 participants