Why do plugins built with pyo3 run slower than cython ones? #1470

Closed
RedmondLee opened this issue Mar 6, 2021 · 11 comments

@RedmondLee

🌍 Environment

  • Your operating system and version: Ubuntu 20.04
  • Your python version: Python 3.8.2
  • How did you install python (e.g. apt or pyenv)? Did you use a virtualenv?: Build from source code / NO
  • Your Rust version (rustc --version): 1.50.0
  • Your PyO3 version: latest

Details

As a Rust beginner, I found that plugins built with pyo3 have more overhead per call than the traditional cython approach. This is bad news for fine-grained embedded development with pyo3.

I asked a question on Stack Overflow, and a number of people who followed up on it replicated similar results, but none of us knows enough about the underlying machinery to explain how this comes about. Can I find the answer here? Thanks.

The question link is https://stackoverflow.com/questions/66467640/why-cython-embeded-plugins-has-higher-performance-in-cpython-interpreter-than-ru

@davidhewitt
Member

davidhewitt commented Mar 6, 2021

Hi @RedmondLee, thanks for the question and thoughtful analysis.

Your results are consistent with what we've previously measured. See #661 and #1440 (comment), for example.

My view on this is that at this point in PyO3's lifecycle, it's expected. We're comparing against CPython, which is a very mature and carefully optimized project, and cython, which is a tool dedicated to performance.

To take a case in point: last I looked, cython had its own custom implementation of Python function objects so that it could backport the METH_FASTCALL (aka Vectorcall) optimization from Python 3.7 (where CPython primarily supports it) to older Python versions supported by cython.

We have so far only implemented Vectorcall support for very simple Rust-calling-Python cases on Python 3.9+, so we do not benefit from this optimization at all in the direction you are measuring.


Ultimately, PyO3 is of course also a tool for accelerating Python programs. cython has shown what performance is possible; we should be able to make the overheads comparable given enough time and resources. It's a constant balancing act to add more functionality to PyO3 versus refine and optimize the feature coverage we already have.

If your algorithm is complex enough, the overheads are a minority of the runtime and you'll already see performance improvements against Python. cython may be harder to beat using PyO3; however, you get the pleasure of writing Rust code instead of a C-Python hybrid.
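
To make the point about overhead share concrete, here is a rough back-of-the-envelope sketch in Python. It assumes a simple model in which each call costs a fixed binding overhead plus the time spent on real work. The function name `overhead_share` and the 50 ns figure are hypothetical, chosen only for illustration; they are not measurements of PyO3 or cython.

```python
def overhead_share(call_overhead_ns: float, work_ns: float) -> float:
    """Fraction of one call's runtime spent on the fixed call overhead."""
    total = call_overhead_ns + work_ns
    if total <= 0:
        raise ValueError("total time per call must be positive")
    return call_overhead_ns / total


# Hypothetical per-call overhead, NOT a measurement from this thread.
CALL_OVERHEAD_NS = 50.0

# As the useful work per call grows, the fixed overhead matters less and less.
for work_ns in (0.0, 50.0, 5_000.0, 500_000.0):
    share = overhead_share(CALL_OVERHEAD_NS, work_ns)
    print(f"work={work_ns:>9.0f} ns  overhead share={share:6.1%}")
```

Under this model, a no-op function is all overhead, while a function doing a few microseconds of work per call spends only about a percent of its time crossing the language boundary. Batching work into fewer, larger calls has the same effect.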

@RedmondLee
Author

@davidhewitt Thanks for your kind reply. If I understand correctly, there is no fundamental difference between the way pyo3 and cython plugins are called in cpython (unlike pypy3, which, as far as I know, calls C plugins in a fundamentally different way, and that is the main reason for its compatibility differences). The only reason pyo3 is slower than cython is that it still needs to be optimized.

@davidhewitt
Member

davidhewitt commented Mar 6, 2021

@RedmondLee yes, that's a fair summary. We already know of some optimization work we can do in PyO3 in the future. Some overheads may ultimately remain different from cython's. In Rust we're also promising safety, and this might mean that we can't make exactly the same set of assumptions cython can make.

(Whether that means we'll be faster or slower than cython - I don't know, but if we can put in the work we should be able to get close.)

@RedmondLee
Author

Thanks! I'm very much looking forward to seeing your final results. Developing Python acceleration plugins in Rust has greatly reduced my mental strain. I regret that I'm not yet at the level to contribute to this project.

@itamarst
Contributor

itamarst commented Apr 29, 2021

So I just encountered this as well, and it seems like some of this overhead ought to be fixable. This is Python 3.9, so presumably vectorcall could be used, and the functions are no-ops:

In [5]: from overhead import do_nothing as do_nothing_rust

In [6]: from overhead_cython import do_nothing as do_nothing_cython

In [7]: %timeit do_nothing_cython()
30 ns ± 0.0352 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

In [8]: %timeit do_nothing_rust()
85 ns ± 0.0965 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

The Cython code:

def do_nothing():
    pass

The Rust code, compiled with maturin develop --release:

use pyo3::prelude::*;
use pyo3::wrap_pyfunction;

#[pyfunction]
fn do_nothing() {}

#[pymodule]
/// A Python module implemented in Rust.
fn overhead(_py: Python, m: &PyModule) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(do_nothing, m)?)?;

    Ok(())
}
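
For anyone who wants to reproduce this kind of measurement outside IPython, a minimal sketch using the standard-library `timeit` module might look like the following. The helper name `per_call_ns` is hypothetical. The script assumes the `overhead` and `overhead_cython` modules from this comment may or may not be built, and falls back to a pure-Python no-op when they are missing.

```python
import timeit


def per_call_ns(func, number=200_000, repeat=5):
    """Return a best-of-`repeat` estimate of one call to `func`, in nanoseconds."""
    timings = timeit.repeat(func, number=number, repeat=repeat)
    return min(timings) / number * 1e9


def do_nothing():
    """Pure-Python baseline that is always available."""
    pass


candidates = {"pure python": do_nothing}

# Optional: the compiled modules from this comment, if they have been built.
try:
    from overhead import do_nothing as do_nothing_rust
    candidates["pyo3"] = do_nothing_rust
except ImportError:
    pass

try:
    from overhead_cython import do_nothing as do_nothing_cython
    candidates["cython"] = do_nothing_cython
except ImportError:
    pass

results = {name: per_call_ns(func) for name, func in candidates.items()}
for name, ns in results.items():
    print(f"{name:>12}: {ns:6.1f} ns per call")
```

The absolute numbers include the cost of `timeit`'s own loop, so only the differences between candidates are meaningful. With the figures reported above, the gap is 85 ns - 30 ns = 55 ns per call.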

@davidhewitt
Member

That's absolutely correct, and just a couple of the optimizations I'd like to add. Unfortunately I can only do so much, and I've been working on other pieces of pyo3 since the last message in this thread.

Anyone who's interested in helping implement these optimizations is very welcome to ask me for some pointers on where to get started.

@birkenfeld
Member

I should have some time to try soon, if you got the pointers :)

@davidhewitt
Member

👍 I'll try to write something useful at the weekend!

@birkenfeld
Member

@davidhewitt gentle ping :)

@davidhewitt
Member

Absolutely. Sorry, I haven't forgotten about this; I just haven't had the opportunity to sit down and put my thoughts into something coherent!

@davidhewitt
Member

(I'm optimistic I can find some time to do this tomorrow evening!)
