Why do plugins built with pyo3 run slower than cython ones? #1470
Comments
Hi @RedmondLee, thanks for the question and thoughtful analysis. Your results are consistent with what we've previously measured. See #661 and #1440 (comment), for example. My view on this is that at this point in PyO3's lifecycle, it's expected. We're comparing against CPython, which is a very mature and carefully optimized project, and To take an example in point: last I looked We have so far only implemented Vectorcall support for very simple Rust-calling-Python cases on Python 3.9+, so we do not benefit from this optimization at all in the direction you are measuring. Ultimately, PyO3 is of course also a tool for accelerating Python programs. If your algorithm is complex enough, the overheads are the minority of the runtime and you'll already see performance improvements against Python.
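To make the amortization point concrete, here is a rough sketch of how the share of runtime spent on call overhead shrinks as the work per call grows. The 85 ns overhead is the no-op PyO3 figure reported further down this thread; the per-call work sizes and the helper name `overhead_fraction` are purely illustrative assumptions, not measurements.

```python
def overhead_fraction(call_overhead_ns: float, work_ns: float) -> float:
    """Fraction of total per-call time that is spent on call overhead."""
    return call_overhead_ns / (call_overhead_ns + work_ns)


if __name__ == "__main__":
    # Assumed fixed per-call overhead (the no-op PyO3 timing quoted in this thread).
    call_overhead_ns = 85.0
    # Hypothetical amounts of useful work done inside the extension per call.
    for work_ns in (100.0, 10_000.0, 1_000_000.0):
        share = overhead_fraction(call_overhead_ns, work_ns)
        print(f"work={work_ns:>11.0f} ns  overhead share={share:.2%}")
```

Under these assumptions the overhead dominates only when each call does very little work, which matches the fine-grained use case raised in the question.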
@davidhewitt Thanks for your kind reply. If I understand your answer correctly, there is no fundamental difference between the way CPython calls pyo3 extensions and the way it calls cython ones (unlike pypy3, whose way of calling C extensions is, as far as I know, fundamentally different from CPython's, which is the main reason for its compatibility differences). The only reason pyo3 is slower than cython is that it still needs to be optimized.
@RedmondLee yes, that's a fair summary. Some of the optimizing work we know we can do in PyO3 in the future. Some overheads may eventually be different to (Whether that means we'll be faster or slower than cython - I don't know, but if we can put in the work we should be able to get close.)
Thanks! I'm very much looking forward to seeing your final results. Using Rust to develop Python acceleration plugins has greatly reduced my mental strain. I regret that I'm not yet at the level to contribute to this project.
So I just encountered this as well, and it seems like some of this overhead ought to be fixable. This is Python 3.9, so presumably vectorcall could be used, and the functions are no-ops:

In [5]: from overhead import do_nothing as do_nothing_rust
In [6]: from overhead_cython import do_nothing as do_nothing_cython
In [7]: %timeit do_nothing_cython()
30 ns ± 0.0352 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
In [8]: %timeit do_nothing_rust()
85 ns ± 0.0965 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

The Cython code:

def do_nothing():
    pass

The Rust code, compiled with

use pyo3::prelude::*;
use pyo3::wrap_pyfunction;

#[pyfunction]
fn do_nothing() {}

#[pymodule]
/// A Python module implemented in Rust.
fn overhead(_py: Python, m: &PyModule) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(do_nothing, m)?)?;
    Ok(())
}
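For anyone who wants to reproduce this outside IPython, a minimal sketch using the standard `timeit` module might look like the following. It assumes the two extension modules are importable under the names used above (`overhead` and `overhead_cython`) and skips any that are not built; the helper names `load_noop` and `per_call_ns` are made up for this example. Passing a callable to `timeit.Timer` adds one extra Python-level call per iteration, so absolute numbers will differ from `%timeit`, but the relative gap should still be visible.

```python
import timeit


def do_nothing_python():
    """Pure-Python no-op, used as a baseline."""
    pass


def load_noop(module_name):
    """Return `do_nothing` from the named module, or None if it cannot be imported."""
    try:
        module = __import__(module_name)
    except ImportError:
        return None
    return getattr(module, "do_nothing", None)


def per_call_ns(func, number=1_000_000, repeat=5):
    """Best-of-`repeat` time per call, in nanoseconds."""
    timer = timeit.Timer(func)
    best = min(timer.repeat(repeat=repeat, number=number))
    return best / number * 1e9


if __name__ == "__main__":
    candidates = {
        "python": do_nothing_python,
        "cython": load_noop("overhead_cython"),  # assumed module name from this thread
        "rust": load_noop("overhead"),           # assumed module name from this thread
    }
    for name, func in candidates.items():
        if func is None:
            print(f"{name}: module not importable, skipped")
        else:
            print(f"{name}: {per_call_ns(func):.1f} ns per call")
```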
That's absolutely correct, and just a couple of the optimizations I'd like to add. Unfortunately I can only do so much, and I've been working on other pieces of pyo3 since the last message in this thread. Anyone who's interested in helping implement these optimizations is very welcome to ask me for some pointers on where to get started.
I should have some time to try soon, if you've got the pointers :)
👍 I'll try to write something useful at the weekend!
@davidhewitt gentle ping :)
Absolutely, sorry, I haven't forgotten about this; I just haven't had an opportunity to sit down and put my thoughts into something coherent!
(I'm optimistic I can find some time to do this tomorrow evening!)
🌍 Environment
Rust version (rustc --version): 1.50.0
Details
As a Rust beginner, I found that plugins built with pyo3 have more overhead per call than those built with the traditional cython approach, which is bad news for fine-grained embedded development with pyo3.
I asked a question on Stack Overflow, and a number of people who followed up on it replicated similar results, but because we lack knowledge of the underlying mechanisms, we can't explain how this comes about. Can I find the answer here? Thanks.
The question link is https://stackoverflow.com/questions/66467640/why-cython-embeded-plugins-has-higher-performance-in-cpython-interpreter-than-ru