Why object creation is slow in PyO3 and how to enhance it #679
Comments
I feel that perhaps "bounded by GIL" and "not bounded" is the wrong distinction to make and implement, at least at the core construct level. Instead we should just use "owned reference" ( Neither of those would be tied to any particular GIL lock and would require a GIL token to work with. Once we do that, we ought to fix our APIs to properly use these types. For instance, the We could then look for a way to add versions of owned and borrowed references that are bound to a particular instance of a GIL guard, but IMO those should be an add-on rather than a core construct. This, I believe, would remove the book-keeping overhead in object creation and deletion in most, but not all, cases. For example, consider this exceedingly common pattern (in pseudocode):
// now
fn myfn(...) {
let my_thing = Py*::new(py); // calls gil::register_owned(...) = overhead!
let return_object = my_thing.into_object(); // overhead, increasing refcount on `my_thing`.
drop(gil); // internal overhead, reducing refcount on stuff registered with `gil::register_owned`!
return return_object.into_ptr();
}
// adjusted
fn some_glue(...) -> ... {
let my_thing: Py<Py*> = Py*::new(py); // does not need to call `gil::register_owned`.
let return_object: Py<PyAny> = my_thing.into_object(); // effectively a pointer cast
drop(gil); // does not touch my_thing refcount...
return return_object.into_ptr();
} Then
It should be possible to mitigate this overhead here as well by doing something along the lines of:
#[thread_local]
static mut WE_HOLD_THE_GIL: bool = false;
impl Drop for Py<T> {
fn drop(&mut self) {
if WE_HOLD_THE_GIL { // super cheap hot branch
decref(self)
} else {
// will be decrefd once pyo3 takes GIL lock.
register_to_decref_on_next_lock(self)
}
}
} |
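To make the hot-branch idea above concrete, here is a minimal std-only sketch. It is a toy model, not PyO3 code: `Owned`, `acquire_gil`, `release_gil`, `pending_len` and `PENDING_DECREFS` are hypothetical names, a shared `Cell<isize>` stands in for the Python reference count, and a stable `thread_local!` replaces the nightly `#[thread_local]` attribute. A real implementation would also need a pool shared across threads, since an object may be dropped on a thread other than the one that next takes the GIL.

```rust
use std::cell::{Cell, RefCell};
use std::rc::Rc;

thread_local! {
    // Hypothetical stand-in for "does this thread currently hold the GIL?".
    static WE_HOLD_THE_GIL: Cell<bool> = Cell::new(false);
    // Reference counts whose decrement was postponed because the GIL was not held.
    static PENDING_DECREFS: RefCell<Vec<Rc<Cell<isize>>>> = RefCell::new(Vec::new());
}

/// Toy owned reference; the shared `Cell<isize>` plays the role of the refcount.
struct Owned {
    refcnt: Rc<Cell<isize>>,
}

impl Owned {
    /// Takes a new owned reference, incrementing the toy refcount.
    fn new(refcnt: &Rc<Cell<isize>>) -> Self {
        refcnt.set(refcnt.get() + 1);
        Owned { refcnt: Rc::clone(refcnt) }
    }
}

impl Drop for Owned {
    fn drop(&mut self) {
        if WE_HOLD_THE_GIL.with(|held| held.get()) {
            // Cheap hot branch: decrement right away.
            self.refcnt.set(self.refcnt.get() - 1);
        } else {
            // Cold branch: remember the object and decrement on the next acquire.
            PENDING_DECREFS.with(|p| p.borrow_mut().push(Rc::clone(&self.refcnt)));
        }
    }
}

/// Marks the GIL as held and applies all postponed decrements.
fn acquire_gil() {
    WE_HOLD_THE_GIL.with(|held| held.set(true));
    let pending = PENDING_DECREFS.with(|p| std::mem::take(&mut *p.borrow_mut()));
    for refcnt in pending {
        refcnt.set(refcnt.get() - 1);
    }
}

/// Marks the GIL as released.
fn release_gil() {
    WE_HOLD_THE_GIL.with(|held| held.set(false));
}

/// Number of decrements currently waiting for the next acquire.
fn pending_len() -> usize {
    PENDING_DECREFS.with(|p| p.borrow().len())
}
```

In this model the book-keeping cost is only paid for objects dropped while the GIL is not held; everything dropped under the GIL takes the single-branch fast path.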
Somewhat related, something that could inform our design a little: in order to enable implementation of |
I am having trouble totally wrapping my head around how the two object management philosophies mesh, but I thought I would chime in to point out that this is not strictly true: I still don't totally understand the object storage stuff, but it seems like acquiring the gil just around refcount operations (and other operations where we actually perform Python operations that require the GIL) would solve this problem, no? |
Technically it would, yes, but you don’t really want the GIL to be acquired and released for each and every local that is being dropped (and there could be quite many of them); the inefficiency in doing so is exactly what makes it worth implementing the object storage. Then, again, if we stop putting every single object we create into this storage, then perhaps just taking GIL for the |
Bounded by the current design, I didn't come up with such a design but representing borrowed pointer by
And I like this
Ah, thank you for pointing it out. Maybe my explanation should be refined.
Yeah, a model with just owned and borrowed references can be much clearer. |
I think that
I would think that this is likely to be true. C++'s It's worth remembering that the GIL lock is ref-counted, so even in the example code we've been discussing:
The GIL might well still be held after |
Also, for checking if we have the GIL: there's an API |
A data point for the discussion here: the following code, from a question on Gitter, will cause memory consumption to indefinitely grow when inside the loop. I think it's from pyo3's use of
EDIT: Just noticed #311 has pretty much the same example. |
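The snippet from Gitter is not reproduced here. As a rough illustration only, and assuming the growth comes from the object storage described in this issue, the effect can be modeled with std types. `ToyGuard`, `OWNED_POOL`, `new_object` and `pool_len` are hypothetical names, and plain `u64` ids stand in for object pointers.

```rust
use std::cell::RefCell;

thread_local! {
    // Hypothetical stand-in for the per-thread storage of owned pointers.
    static OWNED_POOL: RefCell<Vec<u64>> = RefCell::new(Vec::new());
}

/// Toy guard: remembers the pool length at acquire time and truncates
/// the pool back to that length when it is dropped.
struct ToyGuard {
    start: usize,
}

impl ToyGuard {
    fn acquire() -> Self {
        ToyGuard { start: pool_len() }
    }

    /// Registers a fake object; the entry stays in the pool until the
    /// guard is dropped, even if the caller discards the returned id.
    fn new_object(&self, id: u64) -> u64 {
        OWNED_POOL.with(|p| p.borrow_mut().push(id));
        id
    }
}

impl Drop for ToyGuard {
    fn drop(&mut self) {
        OWNED_POOL.with(|p| p.borrow_mut().truncate(self.start));
    }
}

/// Current number of entries held by the toy storage.
fn pool_len() -> usize {
    OWNED_POOL.with(|p| p.borrow().len())
}
```

With one guard held around a long loop, every `new_object` call adds an entry that is only released when the guard drops. Acquiring a fresh guard per iteration keeps the pool bounded in this model, at the price of repeated acquire and release.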
Here I propose a new idea to remove the internal storage for borrowed objects.
pub struct PyAny(ffi::PyObject);
impl PyAny {
fn from_borrowed_ptr(py: Python, ptr: *mut ffi::PyObject) -> &Self {
ptr as _
}
fn from_owned_ptr(py: Python, ptr: ...) -> &Self {
py.register_ptr(ptr);
ptr as _
}
}
Pros
Cons
Apparently, it solves nothing for owned pointers. Getting many owned objects can still be costly.
impl PyAny {
// or better name
fn as_owned(py: Python, ptr: ...) -> Py<Self> {
...
}
}
|
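A std-only sketch of the borrowed half of this proposal, for illustration. `RawObject` is a hypothetical stand-in for `ffi::PyObject`, and `Python<'py>` here is a toy zero-sized token rather than PyO3's type. The point it shows is that with a `#[repr(transparent)]` newtype, turning a raw pointer into a `&'py PyAny` is only a pointer cast and registers nothing in a storage.

```rust
use std::marker::PhantomData;

/// Hypothetical stand-in for `ffi::PyObject`.
#[repr(C)]
struct RawObject {
    refcnt: isize,
}

/// Newtype with the same layout as the raw object, so a pointer to one
/// can be reinterpreted as a pointer to the other.
#[repr(transparent)]
struct PyAny(RawObject);

/// Toy zero-sized token that only carries the `'py` lifetime.
#[derive(Clone, Copy)]
struct Python<'py>(PhantomData<&'py ()>);

impl PyAny {
    /// Reinterprets a borrowed raw pointer as a reference bound to `'py`.
    ///
    /// # Safety
    /// `ptr` must be non-null and point to an object that stays alive for `'py`.
    unsafe fn from_borrowed_ptr<'py>(_py: Python<'py>, ptr: *mut RawObject) -> &'py PyAny {
        // No storage involved: just a cast.
        unsafe { &*(ptr as *const PyAny) }
    }

    /// Reads the toy refcount through the wrapper.
    fn refcnt(&self) -> isize {
        self.0.refcnt
    }
}
```

The owned case is not covered by this sketch; as noted under Cons, something still has to decrement the reference count for owned pointers.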
Or |
It's an interesting idea, we should definitely explore it. I think that most of the return types would have to become I wonder whether we can skip actually having the concrete layout and just have After I realised #883 last night I think getting owned pointers performant and ergonomic is probably more important than borrowed pointers, because I think most return types will have to become owned pointers. |
Yeah, we can skip it... but
Hmm... 🤔 I don't think either owned or borrowed object is important. For a pyfunction, args/kwargs is obtained by
#[pyfunction]
fn py_function(arg: Vec<i32>, kwargs: &PyDict) -> PyResult<SomeRustType> {
...
} |
Agreed. What I was thinking is that general users will be making a lot of owned objects from these borrowed pointers, so they should hopefully be easy to work with. |
You mean iterating list/dict or so? |
This is a support issue following #661, where I explain why PyO3’s object creation can be slow and how we can speed it up.
Since I only joined this project a year ago, I’m not sure my understanding is correct, especially about the historical aspects of the design.
cc: @ijl @ehiggs
TL;DR
For PyObject and Py<T>, we cannot remove this overhead.
For &PyAny, &PyDict or so, we can remove this overhead by changing them to PyAny<'py>, PyDict<'py>, and so on.
PyObject with low-cost PyAny<'py> types (and then we should rename PyAny to PyObject).
First, let’s revisit our 2 kinds of object
It is really confusing, but we have 2 types for representing a Python object, named PyObject and PyAny.
So, what’s the difference between them?
The answer is that PyAny can only be used as &'py PyAny, which has the same lifetime as GILGuard, while PyObject has no such restriction.
Thus, a snippet that uses a &PyAny after its GILGuard has been dropped causes a compile error, because &PyAny has the same lifetime as GILGuard.
CPython has a reference-count based GC, which really differs from Rust’s lifetime based memory management system. For integrating these two systems, it’s a natural choice to represent Python objects’ lifetime by GILGuard (our RAII guard that corresponds to Python’s thread lock).
In contrast, PyObject can exist even after GILGuard drops.
This ‘not bounded’ nature of PyObject helps users to write complicated extensions. For example, you can send PyObject into another thread, though you cannot send PyAny.
However, &PyAny is a sufficient and reasonable choice for many use cases.
For types other than PyAny, like PyDict, these ‘not bounded’ types are represented by the Py<T> wrapper type.
Bounded by GILGuard | Not bounded
&PyAny | PyObject
&PyDict, &PyList, … | Py<PyDict>, Py<PyList>, …
PyObject retrieval and object storage
Then, how do we ensure that PyObject can be used after GILGuard drops?
Let’s think about this situation.
First, we need to ensure that PyObject is not deallocated after the first GILGuard drops, by incrementing the reference count of the object when creating it.
Then we have to decrement the reference count when it drops, but here comes the problem.
Let’s recall that values are dropped in reverse order of declaration in Rust, which means obj drops after gil drops.
This is really problematic, because we cannot touch any Python objects when we do not hold the GIL.
To prevent this behavior, we have an object storage that stores object pointers.
When PyObject drops, we don’t decrement its reference count but store its internal pointer in the storage.
Then, after we release another GILGuard, we decrement the reference counts of the released PyObjects.
Yeah, the object storage is the core overhead we discuss in this issue, and ideally it should be removed.
But for PyObject, I have no idea other than using storage to enable this complicated behavior.
&'py Lifetime and object storage
For &PyAny, the situation is a bit simpler.
What we want to do for &Py~ types is just to force them to have the same lifetime as GILGuard.
So when we create &Py~ types, we store the internal pointer in the object storage and return a reference to the pointer in the storage.
And when GILGuard drops, we decrement the object’s reference count for owned objects and do nothing for borrowed objects.
To enable this conditional behavior, we have 2 object storages (one for owned, one for borrowed) for &Py~ types.
How we can remove this overhead
So, yeah, for &Py~ types, all we do when one drops is decrement the reference count or do nothing.
We can do this operation without the internal storage by using Drop::drop.
Then we would have PyAny<'py>, PyDict<'py>, and so on instead of &PyAny, &PyDict or so.
Thus this would be a really breaking change.
However, since I don’t think our current API, which uses references only to represent lifetimes, is reasonable, I’m sure it is worth considering.
The problem is how we distinguish borrowed and owned objects without two types of object storage.
A possible idea is the use of a wrapper type, say, PyOwned<PyAny>.
This design is clear and zero-cost, but it requires lots of API changes.
Another possible idea is to use a boolean flag to represent whether the object is owned or not.
It doesn’t force users to rewrite lots of code, but it has some runtime cost.
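The two options above can be sketched with std types only. This is a toy model under stated assumptions, not a proposed API: `RawObject`, `Handle` and `PyAnyFlagged` are hypothetical names, a `Cell<isize>` stands in for the Python reference count, and `'py` is tied to a plain borrow rather than to a GIL token.

```rust
use std::cell::Cell;

/// Toy object header; `refcnt` stands in for the Python reference count.
struct RawObject {
    refcnt: Cell<isize>,
}

/// Hypothetical trait giving a wrapper access to the underlying object.
trait Handle {
    fn raw(&self) -> &RawObject;
}

/// Borrowed handle bound to `'py`; dropping it does nothing.
#[derive(Clone, Copy)]
struct PyAny<'py> {
    raw: &'py RawObject,
}

impl<'py> Handle for PyAny<'py> {
    fn raw(&self) -> &RawObject {
        self.raw
    }
}

/// Option 1: ownership is encoded in the type, so there is no runtime check.
struct PyOwned<T: Handle>(T);

impl<T: Handle> PyOwned<T> {
    fn new(handle: T) -> Self {
        let raw = handle.raw();
        raw.refcnt.set(raw.refcnt.get() + 1);
        PyOwned(handle)
    }
}

impl<T: Handle> Drop for PyOwned<T> {
    fn drop(&mut self) {
        let raw = self.0.raw();
        raw.refcnt.set(raw.refcnt.get() - 1);
    }
}

/// Option 2: one type for both cases, with a flag checked on every drop.
struct PyAnyFlagged<'py> {
    raw: &'py RawObject,
    owned: bool,
}

impl<'py> PyAnyFlagged<'py> {
    fn borrowed(raw: &'py RawObject) -> Self {
        PyAnyFlagged { raw, owned: false }
    }

    fn owned(raw: &'py RawObject) -> Self {
        raw.refcnt.set(raw.refcnt.get() + 1);
        PyAnyFlagged { raw, owned: true }
    }
}

impl<'py> Drop for PyAnyFlagged<'py> {
    fn drop(&mut self) {
        if self.owned {
            self.raw.refcnt.set(self.raw.refcnt.get() - 1);
        }
    }
}
```

In this sketch neither variant touches any storage. The wrapper pushes the owned/borrowed distinction into signatures, while the flag keeps a single type at the cost of one extra field and one branch per drop.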
We need discussion and help
We appreciate any ideas, discussion, and PRs around this area, especially about the design of zero-cost object types.
Thanks.