-
-
Notifications
You must be signed in to change notification settings - Fork 30.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a public C-API function to iterate over GC’able objects #102013
Comments
@swtaarrs who originally implemented this feature in Cinder. |
Is calling
That sounds hard to do, since most Python API is allowed to decref. |
Thanks a lot for the proposal @jbower-fb. I need to think about this in detail but I am worried this may not be the right solution for what you are trying to do here. Seems that the end goal is to provide some interface for the JIT to identify certain objects to do some JIT-related operations, but the GC interface is really an implementation detail here and what you are proposing is to expose a fully public API. This means that we would need to support the API indefinitely. I also share some concerns with this:
because in practice is very hard to ensure that nothing that you call in the callback will create or destroy objects. I suppose for the callbacks you are using this is not hard, but the API you are proposing allows arbitrary callbacks with arbitrary use cases. I think I would be more comfortable with an API more akin to |
With regards to the comment warning against (de)allocations, I copied this verbatim from our Cinder implementation, but on closer inspection I think it's not much of an issue. I think the biggest problem would be dereferencing invalid memory if the current object is deleted. We can avoid this though by tweaking the implementation to inc/dec-ref appropriately around the callback. For allocations, there is some ambiguity whether a new object will be visited or not. We could either leave this as undefined behavior and weaken the comment to note this. Or, I think we only add new objects to the start of the GC list so we could codify the current implementation's behavior which is to never visit newly added objects. Finally, I think we also need to disable GC during the walk. A comment just saying not to explicit run/enable GC doesn't sound nearly as bad as avoiding allocations. Let me know if I missed something. If there is strong opposition to the callback API, I guess a list-based return is okay. It seems very wasteful though as I think many use cases will allocate the list and then immediately throw it away. If someone were using the API to somehow debug memory from C it might not be great to allocate an unbounded
It is technically possible but it's at least very clumsy. We'd need to go through all the motions of importing a module and looking up the function in C. It's also possible for user-code to change override methods on the
Today our current use case we have is iterating over generators, and we're thinking of iterating over functions and maybe types. We could add dedicated APIs for these, but for example for generators I'd be concerned this might be too specific. Also if we didn't use the GC data behind the scenes we'd need to spend extra memory and other overhead tracking this. Overall I think it's a useful general API which can be used to at least get things working today. If we run into limitations with it for certain things, we can add those APIs later without nullifying the value of being able to iterate over GC objects. Someone must already have thought this is a useful feature as it already exists in the Python API.
I was hoping the details of iteration would be the "implementation detail" part and the idea of iterating over the GC'able things would be reasonably stable. We already have this in the Python API, so was hoping it would be okay to expose it to C as well. Having said this, as long as the function is exported for use by modules I'd be fine with this being an unstable |
I was just about to suggest unstable API. This looks like a good fit -- it reaches pretty deeply into the internals, and it if the internals change, we'll want to remove/change this API rather than stretch it to keep backwards compatibility. (And projects that don't want that should be happy with The official unstable C API tier is new in PEP 689 and I'm currently implementing it (well, currently currently blocked by a more urgent work). Still missing docs, but if you send some form of Please use the |
Agreed 👍 I would be much more comfortable starting with this and then bumping it to stable if we find a lot of people are using it.
I do understand it, but please understand our point of view here: we need to keep in mind a lot more than just a good API and optimizations because then many different projects with many different needs would start using it and we need to maintain it forever. We need to be very careful in adding APIs that touch core parts of the VM to the stable tier. Being said that I do not think this is especially dangerous or controversial, but I want to make it clear that is not just about ergonomics and performance.
I get your point, but please understand that's unfortunately not enough to add APIs to the stable tier because of the maintenance cost. If we need to keep adding more and more APIs this makes life much much harder for us because we need to either deprecate or maintain forever the entry points. I am not saying this to oppose the change, I actually think some form of this API is usefull but I want us to be on the same page on how we treat these kind of changes. |
Ah, I wasn't fully aware of PEP 689. The unstable tier sounds perfect for this. I'll update the PR to fit this model and make some tweaks to so (de)allocations aren't an issue. @pablogsal, I completely understand you're point about the high-bar for the stable API due to the ongoing maintenance/compatibility requirements. I was just under the impression we might not allow exporting |
Okay, I've updated my PR to reflect what we discussed. |
I don't think this is a strong reason to implement a C API. Any monkeypatching can break code and that doesn't mean we need to add C API for everything that can be done from Python. |
True, although for the use cases I was considering it would certainly be surprising if this operation suddenly became a lot more expensive/had other side-effects. I'm planning to call it from JIT other low-level features which might have expectations of certain characteristics. Also, I think my point about cumbersomeness also still stands... What is the bar we need to pass to add something new to the unstable C API? So far, I've outlined a couple of use cases which I hope sound reasonable to be done from extensions rather than in Python. |
Merged! Thanks for the feedback everyone. |
* main: (50 commits) pythongh-102674: Remove _specialization_stats from Lib/opcode.py (python#102685) pythongh-102660: Handle m_copy Specially for the sys and builtins Modules (pythongh-102661) pythongh-102354: change python3 to python in docs examples (python#102696) pythongh-81057: Add a CI Check for New Unsupported C Global Variables (pythongh-102506) pythonGH-94851: check unicode consistency of static strings in debug mode (python#102684) pythongh-100315: clarification to `__slots__` docs. (python#102621) pythonGH-100227: cleanup initialization of global interned dict (python#102682) doc: Remove a duplicate 'versionchanged' in library/asyncio-task (pythongh-102677) pythongh-102013: Add PyUnstable_GC_VisitObjects (python#102014) pythonGH-102670: Use sumprod() to simplify, speed up, and improve accuracy of statistics functions (pythonGH-102649) pythongh-102627: Replace address pointing toward malicious web page (python#102630) pythongh-98831: Use DECREF_INPUTS() more (python#102409) pythongh-101659: Avoid Allocation for Shared Exceptions in the _xxsubinterpreters Module (pythongh-102659) pythongh-101524: Fix the ChannelID tp_name (pythongh-102655) pythongh-102069: Fix `__weakref__` descriptor generation for custom dataclasses (python#102075) pythongh-98169 dataclasses.astuple support DefaultDict (python#98170) pythongh-102650: Remove duplicate include directives from multiple source files (python#102651) pythonGH-100987: Don't cache references to the names and consts array in `_PyEval_EvalFrameDefault`. (python#102640) pythongh-87092: refactor assemble() to a number of separate functions, which do not need the compiler struct (python#102562) pythongh-102192: Replace PyErr_Fetch/Restore etc by more efficient alternatives (python#102631) ...
That is usually true. But for JITs and debuggers specifically, I do think the bar should be lower. Those tools need to work even if a user monkeypatches things. |
…/objimpl.h Include/objimpl.h must only contain the limited C API, whereas PyUnstable_GC_VisitObjects() is excluded from the limited C API.
…pl.h (#115560) Include/objimpl.h must only contain the limited C API, whereas PyUnstable_GC_VisitObjects() is excluded from the limited C API.
…/objimpl.h (python#115560) Include/objimpl.h must only contain the limited C API, whereas PyUnstable_GC_VisitObjects() is excluded from the limited C API.
…/objimpl.h (python#115560) Include/objimpl.h must only contain the limited C API, whereas PyUnstable_GC_VisitObjects() is excluded from the limited C API.
Feature or enhancement
An API similar to:
Which could be used as:
Pitch
We have a version of this in Cinder already and right now and use it to identify all generator objects so they can be de-opted when our JIT is shutdown. In future we plan to use it for things like discovering existing PyFunction objects, and then using gh-91049 to mark them as JIT’able. This could facilitate loading the JIT feature at a later time (e.g. as part of a module).
[Edited] In general, there already exists a Python API for iterating over GC’able objects via gc.get_objects(), however there is no good way of doing this from native extensions. While it is technically possible to import the
gc
module and extract thegc_objects
function in C this is cumbersome, and more importantly might lead to unexpected behavior if Python code has replaced thegc_objects
function.Linked PRs
The text was updated successfully, but these errors were encountered: