Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add unstable C API function for enabling deferred reference counting (PyUnstable_Object_EnableDeferredRefcount) #42

Closed
colesbury opened this issue Sep 18, 2024 · 37 comments
Labels

Comments

@colesbury
Copy link

In the free-threaded build, marking an object as supporting deferred reference counting allows the interpreter to skip reference count modifications for values pushed and popped from the interpreter's evaluation stack and local variables. This is important for avoiding contention on reference count modifications that inhibit scaling. See also PEP 703: "Deferred Reference Counting". The tradeoff is that these objects are only collected when the tracing GC runs.

In the free-threaded build, we currently have an internal-only function _PyObject_SetDeferredRefcount that allows an object to have deferred references to it. We use this on methods, top-level functions, modules, heap types, and a few other types of objects.

I think we should expose this as an unstable API for C extensions that "hints" the runtime to make this tradeoff:

By "hint", I mean that the runtime is free to ignore this suggestion. In particular, the the function is a no-op in the default build.

Motivation

Some C API extensions like nanobind create their own function and method-like objects. The nanobind.nb_func and nanobind.nb_method behave like functions and methods, but they are not subclasses of PyFunction_Type, PyCFunction_Type, or PyMethod_Type.

Without deferred reference counting (or immortalization) enabled, calling nanobind functions from multiple threads doesn't scale at all -- nanobind would not be very useful in the free-threaded build.

I think this is likely to come up in other extensions as well -- Peter's PR was created in response to a separate issue by @Yiling-J, for example.

Alternatives

Nanobind's support for free-threading currently relies on making nb_func and nb_method objects immortal in the free-threaded build immortal by basically copy-pasting _Py_SetImmortal. That's obviously not great - I'd much rather that nanobind is able to use an official, unstable API than to modify internals in a way that's likely to break in new CPython versions.

Note that in 3.13 deferred reference counting and immortalization are entangled due to time limitations, but this is no longer the case in main/3.14.

Python API?

I don't think we should expose this as a Python API, at least for now.

cc @wjakob, @ngoldbaum, @ZeroIntensity

@Yiling-J
Copy link

My motivation is to make my cache package scalable, but it's still unclear to me how method is expected to perform under free threading. Does this mean cache.get(key) is contended by default, and that I need to use the new C API to ensure scalability? I’ve already shared my concerns in the issue, and let me paste them here:

Python users might not realize the potential bottleneck when using free-threading Python. If they don't consult the C API documentation, they may be unaware that methods could become a scalability issue and need to be manually optimized using the C API.

@ZeroIntensity
Copy link

FWIW, your case is somewhat specific. A bound method object (client.get) is not something that's generally accessed across threads, so there's no DRC on them by default.

@encukou
Copy link
Collaborator

encukou commented Sep 19, 2024

I suggest PyUnstable_Object_EnableDeferredRefcount; it's not setting the refcount number.
(For a more stable function I'd suggest naming it after the intent rather than implementation, like _HintShared, but that's not too important at this point.)


I think it should be easier to add/remove reasons why deferred refcounting can't be enabled.
Even for PyUnstable_ functions, it's nice if they survive intact for a few releases :)

Suggestion:

  • return 0 if successful (breaking convention; usually this is 1)
  • return a positive value if nothing happened (GIL was enabled, object not owned by this thread, etc.). The specific reasons & values can change even in patch releases, and are only documented in the source.
  • reserve returning -1 with exception set for other errors

Keep in mind PyUnstable_ functions need docs and tests :)

@vstinner
Copy link

I think that it's reasonable addition, but I'm not sure about the API. I'm waiting to see a concrete API :-) I don't understand well when such call can fail.

@colesbury
Copy link
Author

@encukou:

  • Yeah, I agree that PyUnstable_Object_EnableDeferredRefcount is a better name
  • I agree about different return codes for success vs. no-op, but why use 0 for success instead of 1? I'd expect 0 if nothing happened, such as in the default build, and 1 for success. I don't have a super strong preference here, but I also don't see a reason for breaking with convention.

@vstinner: the linked PR (python/cpython#123635) has a concrete API. The API only fails if the pre-conditions are not met:

  1. Is the object tracked by the GC? The deferred reference counting implementation relies on the GC being able to find all deferred references, so it's important for correctness that the object is tracked by the GC.
  2. Is it it thread-safe to modify the reference count? In the PR, this amounts to a check that the object is owned by the current thread and that the shared reference count is zero. That's basically a heuristic. As I'm writing this, I think that maybe we should instead try to make the API thread-safe in more cases by using atomic operations instead of trying to enforce this precondition.

@encukou:

  • I think "object not tracked by the GC" is better treated as an error, rather than as "nothing happened" because it's an indication of incorrect usage.

@encukou
Copy link
Collaborator

encukou commented Sep 23, 2024

I agree about different return codes for success vs. no-op, but why use 0 for success instead of 1?

Because there's only one zero, but a lot of positive values :)

You could also do it with a pointer-to-result argument: int PyUnstable_Object_EnableDeferredRefcount(PyObject *obj, int *failure_reason) would return 0 if nothing hapened, and store the reason in that case.

I think "object not tracked by the GC" is better treated as an error, rather than as "nothing happened" because it's an indication of incorrect usage.

IMO, that's a detail handled by whoever defines the type. For whoever uses the object, it should be OK to ignore the detail -- the indication that nothing happened should be enough.

@vstinner
Copy link

Why not setting an exception with an error message, rather than an error code?

@ZeroIntensity
Copy link

Bikeshedding: I like HintDeferredRefcount more if we're going to emphasize "nothing happened" return values.

failure_reason is interesting, but how will the user convert that into something human readable? Something along the lines of PyUnstable_Object_EnableDeferredRefcountStrError?

For whoever uses the object, it should be OK to ignore the detail -- the indication that nothing happened should be enough.

I think we're almost certainly going to get users that don't understand how DRC works, and only know that it helps speed up threads, so they'll just apply it to every object they see. If trying to enable it on a very clearly incorrect object doesn't raise an exception, we're bound to have users putting no-ops everywhere -- wouldn't it be better to prevent that?

@encukou
Copy link
Collaborator

encukou commented Sep 23, 2024

Why not setting an exception with an error message, rather than an error code?

Because

  • allocating an exception is a rather heavy operation
  • the failure reason is only really useful for debugging; to handle it in production you should always either show a warning or do nothing
  • since you shouldn't fail, the failure definitely shouldn't bubble up to the caller

I think we're almost certainly going to get users that don't understand how DRC works, and only know that it helps speed up threads, so they'll just apply it to every object they see. If trying to enable it on a very clearly incorrect object doesn't raise an exception, we're bound to have users putting no-ops everywhere -- wouldn't it be better to prevent that?

I wouldn't worry about users doing misguided PyUnstable calls, especially if there's a clear “this was a no-op” warning indicator to check.
I'm more worried about some extension starting to fail because some value that used to be GC-tracked changed to un-tracked -- either because the type itself changed, or a new type shows up in some use cases.

@ZeroIntensity
Copy link

At this point, do we even need three different return codes? If we go with failure_reason, then there's nothing that can raise an exception. We could just use -1 for "nothing happened" and then 0 for success.

@vstinner
Copy link

I see 3 cases:

  • Return 1: the object now uses deferred ref counting
  • Return 0: the object was already using deferred ref counting
  • Return -1: error, the object cannot use deferred ref counting -- see failure_reason for details

@ZeroIntensity
Copy link

Return 0: the object was already using deferred ref counting

This could be a failure_reason as well, but while we're here: do we need a PyUnstable equivalent of _PyObject_HasDeferredRefcount? (@colesbury)

@colesbury colesbury changed the title Add unstable C API function for enabling deferred reference counting (PyUnstable_Object_SetDeferredRefcount) Add unstable C API function for enabling deferred reference counting (PyUnstable_Object_EnableDeferredRefcount) Sep 24, 2024
@colesbury
Copy link
Author

The int *failure_reason is not a common pattern in CPython. It's far more common to set an exception and return -1. I don't see a strong enough reason to deviate from that pattern.

I expect extensions to use this on instances of their own types during object construction. That's how we use it internally too and I think we should be explicit about that in the documentation. For what it's worth, I do not expect this to be widely used, but it is important for the few extensions that will make use of it.

I see the three cases as:

  • Return 1: the object now uses deferred ref counting
  • Return 0: the interpreter ignored the hint (e.g., we are in a GIL-enabled build, the object already uses deferred ref counting, or other unspecified cases)
  • Return -1: error, the call was made at the wrong place or on the wrong type of object. For example, the object is not GC tracked.

I think the -1 cases are analogous to PyErr_BadInternalCall(), but we can do a bit better with the error messages than that. Bubbling the error up to Python isn't perfect -- I agree that you should never see these in production -- but this seems to be the most common way to signify misuse of a Python C API.

I don't think we should expose _PyObject_HasDeferredRefcount yet.

@vstinner
Copy link

vstinner commented Oct 7, 2024

It seems like we have an agreement on the API, so let's vote.

Vote:

API: int PyUnstable_Object_EnableDeferredRefcount(PyObject *obj)

  • Return 1: the object now uses deferred ref counting
    Return 0: the interpreter ignored the hint (e.g., we are in a GIL-enabled build, the object already uses deferred ref counting, or other unspecified cases)
  • Return -1: error, set an exception, the call was made at the wrong place or on the wrong type of object. For example, the object is not GC tracked.

@encukou
Copy link
Collaborator

encukou commented Oct 7, 2024

+0 for this as unstable API, with the mentioned docs note:

I expect extensions to use this on instances of their own types during object construction. That's how we use it internally too and I think we should be explicit about that in the documentation.

(Without this, an extension that would change one of its types from GC-tracked to untracked would be making a backwards incompatible change.)

@markshannon
Copy link

Return 1: the object now uses deferred ref counting
Return 0: the interpreter ignored the hint (e.g., we are in a GIL-enabled build, the object already uses deferred ref counting, or other unspecified cases)

Does this mean it is permitted for an implementation to always return 0, or is it obliged to check if the object was already using deferred reference counting and return 1 if it wasn't?

(We plan to use deferred reference counting for the default build)

@markshannon
Copy link

markshannon commented Oct 8, 2024

Is an implementation allowed to return 1 if an object was already using deferred reference counting?

The specification is vague.

@markshannon
Copy link

I think this would be better:

  • Return 1: the object now uses deferred ref counting
  • Return 0: the object does not use deferred ref counting (e.g. that object cannot support deferred ref counting, or the implementation does not support deferred ref counting)

I don't see a need for this function to return -1 and raise an exception.

If calling PyUnstable_Object_EnableDeferredRefcount is supposed to be just a hint, then we can ignore the hint.
There is no need to raise an exception.

@vstinner
Copy link

@erlend-aasland @mdboom @serhiy-storchaka: You didn't vote on this API yet. What's your call on it?

@vstinner
Copy link

@markshannon:

I don't see a need for this function to return -1 and raise an exception.

The current implementation fails if PyType_IS_GC() is false: if the object is not traked by the GC.

In general, it's better to have an error path.

@markshannon
Copy link

The current implementation fails if PyType_IS_GC() is false: if the object is not tracked by the GC.

So? _PyObject_SetDeferredRefcount can return 0 if it can't defer an object.

In general, it's better to have an error path.

Why?

This supposed point of this function is to hint that reclamation of an object can be deferred. There is no reason for setting an exception, it just makes the function harder to use.

@erlend-aasland
Copy link

@erlend-aasland @mdboom @serhiy-storchaka: You didn't vote on this API yet. What's your call on it?

I removed my vote since there is ongoing discussion.

@vstinner
Copy link

@colesbury @ZeroIntensity: What's your opinion on the "error case"? Should the function return -1 and set an exception, or return 0 and "ignore the error"?

@ZeroIntensity
Copy link

As a user, exception messages are helpful when debugging--I'm all too familiar with digging through source to find why something is failing. No-ops, to me, are only useful if there's absolutely nothing I can do about it, such as on a GIL-enabled build (rather than the other case of trying to use it on an untracked type, in which case I should just remove the call entirely).

If calling PyUnstable_Object_EnableDeferredRefcount is supposed to be just a hint, then we can ignore the hint.

PyUnstable_Object_EnableDeferredRefcount is a hint, and may no-op depending on the state of the interpreter, but it should validate its arguments. (It's also good to have an error path in case there's a need for allocations in the future, so the MemoryError can propagate.)

@colesbury
Copy link
Author

I had a slight preference for -1 for certain unsupported cases, like calling it on a non-GC type. If you're doing that, you are almost certainly doing something wrong, but I don't think it matters very much. I am also okay with @markshannon's suggestion.

@markshannon
Copy link

Why is calling PyUnstable_Object_EnableDeferredRefcount on a non-GC type doing something wrong?

PyUnstable_Object_EnableDeferredRefcount(obj) is a hint to the the interpreter that a particular object is likely to be long lived.
It has nothing to do with the internals of the current implementation of the cycle GC.

@colesbury
Copy link
Author

It's a hint to use deferred reference counting and we don't support that on non-GC types.

@markshannon
Copy link

It's a hint to use deferred reference counting and we don't support that on non-GC types.

There are a couple of problems with that:

  • Unless you know the internals of the cycle GC, you can't tell what that does in practical terms
  • The user has to know which objects are "GC types"

What a user should know, and doesn't depend on the implementation, is whether an object is likely to be long-lived and whether it is OK if its reclamation is deferred.
How about changing the name to PyUnstable_Object_AllowedDeferredReclamation?

Or, if we don't want that yet, keep _PyObject_SetDeferredRefcount private?

@ZeroIntensity
Copy link

The user has to know which objects are "GC types"

This still applies to the no-op case. Arguably, that makes it more difficult for users to figure out why the function isn't working. Is it possible for this function to work on non-GC types in the future?

@markshannon
Copy link

Arguably, that makes it more difficult for users to figure out why the function isn't working.

Just because the VM doesn't do anything with that hint doesn't mean the function has failed. It always works.

@colesbury
Copy link
Author

In the interest of moving this along, let's remove the -1 return code from the proposal.

How about changing the name to PyUnstable_Object_AllowedDeferredReclamation

No - we have multiple mechanisms that defer reclamation and this is one specific mechanism.

Or, if we don't want that yet, keep _PyObject_SetDeferredRefcount private?

This is addressed in the "Motivations" and "Alternatives". I don't think that's a good idea.

@ZeroIntensity
Copy link

Bikeshedding: HintDeferredRefcount makes it more clear that it's a hint--calling something named Enable that possibly doesn't enable anything seems counterintuitive.

@markshannon
Copy link

No - we have multiple mechanisms that defer reclamation and this is one specific mechanism.

Would it be better to have an API that says you are OK with deferred reclamation and let the VM decide which mechanism to use?

@colesbury
Copy link
Author

Some of the deferred reclamation mechanisms are primarily about thread-safety 1 as opposed to avoiding reference count contention. I don't think the VM has enough information to decide. I would rather have this API concretely do one thing.

Footnotes

  1. For example, at one point we had _PyGC_BITS_SHARED_INLINE, which delayed tp_dealloc invocation so that we could make some racing reads and writes safe, but it didn't do anything for addressing reference count contention. It also turned out to have problems and got removed, but we may end up using something similar in the future.

@ZeroIntensity
Copy link

I've given in and updated the PR to no-op on non-GC types. Are there any other concerns about the API?

@encukou
Copy link
Collaborator

encukou commented Nov 4, 2024

With Mark being strongly for the no-fail API, and Sam not opposed to it, I think it's good to go.

(FWIW, I think the discussion was a bit too thorough, and unstable API shouldn't need C API WG approval at all. But, you do get better API by asking -- or at least better-considered API.)

@colesbury
Copy link
Author

This is merged in main

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

7 participants