Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-118362: Fix thread safety around lookups from the type cache in the face of concurrent mutators #118454

Merged
merged 4 commits into from
May 6, 2024

Conversation

DinoV
Copy link
Contributor

@DinoV DinoV commented May 1, 2024

Lookups from the type cache can be exposed to some races. One of those is that _PyType_Lookup isn't increfing the result, so the resulting value can become invalid at any point. This adds a new _PyType_Fetch API which does the incref. Most places are updated to use the new API except for the specializer which isn't safe in free-threaded builds yet.

But there are also issues around when we do a store into the dict and signal that the type is modified. A race can occur where the old value is removed from the dict and before we invalidate the type version. This can be seen today w/ the GIL when the value in the dict has a finalizer - accessing the type cache and the type's mapping proxy return inconsistent results.

This fixes these issues by making the update to the dict and the type modified update an atomic operation. We also capture the previous value in the dictionary and don't free it until after we've made the atomic update. This basically inlines a much simplified version of _PyObject_GenericSetAttrWithDict into type_setattro which can lock mutation against types as well as against the type's dict.

When updating we first clear the type version so concurrent reads of the type cache won't hit. We then replace the existing value and finish the type modified process.

Makes setting an attribute on a class and signaling type modified atomic
Avoid adding re-entrancy exposing the type cache in an inconsistent state by decrefing after type is updated
Copy link
Contributor

@colesbury colesbury left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some questions comments below. Biggest questions is that now that we are accessing the dict directly instead of through _PyObject_GenericSetAttrWithDict, how do we know that the other cases handled by _PyObject_GenericSetAttrWithDict are not necessary in _Py_type_getattro?

Also, I think this merits a NEWS entry.

Include/cpython/object.h Outdated Show resolved Hide resolved
Objects/dictobject.c Outdated Show resolved Hide resolved
Objects/dictobject.c Show resolved Hide resolved
Objects/object.c Outdated Show resolved Hide resolved
Objects/typeobject.c Outdated Show resolved Hide resolved
Objects/typeobject.c Show resolved Hide resolved
Objects/typeobject.c Show resolved Hide resolved
Objects/typeobject.c Outdated Show resolved Hide resolved
Objects/typeobject.c Show resolved Hide resolved
Lib/test/test_free_threading/test_type.py Show resolved Hide resolved
@DinoV
Copy link
Contributor Author

DinoV commented May 1, 2024

Some questions comments below. Biggest questions is that now that we are accessing the dict directly instead of through _PyObject_GenericSetAttrWithDict, how do we know that the other cases handled by _PyObject_GenericSetAttrWithDict are not necessary in _Py_type_getattro?

This is part of the reason why I assert:

    assert(!_PyType_HasFeature(metatype, Py_TPFLAGS_INLINE_VALUES));
    assert(!_PyType_HasFeature(metatype, Py_TPFLAGS_MANAGED_DICT));

In type_setattro. Type objects will always have a dictptr, so we'll either go to _PyObjectDict_SetItem if the dict doesn't exist yet or do the get/set if they do. _PyObjectDict_SetItem is just going to do a get/set as well. And because types don't have a managed dict they don't have shared keys.

Also, I think this merits a NEWS entry.

@DinoV DinoV marked this pull request as ready for review May 1, 2024 20:39
Fix some formatting
Add news blurb
Use PyDictObject * more
Rename _PyDict_GetItemRef_LockHeld to _PyDict_GetItemRef_Unicode_LockHeld
Reduce iterations in test
Expose _PyObject_SetAttributeErrorContext for attaching context
@DinoV
Copy link
Contributor Author

DinoV commented May 1, 2024

Also, I think this merits a NEWS entry.
I opened #118492 to track the issue that is fixed when the GIL is present and blurb'd it.

@DinoV DinoV force-pushed the nogil_type_crash branch from 705538b to 8074d0d Compare May 1, 2024 21:23
@colesbury colesbury changed the title [gh-118362] Fix thread safety around lookups from the type cache in the face of concurrent mutators gh-118362: Fix thread safety around lookups from the type cache in the face of concurrent mutators May 2, 2024
@@ -658,6 +658,7 @@ extern PyObject *_PyType_NewManagedObject(PyTypeObject *type);
extern PyTypeObject* _PyType_CalculateMetaclass(PyTypeObject *, PyObject *);
extern PyObject* _PyType_GetDocFromInternalDoc(const char *, const char *);
extern PyObject* _PyType_GetTextSignatureFromInternalDoc(const char *, const char *, int);
extern int _PyObject_SetAttributeErrorContext(PyObject* v, PyObject* name);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
extern int _PyObject_SetAttributeErrorContext(PyObject* v, PyObject* name);
extern int _PyObject_SetAttributeErrorContext(PyObject *v, PyObject *name);

@@ -4981,7 +5026,10 @@ find_name_in_mro(PyTypeObject *type, PyObject *name, int *error)
PyObject *base = PyTuple_GET_ITEM(mro, i);
PyObject *dict = lookup_tp_dict(_PyType_CAST(base));
assert(dict && PyDict_Check(dict));
res = _PyDict_GetItem_KnownHash(dict, name, hash);
if (_PyDict_GetItemRef_KnownHash((PyDictObject *)dict, name, hash, &res) < 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the PyErr_Occurred() check below can be removed now.

Copy link
Contributor

@colesbury colesbury left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with a few minor issues:

  • We need to handle the PyDict_New() returning NULL case in type_setattro
  • Minor formatting in _PyObject_SetAttributeErrorContext
  • PyErr_Occurred() case is no longer need in find_name_in_mro

@rhettinger rhettinger removed their request for review May 3, 2024 23:18
@DinoV DinoV merged commit 5a1618a into python:main May 6, 2024
36 checks passed
@bedevere-bot
Copy link

⚠️⚠️⚠️ Buildbot failure ⚠️⚠️⚠️

Hi! The buildbot iOS ARM64 Simulator 3.x has failed when building commit 5a1618a.

What do you need to do:

  1. Don't panic.
  2. Check the buildbot page in the devguide if you don't know what the buildbots are or how they work.
  3. Go to the page of the buildbot that failed (https://buildbot.python.org/all/#builders/1380/builds/238) and take a look at the build logs.
  4. Check if the failure is related to this commit (5a1618a) or if it is a false positive.
  5. If the failure is related to this commit, please, reflect that on the issue and make a new Pull Request with a fix.

You can take a look at the buildbot page here:

https://buildbot.python.org/all/#builders/1380/builds/238

Failed tests:

  • test_free_threading

Failed subtests:

  • test_attr_cache - test.test_free_threading.test_type.TestType.test_attr_cache

Summary of the results of the build (if available):

==

Click to see traceback logs
Traceback (most recent call last):
  File "/Users/buildbot/Library/Developer/XCTestDevices/2E72F7FC-E4ED-40AC-B022-F9C50E4B0A1E/data/Containers/Bundle/Application/4CB60694-FBAA-49BE-969E-073C108C659C/iOSTestbed.app/python/lib/python3.13/test/test_free_threading/test_type.py", line 37, in test_attr_cache
    with Pool(NTHREADS) as pool:
         ~~~~^^^^^^^^^^
  File "/Users/buildbot/Library/Developer/XCTestDevices/2E72F7FC-E4ED-40AC-B022-F9C50E4B0A1E/data/Containers/Bundle/Application/4CB60694-FBAA-49BE-969E-073C108C659C/iOSTestbed.app/python/lib/python3.13/multiprocessing/dummy/__init__.py", line 124, in Pool
    return ThreadPool(processes, initializer, initargs)
  File "/Users/buildbot/Library/Developer/XCTestDevices/2E72F7FC-E4ED-40AC-B022-F9C50E4B0A1E/data/Containers/Bundle/Application/4CB60694-FBAA-49BE-969E-073C108C659C/iOSTestbed.app/python/lib/python3.13/multiprocessing/pool.py", line 930, in __init__
    Pool.__init__(self, processes, initializer, initargs)
    ~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/buildbot/Library/Developer/XCTestDevices/2E72F7FC-E4ED-40AC-B022-F9C50E4B0A1E/data/Containers/Bundle/Application/4CB60694-FBAA-49BE-969E-073C108C659C/iOSTestbed.app/python/lib/python3.13/multiprocessing/pool.py", line 196, in __init__
    self._change_notifier = self._ctx.SimpleQueue()
                            ~~~~~~~~~~~~~~~~~~~~~^^
  File "/Users/buildbot/Library/Developer/XCTestDevices/2E72F7FC-E4ED-40AC-B022-F9C50E4B0A1E/data/Containers/Bundle/Application/4CB60694-FBAA-49BE-969E-073C108C659C/iOSTestbed.app/python/lib/python3.13/multiprocessing/context.py", line 113, in SimpleQueue
    return SimpleQueue(ctx=self.get_context())
  File "/Users/buildbot/Library/Developer/XCTestDevices/2E72F7FC-E4ED-40AC-B022-F9C50E4B0A1E/data/Containers/Bundle/Application/4CB60694-FBAA-49BE-969E-073C108C659C/iOSTestbed.app/python/lib/python3.13/multiprocessing/queues.py", line 361, in __init__
    self._rlock = ctx.Lock()
                  ~~~~~~~~^^
  File "/Users/buildbot/Library/Developer/XCTestDevices/2E72F7FC-E4ED-40AC-B022-F9C50E4B0A1E/data/Containers/Bundle/Application/4CB60694-FBAA-49BE-969E-073C108C659C/iOSTestbed.app/python/lib/python3.13/multiprocessing/context.py", line 67, in Lock
    from .synchronize import Lock
  File "/Users/buildbot/Library/Developer/XCTestDevices/2E72F7FC-E4ED-40AC-B022-F9C50E4B0A1E/data/Containers/Bundle/Application/4CB60694-FBAA-49BE-969E-073C108C659C/iOSTestbed.app/python/lib/python3.13/multiprocessing/synchronize.py", line 17, in <module>
    import _multiprocessing
ModuleNotFoundError: No module named '_multiprocessing'

class A:
attr = 1

@unittest.skipIf(is_wasi, "WASI has no threads.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, probably better to use @threading_helper.requires_working_threading()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iOS and Android do both have working threading, but they don't support subprocesses. In theory multiprocessing.dummy.Pool should work on these platforms, but in practice it doesn't because it imports too much of the main multiprocessing module.

The simplest solution is probably to use concurrent.futures.ThreadPoolExecutor instead.

SonicField pushed a commit to SonicField/cpython that referenced this pull request May 8, 2024
… in the face of concurrent mutators (python#118454)

Add _PyType_LookupRef and use incref before setting attribute on type
Makes setting an attribute on a class and signaling type modified atomic
Avoid adding re-entrancy exposing the type cache in an inconsistent state by decrefing after type is updated
@DinoV DinoV deleted the nogil_type_crash branch May 31, 2024 18:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants