Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-34784: Posixmodule: Heap StructSequence #9665

Merged
merged 12 commits into from
Nov 13, 2018
Merged

bpo-34784: Posixmodule: Heap StructSequence #9665

merged 12 commits into from
Nov 13, 2018

Conversation

eduardo-elizondo
Copy link
Contributor

@eduardo-elizondo eduardo-elizondo commented Oct 2, 2018

@eduardo-elizondo eduardo-elizondo changed the title [WIP] Posixmodule: Heap StructSequence bpo-34784: [WIP] Posixmodule: Heap StructSequence Oct 2, 2018
@eduardo-elizondo
Copy link
Contributor Author

Tested this on the gevent (https://github.com/gevent/gevent) third-party dep. This change is backwards-compatible.

Repro steps:

git clone --single-branch -b heapstructseq-posix https://github.com/eduardo-elizondo/cpython
cd cpython
./configure --prefix=$HOME/.local
make -j
make install

git clone https://github.com/pypa/virtualenv
cd virtualenv
~/.local/bin/python3.8 setup.py install

git clone https://github.com/gevent/gevent
cd gevent
virtualenv env -p ~/.local/bin/python3.8
source env/bin/activate
pip install -r dev-requirements.txt
cd src/greentest
python ./testrunner.py

@willingc
Copy link
Contributor

@eduardo-elizondo Thanks for the PR. Is this PR still a WIP? If it is not, please edit the title. Thanks!

@eduardo-elizondo eduardo-elizondo changed the title bpo-34784: [WIP] Posixmodule: Heap StructSequence bpo-34784: Posixmodule: Heap StructSequence Oct 17, 2018
@eduardo-elizondo
Copy link
Contributor Author

@willingc Hi Carol, sorry yes this is ready for review - totally forgot to update the title.

@eduardo-elizondo
Copy link
Contributor Author

Looping in @encukou to look at this changes as well! Feel free to leave any feedback

@nascheme
Copy link
Member

nascheme commented Oct 20, 2018

Hello Eddie,
Thank for you for the patch. First, I think you should separate the changes to structseq.c from posixmodule.c. I.e. use multiple PRs. It makes the changes easier to review. That's assuming that it is not required to change both files at the same time. It doesn't look like it is because your change is backwards compatible. Making smaller more focused PRs is good, IMHO.

I will be adding some inline review comments to this PR. Your approach looks good to me. I.e. fix PyStructSequence_NewType() and then use it so that the types are heap allocated. In your commit message, NEWS entry or maybe in some code comments, I would try to incorporate the information in your post to python-dev, i.e. https://mail.python.org/pipermail/python-dev/2018-September/155069.html. That is good analysis and should get saved somewhere. Probably the commit message and NEWS would be most appropriate.

It looks like there are some issues with memory leaks and ref counting. See inline comments.

The BPO bug has multiple PR linked. I think the older one is obsolete by this one. You should close it in that case.

Copy link
Member

@nascheme nascheme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty sure PyStructSequence_NewType() leaks memory. It would be useful to write a test in _testcapimodule.c that uses the API multiple times and the decrefs the types returned. Then we can use leak testing tools to ensure we are not losing memory either to refcount bugs or due to missing PyMem_Del() calls.

Objects/structseq.c Outdated Show resolved Hide resolved
Objects/structseq.c Outdated Show resolved Hide resolved
Objects/structseq.c Outdated Show resolved Hide resolved
Objects/structseq.c Outdated Show resolved Hide resolved
Objects/structseq.c Show resolved Hide resolved
@encukou
Copy link
Member

encukou commented Oct 24, 2018

@nascheme

I think you should separate the changes to structseq.c from posixmodule.c. I.e. use multiple PRs. It makes the changes easier to review.
[...]

The BPO bug has multiple PR linked. I think the older one is obsolete by this one. You should close it in that case.

The older PR has the structseq.c changes; this one has the posixmodule.c changes. But one depends on the other, so this PR needs to include the structseq.c changes as well.

To me, it does make sense to review the new implementation together with an example of using it.

@eduardo-elizondo
Copy link
Contributor Author

eduardo-elizondo commented Nov 1, 2018

@encukou @nascheme

I finally updated the change to address all of your comments! I managed to fully clean up all the memory allocations without the need of an extra flag. Here's the rundown of the issues and the fixes:

Leak of PyMemberDef:

StructSequences have to dynamically create the PyMemberDefs that will be passed on to the type's tp_members. PyStructSequence_InitType2 just calls PyMem_New and never cleans up that memory. In my original implementation of PyStructSequence_NewType, I was just replicating the same behavior.

This can be improved by looking at how new types get allocated. If we look into the implementation of type_new, we'll find that tp_members is set by: PyHeapType_GET_MEMBERS. This macro basically looks past the size of the PyHeapTypeObject to pull the PyMemberDefs out. Looking back, we can see that type_new allocates PyHeapTypeObjects by calling (PyTypeObject *)metatype->tp_alloc(metatype, nslots);. This means that it counts the number of PyMemberDefs that the type will need and uses that to allocate extra space past the size of the object. That's how PyHeapType_GET_MEMBERS is able to pull these members out. Finally, that extra space of memory is cleaned up with the whole object, meaning that it doesn't need any extra deallocation step.

Going back to StructSequences, we can easily reuse all of that machinery to clean up the dynamically allocated PyMemberDefs. First, we start by modifying PyType_FromSpecWithBases so that it allocates all the extra space it needs by counting the number of slots that the object will have and passing that to PyType_GenericAlloc. Then, when all the slots are being iterated over, we add a special case for Py_tp_members, just like there is one for Py_tp_doc. Using this, we should be able to copy the dynamically allocated PyMemberDefs into the newly added space of memory at the end of the PyHeapTypeObject. This means that now the type owns it's PyMemberDefs and has no need for the dynamically created ones. Finally, we add a PyMem_FREE after calling PyType_FromSpecWithBases within PyStructSequence_NewType to deallocate the dynamically created PyMemberDefs.

Leak of PyType_Slots:
It turns out that all the slots (modulo Py_tp_doc and Py_tp_members) are static pointers which only get copied over to the appropriate offset in the PyHeapTypeObject. For Py_tp_doc, a new copy is made (as it points to a static literal), and Py_tp_members was addressed above. This means that PyType_Slots is not really needed after running PyType_FromSpecWithBases. Therefore, I moved this into a local variable which just gets cleaned up once it gets out of scope.

Leak of PyType_Spec:
Following the same approach, if we follow each of the uses of PyType_Spec within PyType_FromSpecWithBases we'll find that all the integer values (basicsize, itemsize, and flags) get copied over. slots, as we've already identified, can be easily deleted. Finally, name is mostly being used as a read-only variable during the execution of PyType_FromSpecWithBases (i.e. for error messages). There are only two places where it requires to persist the string. The first, when setting the PyHeapTypeObject's ht_name but that already creates a PyUnicode so it's fine. The second, when setting the type's tp_name. In this case, this only copies the pointer and it works under the condition that the original PyStructSequence_Descr remains alive during the lifetime of the PyHeapTypeObject. Given that all of our PyStructSequence_Descr are static, this should just work. Otherwise, if a use case arrises for a non-static PyStructSequence_Descr, we can just create a copy of the string for the type's tp_name - just like Py_tp_doc. Therefore, I moved this PyType_Spec into a local variable which just gets cleaned up once it gets out of scope.

Testing:
To prove that this doesn't actually leak, I followed Neil's advice and added a new test_capi test. This tracks the refcounts and the allocation blocks to prove that this implementation does not leak.

Lib/test/test_capi.py Outdated Show resolved Hide resolved
Objects/typeobject.c Outdated Show resolved Hide resolved
Objects/structseq.c Outdated Show resolved Hide resolved
@encukou
Copy link
Member

encukou commented Nov 6, 2018

I like the approach!

There's a nice way to resolve the current confusion about what from PyType_Spec is expected to be static and what can be malloc'd temporarily: do a deep copy, which works for both!
But we're not doing a full deep copy. Member defs from Py_tp_members point to names and docstrings, and we still expect these are statically allocated (or at least that they will outlive the class). I don't think it's a problem – dynamic use cases are probably better served by __dict__. It would be good to document the expectation, to avoid confusion in the future.

Objects/typeobject.c Outdated Show resolved Hide resolved
@eduardo-elizondo
Copy link
Contributor Author

eduardo-elizondo commented Nov 6, 2018

Addressed all of @encukou comments:

  • Added comments about the expectation of what will outlive what in PyStructSequence_NewType.
  • Throw an error in PyStructSequence_NewType2 when calling with a PyTypeObject with a non-zero reference count.
  • Renamed variables and updated all the if and for code blocks to now use braces.
  • Removed the News link to the mailing list post in favor of having it in the bug itself.

@serhiy-storchaka serhiy-storchaka self-requested a review November 7, 2018 05:48
Copy link
Member

@encukou encukou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!
I'll merge and watch the buildbots.

@encukou encukou merged commit 474eedf into python:master Nov 13, 2018
miss-islington pushed a commit that referenced this pull request Nov 5, 2019
After #9665, this moves the remaining types in posixmodule to be heap-allocated to make it compatible with PEP384 as well as modifying all the type accessors to fully make the type opaque.

The original PR that got messed up a rebase: #10854. All the issues in that commit have now been addressed since #11661 got committed.

This change also removes any state from the data segment and onto the module state itself.


https://bugs.python.org/issue35381



Automerge-Triggered-By: @encukou
jacobneiltaylor pushed a commit to jacobneiltaylor/cpython that referenced this pull request Dec 5, 2019
After python#9665, this moves the remaining types in posixmodule to be heap-allocated to make it compatible with PEP384 as well as modifying all the type accessors to fully make the type opaque.

The original PR that got messed up a rebase: python#10854. All the issues in that commit have now been addressed since python#11661 got committed.

This change also removes any state from the data segment and onto the module state itself.


https://bugs.python.org/issue35381



Automerge-Triggered-By: @encukou
shihai1991 pushed a commit to shihai1991/cpython that referenced this pull request Jan 31, 2020
After python#9665, this moves the remaining types in posixmodule to be heap-allocated to make it compatible with PEP384 as well as modifying all the type accessors to fully make the type opaque.

The original PR that got messed up a rebase: python#10854. All the issues in that commit have now been addressed since python#11661 got committed.

This change also removes any state from the data segment and onto the module state itself.


https://bugs.python.org/issue35381



Automerge-Triggered-By: @encukou
@ghost
Copy link

ghost commented Apr 4, 2020

This PR was never backported to Python 3.7 and below? I cannot use PyStructSequence_NewType in Python 3.7 and earlier, even latest patch release.

Is this only going to be fixed for Python 3.8 and forwards? My extension works in 3.8+

@ambv
Copy link
Contributor

ambv commented Apr 4, 2020

Correct, looks like this never got backported. 3.6 does not accept bug fixes anymore and 3.7 will only have one last bugfix release before it goes into security fixes only. Seems to me like it's not worth backporting this at this point since you wouldn't be able to use PyStructSequence_NewType in 3.7.0 - 3.7.7 anyway.

@ghost
Copy link

ghost commented Apr 4, 2020

Alright, I can live with that. Good to know, I'll put a 3.8+ note on my project. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants