-
-
Notifications
You must be signed in to change notification settings - Fork 31.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extension Module Handles Are Never Closed #114538
Comments
IIUC, I confirmed that with the legacy |
Thanks for that info! Are the leaks inherent to the memory allocated for the actual DLL or are they memory/resources that the extension allocated and held references. If it's the latter then how is that different from now since extensions never have a chance to clean themselves up? Is it because currently the objects in the extension's C globals are re-used in the next init/fini cycle, and with FWIW, I'm still not convinced that closing extension handles is necessarily worth doing. However, if it means we leak instead of crashing then the balance shifts, at least a little. |
We'd probably also want to track the fact that it's a single-phase init module somewhere. The only significant problem cases are if the init function starts a thread, if it registers a C callback with some lib, or if a reference to one of the extension's process-global objects ends up in the hands of some C library. While each of those cases is unlikely, they are real possibilities and the outcome with each would be a crash, so this solution isn't the best. (There are other risk cases where references to extension-global objects are held somewhere in the CPython runtime, but those cases are more manageable for us--yet still not ideal.) |
FYI, one of my main motivations in this issue is to deal with the case that a single-phase init module is imported in an isolated interpreter, which is ultimately an |
Unloading the handle in that error case might work for that; we haven't ruled it out yet though it's not looking good. One alternative in this narrower case is to always load the init function under the main interpreter first (but not a full import, e.g. do not put it in the main |
Another possible solution in this narrower case is to introduce an extra tag in the extension filename (similar to the SOABI tag) that indicates the module implements multi-phase init. We could check that before we ever even load the extension file. Unfortunately, this would mean stable ABI extensions that implement multi-phase init would have to be recompiled, but only if used in isolated subinterpreters. The approach might need a PEP due to stable ABI extensions and to the impact on packaging tools. IIRC the idea of a new filename tag never came up in the PEP 489 discussions. (It certainly isn't mentioned in the PEP.) Do you remember, @encukou? |
I think so. Experiment (Win)DLL ( #include <malloc.h>
static void *p;
__declspec(dllexport) void __stdcall leak()
{
printf("global ptr: %p\n", p);
size_t size = 1024*1024*100; // 100MB
p = malloc(size);
memset(p, 1, size);
} Script and results: from ctypes import WinDLL
from _ctypes import FreeLibrary
for i in range(2):
mod = WinDLL('foo.dll')
mod.leak()
FreeLibrary(mod._handle)
input() # pause to check the memory usage at this point
I agree that the main/sub interpreters should unload the unofficial (including test) single-phase init modules. If official legacy extensions are only a few, could they be handled individually? |
Beware, there are dragons :)
Another is a new entry point to replace
I don't remember that either. |
We load extension modules by loading the shared library file (
dlopen()
,LoadLibraryExW()
). However we never unload/release it. In each case we get a handle when load it, but we throw the handle away. We should consider holding on to the handle (maybe in something likePyModuleObject.md_handle
).If we save the handle, we could use it to release the loaded shared library (
dlclose()
,FreeLibrary()
).module_dealloc()
would probably be a good place to do so. In every case. releasing it drops an internal refcount, and when it reaches 0 the lib is actually unloaded.Aside from not leaking, there's another real benefit. Extension modules can (and often do, e.g. static types) store objects in C global variables. Single-phase init modules do not have an opportunity to clean up after themselves (which was part of the motivation for PEP 489). So they don't have a chance to decref any objects they may have stashed away in globals. This is problematic when such stashed objects survive runtime init/fini cycles. 1 Closing the lib means all the extensions C globals are discarded and thus importing the module in a later cycle will result in a clean set of globals. That may mean a few objects get leaked, but that's better than segfaulting.
There are some complexities though. Using
dlclose()
, etc. must be done carefully. While not critical, it's best to do it in the thread where the "open" was done. Also, once the lib is closed, trying to use any variables or functions defined in it will segfault. That means you can't do it if there's any thread running C code from the module (incl. daemon threads). Likewise if the module registered a callback somewhere or if a reference to one of its static types (or other global objects) is held outside the module. That may even impact GC.We might consider at least closing the lib if
_PyImport_CheckSubinterpIncompatibleExtensionAllowed()
fails in_PyImport_LoadDynamicModuleWithSpec()
. The risk there is smaller.FWIW, in practice this isn't a big concern because a loaded extension module typically doesn't get destroyed until not long before the process terminates (though it might be a different story for embedders). I'm bringing it up because it something I noticed and looked into and thought we might at least consider the problem.
Footnotes
The same goes for when an extension is loaded for the first time in an isolated (per-interpreter GIL) subinterpreter rather than in the main interpreter, since the C globals will outlive the interpreter. However, there are some caveats with that too. Also, though single-phase init modules are not supported in isolated interpreters, their init functions still get run. ↩
The text was updated successfully, but these errors were encountered: