vm: Script importModuleDynamically memory leak #25424
I think this is because there would be no entry in …
Note that I can fix the issue here as well by switching the WeakMap back into a Map. I have some ideas for refactoring some of this to match the script lifetimes; I may see if I can put something together.
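The WeakMap-versus-Map trade-off being described is, in rough terms, the following (a generic sketch, not Node's internal code; `weakRegistry`, `strongRegistry` and the key object are illustrative names):

```js
// A WeakMap entry disappears once nothing else references its key, while a
// plain Map keeps both key and value alive until they are explicitly deleted.
const weakRegistry = new WeakMap();
const strongRegistry = new Map();

let key = { id: 'per-script token' };              // stand-in for whatever keys the callback
weakRegistry.set(key, () => 'dynamic import cb');  // collectable together with `key`
strongRegistry.set(key, () => 'dynamic import cb'); // pinned for the life of the map

key = null;
// After a GC the weakRegistry entry may be gone, so a delayed import() that
// needs the callback later finds nothing; the strongRegistry entry survives,
// but now the callback (and anything it closes over) can never be collected.
```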
@devsnek ok so it seems like the underlying issue here is that we have no way of knowing from v8 when to dispose the import callback function, as that would involve knowing when all the microtasks of the top-level execution have run to completion, which we can't hook into. This would require new work from v8 to handle properly. In the meantime we need a solution. It seems like our options are: …
(2) sounds to me like the best forward-looking solution while the v8 approach is tackled in parallel. I can work on a PR for the above.
Hmm, there's an even bigger memory leak problem here, and that is the following: …
This then leaves us in the situation where all created vm.Script instances are now leaking. Unless we can get a v8 way to know when all microtasks of a given Script have run to completion, this remains an architectural memory leak in the approach.
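In other words, once the callbacks are held strongly, every Script they close over stays reachable for the life of the process. A rough sketch of that shape (assumed for illustration; the `callbacks` map and `compile` helper are invented, not Node internals):

```js
'use strict';
const vm = require('vm');

// Strong registry standing in for "callbacks held until the context dies".
const callbacks = new Map();

function compile(id, code) {
  const script = new vm.Script(code);
  // The callback closes over `script`, so the Map transitively keeps every
  // Script ever compiled reachable, even after the caller drops it.
  callbacks.set(id, (specifier) => ({ script, specifier }));
  return script;
}

for (let i = 0; i < 10000; i++) compile(i, '1 + 1'); // none of these can be GC'd
```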
In terms of moving forward here, I'd be interested to hear your thoughts @devsnek. One architecture I was thinking of that could avoid the need for a v8 API here would be to move the dynamic import callback to being per-context. Then, instead of having the Script instance provided as the first argument, we would provide the script ID as the first argument to the callback, where the script ID is available on the …
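Purely as an illustration of the shape being proposed (hypothetical; no such API exists in Node, and `handlersByContext`, `scriptsById`, `registerDynamicImportHandler` and `dispatchDynamicImport` are invented names), the per-context handler keyed by script ID might look something like this from the user's side:

```js
// Hypothetical sketch only: the registration function and the
// (specifier, scriptId) callback signature are invented for illustration.
const handlersByContext = new WeakMap(); // context sandbox object -> handler
const scriptsById = new Map();           // user-managed lookup: id -> vm.Script

function registerDynamicImportHandler(sandbox, handler) {
  handlersByContext.set(sandbox, handler);
}

// Conceptually, the host would dispatch import() like this, passing an ID
// instead of the Script instance so it never has to retain the Script itself.
function dispatchDynamicImport(sandbox, specifier, scriptId) {
  const handler = handlersByContext.get(sandbox);
  return handler(specifier, scriptId, scriptsById.get(scriptId));
}

// Example registration for a hypothetical context object:
const sandbox = {};
registerDynamicImportHandler(sandbox, (specifier, scriptId) =>
  console.log('import of', specifier, 'requested by script', scriptId));
```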
@guybedford a handler per context won't allow people to create and use this in the main context, which seems like an annoying limitation. As for keying by property: we need to store the data on the native side (ContextifyScript/ModuleWrap) to keep the API unforgeable. Can we try tying the lifetime of the v8::Script and v8::Module to the lifetime of the instances? The idea being that they wouldn't be collected until the v8::Script or v8::Module is collected.
What would the API look like? Storing the function in advance and using it in multiple …?
On second thought… what's the use case of being able to customize this per script? See node/lib/internal/modules/cjs/loader.js, lines 685 to 688 at eb664c3.
Here we use different functions for different scripts, but only because we need … The leak is caused by us trying to work around …
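For reference, the per-script form under discussion is the importModuleDynamically option accepted by vm.Script (experimental; depending on the Node version it requires --experimental-modules or --experimental-vm-modules, and the callback signature has changed over time). A minimal sketch rather than the loader's actual code, with './dep.mjs' as a placeholder specifier:

```js
'use strict';
const vm = require('vm');

// Each Script carries its own callback, which is the per-script coupling being
// questioned above: the host must keep a handle from the compiled script back
// to this function for as long as a delayed import() might still fire.
const script = new vm.Script("import('./dep.mjs').then((ns) => console.log(ns));", {
  async importModuleDynamically(specifier /*, referencingScript */) {
    // Resolve and return the requested module here (see the vm docs for the
    // accepted return values); delegating to the host import() is a sketch.
    return import(specifier);
  },
});

script.runInThisContext();
```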
@joyeecheung for one thing it prevents you from using them in the main context, since the handler for that context would be owned by the internal ESM loader. It also means you can't create your own forms of virtual module maps without using new contexts, which can be very unwanted since different contexts have unique primordials. I have a theory that we can fix this by inverting the lifetime relationship of v8::Module and ModuleWrap (and v8::Script and ContextifyScript), but I haven't gotten around to testing it yet.
Great point, although I'm not sure I follow the logic of why it needs to be a primitive array. Surely HostDefinedOptions needs to be disposed once it becomes unreachable so that those primitives can be disposed? So if this were permitted to be a normal fixed array, surely that could also call the destructor for its elements then too? And if so, that would provide what we need here ideally.

@devsnek the ID approach would be for a …
@guybedford i'm not really sure how that approach would fix anything. the apparent issue here is that the referring module is collected, not the module you're returning from the callback. the module you're returning from the callback can't possibly have been collected, because you're holding a reference to it so you can return it from the callback.
This is then a memory leak, as we have to hold the module reference for as long as the dynamic callback exists, which is currently indefinitely.
also just as a simplification, there's no need to use the vm api directly to reproduce this:

```js
queueMicrotask(() => import(''));
global.gc();
```

After my …
As discussed with @devsnek on IRC, we don't currently have a solution to this memory issue without:

A. Making all dynamic import callbacks / vm.Script / vm.Module instances Eternal references (a memory leak).
…

It does seem like one that goes right down to the base assumptions of how this all plugs together, so it may need some rethinking. @devsnek given the above I would like to suggest that we go ahead with reverting vm dynamic import support entirely for now. Would you be ok with that?
@devsnek could we not just have the exact Script instance stored as a JS object in the host_defined_options here as a fix? That way it would be properly managed by the GC, and we wouldn't need the weak map / other maps as an indirection. Surely v8 can allow us to put non-primitive objects in host_defined_options?
Just checked, and yes, host_defined_options can be any type of JS reference, so that really does sound like the fix to me.
it's a PrimitiveArray in v8.h, am I missing something?
Ahh, sorry for the false alarm. I saw Symbol being used in the test and overlooked that it is a primitive. I wonder if it could be extended to hold arbitrary data? What would be the constraint there?
@nodejs/v8 ☝️
I just tested this case out again, and it seems we have managed to make these references persistent already, as they are not being GC'd. Perhaps we should just move over to the persistent form explicitly, given that this is the current behaviour now.
@guybedford is this still an issue?
This is actually a much deeper problem than I originally thought, in that V8 itself has no capacity to release module memory without releasing the entire context, as far as I'm aware. This is likely an area where Node.js (and Deno, for that matter) could lead the way in V8 / spec work to improve things, as Node.js will be blocked directly on issues like nodejs/modules#527, where users expect to be able to unload modules / refresh the registry. It could be as simple as a basic functional …
@guybedford should we rename this issue / change the OP to explain what the problems are now? I cannot reproduce the segfault anymore.
Modules are simply leaky in all JS environments (they are never GC'd until the realm itself is), and everyone is aware of it at this point. So I think we can just close this; I just hope we don't assume this is the way things "have to be", as it is clearly not good, and Node.js has a role in pushing for better.
I don’t imagine we can solve this problem and preserve the invariant that import be idempotent, which is important. That being said, with …
While investigating #21573 I was interested in understanding why we are storing the importModuleDynamically callback in a weak map, as that surely means it can be GC'd if there is an indeterminate time before the import() function in the module is called. I may or may not have diagnosed the issue correctly, but the test case is a bug nonetheless:

main.js …

Where test.mjs exists as an ES module. Running with:

```
node --experimental-modules --expose-gc main.js
```

I get the following output: …

If I comment out the gc() line or run the import() call without a timeout, the correct output is provided. This makes me think we should reconsider the use of weak maps as the storage for the dynamic callback, and rather accept that the dynamic callback should always have a lifecycle that is directly tied to the lifecycle of the module(s) it is referenced by.
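The original main.js contents are not preserved above; the following is a reconstruction of the kind of test case described (a delayed import() inside a vm.Script, with an explicit gc() in between). The file names, the 100 ms delay, and resolving the specifier via the host import() are assumptions for the sketch, not the original code:

```js
// Reconstruction only; not the original main.js from this issue.
// Run with: node --experimental-modules --expose-gc main.js
'use strict';
const vm = require('vm');

const script = new vm.Script(
  "setTimeout(() => import('./test.mjs').then((ns) => console.log(ns)), 100);",
  {
    async importModuleDynamically(specifier) {
      // Placeholder resolution: hand the specifier to the host's import().
      return import(specifier);
    },
  },
);

script.runInThisContext();

// With the callback held only in a WeakMap, this collection can drop it before
// the timeout fires, so the delayed import() inside the script then fails.
global.gc();
```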