Discussion/Tracking SnapshotCreator support #13877
Comments
I'm working on first one. |
Does this still really matter now with Ignition (and TurboFan)? |
@mscdex less so, but allows for rudimentary equivalence of features of |
Some numbers from local investigation: https://twitter.com/bradleymeck/status/875758792805404673 On a Mac 64bit:
|
If the |
@refack yes, but the snapshot blob is platform and build specific unlike core dumps |
But we'll end up with a live system that could be dynamically interrogated. Sounds like an amazing feature, and if V8 stabilizes the blob format it seems possible they could implement portability. |
Not sure what you mean by "live", all |
Ok not live, suspended animation, but AFAICT you could reignite the event loop and use the preloaded code for example to rerun a JS function to see what it's doing. |
Related: Proposal for snapshots as a Realm API for wider support in the ecosystem and getting libraries to consider the requirements. I built a mode into Prepack to compile snapshots of basic Node CLIs. One limitation that I hit was that things like the type of a socket becomes hard coded. You might imagine that taking a snapshot of a CLI that prints to the console and then running it with |
@sebmarkbage I think the custom serializer deserializer functions in v8 could be enough to handle this. Even with that though, libs could check tty info that gets invalidated when |
Some comments:
(edited by @TimothyGu to correct a typo) |
@bmeck My point was more that the JS libs internal to Node's runtime already do that. Ideally they'd be updated to avoid that problem. |
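The hard-coding problem described above can be illustrated with a small sketch. It contrasts a value captured once when a module is evaluated (which a snapshot would freeze) with a value that is re-read on each call. The `makeColorSupport` helper and the `fakeStdout` object are hypothetical names used only for illustration; they are not Node.js internals.

```javascript
// Hypothetical helper: decides whether output should be colorized.
function makeColorSupport(stream) {
  // Eager: computed once at evaluation time. If this ran before a snapshot
  // was taken, the result would be baked into the snapshot.
  const eager = Boolean(stream.isTTY);
  return {
    eager: () => eager,
    // Lazy: re-reads the stream each time, so it reflects the environment
    // the restored process is actually running in.
    lazy: () => Boolean(stream.isTTY),
  };
}

// Stand-in for process.stdout so the sketch does not depend on the terminal.
const fakeStdout = { isTTY: true };
const support = makeColorSupport(fakeStdout);

// Simulate restoring the snapshot with output redirected to a file.
fakeStdout.isTTY = false;

console.log(support.eager(), support.lazy());
```

Libraries that follow the lazy pattern would tolerate being snapshotted in one environment and revived in another, at the cost of a repeated check.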
I wondered about that. Means it's probably not useful for node because typed arrays (Buffers, which are instances of Uint8Array) are used pervasively throughout core. |
Adding support for serializing typed arrays should be fairly easy, as long as the deserializer can allocate the backing stores outside of V8's heap, I think. I'm not familiar with how Buffer is implemented. Would it work out-of-the-box if Uint8Arrays can be serialized? |
@hashseed Yes it would work, but can be a bit unwieldy if no special case is designated. Node.js Buffers can be pooled, and multiple Buffers can share one ArrayBuffer, but with different offsets and lengths. If Buffer serialization were to be implemented, only the visible view should be stored rather than the entire ArrayBuffer. This is what we are doing with the Serializer/Deserializer APIs (see https://github.com/nodejs/node/blob/master/lib/v8.js#L148-L165). Otherwise everything is the same. /cc @addaleax who implemented the Serializer/Deserializer binding. |
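A minimal sketch of the sharing described above, assuming a manually created 8192-byte `ArrayBuffer` as a stand-in for Node's internal pool (the real pool is managed by `Buffer` itself and is not constructed this way). It shows two Buffers viewing one backing store, and that `v8.serialize()` stores only the visible part of a view.

```javascript
const v8 = require('v8');

// Assumption: a hand-made "pool" standing in for Node's internal Buffer pool.
const pool = new ArrayBuffer(8192);

// Two Buffers that are views into the same pool at different offsets.
const first = Buffer.from(pool, 0, 5);
const second = Buffer.from(pool, 8, 5);
first.write('hello');
second.write('world');

// Both views report the same backing ArrayBuffer.
console.log(first.buffer === second.buffer);

// The value serializer writes only the bytes visible through the view,
// so the payload is far smaller than the pool.
const serialized = v8.serialize(first);
const restored = Buffer.from(v8.deserialize(serialized));

console.log(serialized.length, pool.byteLength, restored.toString());
```

This only exercises the Serializer/Deserializer API exposed through the `v8` module. Whether the startup snapshot serializer behaves the same way is a separate question, taken up later in the thread.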
@TimothyGu that would require V8's serializer to recognize and introduce a special case for Buffers, which doesn't sound like a good idea to me. Why is it necessary to serialize views into separate ArrayBuffers? I would suggest deserializing Buffers into the exact same state in which they were serialized. |
I am with @hashseed here, I think the entire pool would want to be serialized since a snapshot shouldn't change the data (which is leaking the pool). |
@bmeck @hashseed The default pool size ( I'm not sure how startup snapshots are different from V8's Serializer, and the calculation above only applies to Serializer. If different ArrayBufferViews in a startup snapshot are allowed to share a single ArrayBuffer, then the issue would of course be solved. If not, I do see the argument for keeping the Buffers in the same state (and TBH I'm fine with that), but it just seems to be a bit wasteful. |
I think there is a disconnect here. I am not sure why you need to change the pool for all the buffers, that seems like a shallow copy which would also be bad. |
The startup snapshot is supposed to put the context in a predefined state while bypassing any initialization scripts. It should reach the same state as if the initialization script ran. Assuming that serializing ArrayBuffer is implemented, there is not going to be the duplication you described above. The serializer can recognize objects that it has already visited and deserialize to the same object graph. After deserialization we would still require just 8192 bytes. I think you are confusing the ValueSerializer with the snapshot serializer. The former is fairly new, while the latter has existed for years and has been continuously improved. We are talking about the latter. |
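The idea of recognizing already-visited objects can be sketched with a toy graph serializer. This is not V8's snapshot serializer; `serializeGraph` and `deserializeGraph` are hypothetical helpers that handle plain objects only, written to show how a visited table keeps a shared object (here a stand-in pool) from being duplicated.

```javascript
// Toy model: every object gets an id the first time it is visited.
function serializeGraph(root) {
  const ids = new Map(); // object -> record index
  const records = [];
  function visit(value) {
    if (value === null || typeof value !== 'object') return { prim: value };
    if (ids.has(value)) return { ref: ids.get(value) }; // already visited
    const id = records.length;
    ids.set(value, id);
    const record = {};
    records.push(record);
    for (const [key, child] of Object.entries(value)) record[key] = visit(child);
    return { ref: id };
  }
  return { root: visit(root), records };
}

// Rebuild the graph: allocate all objects first, then wire up references.
function deserializeGraph({ root, records }) {
  const objects = records.map(() => ({}));
  const revive = (slot) => ('ref' in slot ? objects[slot.ref] : slot.prim);
  records.forEach((record, id) => {
    for (const [key, slot] of Object.entries(record)) objects[id][key] = revive(slot);
  });
  return revive(root);
}

// Two views that share one pool object, as pooled Buffers share an ArrayBuffer.
const sharedPool = { bytes: 8192 };
const state = { a: { pool: sharedPool, offset: 0 }, b: { pool: sharedPool, offset: 8 } };

const snapshot = serializeGraph(state);
const revived = deserializeGraph(snapshot);
console.log(snapshot.records.length, revived.a.pool === revived.b.pool);
```

Because the pool is recorded once and referenced twice, the revived graph contains a single pool object, which matches the point that no extra memory is needed after deserialization.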
@hashseed Ah that was partially what I was asking – if they operate differently. Seems like they do, so thanks for the explanation :) |
We have been working on snapshot/restore in ChakraCore for a while in the context of time-travel debugging. Our current implementation supports full serialization/deserialization of most ES5/6 constructs to a stable and (mostly) engine agnostic format. Snapshot performance is a big concern for us and the current implementation is able to extract/restore the full JS application state on the order of tens of milliseconds. We don't yet support linking this up with the native data in Node because, as mentioned above, (1) we need to track down and associate all the native reference locations for snapshot/restore and (2) there are challenges in restoring some native state such as file handles or sockets. The first issue is mainly an engineering challenge; an intern of mine experimented with this for migrating live JavaScript/HTML applications (section 5.2). The perf. numbers are on the slow side since it was a proof-of-concept effort. The second issue requires some thought in the API for inflation (e.g. symbolic values for startup code, user provided hooks, report error on next access and let recovery logic take over, etc.), and I think having some specific scenarios here to guide design would be very useful. Using snapshots for application startup performance and migration are things I have been wanting to do more work on, and we have most of the needed JavaScript side parts already implemented, so I am definitely happy to support work on this. |
@mrkmarron we can discuss need for ChakraCore here as well. Do you think an
intptr_t externals = { READ_FILE }; // provided by node
JSFunction readFileFn = { // made by runtime env / C++
intptr_t internal_fields = { READ_FILE }
};
snapshot_create(...) {
// walk ...
for (field, i of obj.internal_fields) {
serialized.internal_fields[i] = externals.indexOf(field);
}
// ... walk
}
snapshot_load(...) {
// walk ...
for (serialized_external_i, i of obj.internal_fields) {
deserialized.internal_fields[i] = externals[serialized_external_i];
}
// ... walk
} |
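The pseudocode above can be modelled in plain JavaScript to make the round trip concrete. This is a sketch under assumptions: `readFileNative` is a hypothetical stand-in for a native function pointer, and `snapshotCreate` / `snapshotLoad` only handle the internal fields of a single object rather than walking a heap.

```javascript
// Hypothetical stand-in for a C++ function the host would register.
function readFileNative() {
  return 'native readFile';
}

// Table of external references provided by the host. Order matters,
// because the snapshot stores positions in this table.
const externals = [readFileNative];

// On creation, replace each external reference with its table index.
function snapshotCreate(obj, table) {
  return {
    internalFields: obj.internalFields.map((field) => {
      const index = table.indexOf(field);
      if (index === -1) throw new Error('unregistered external reference');
      return index;
    }),
  };
}

// On load, map each stored index back to the current reference.
function snapshotLoad(serialized, table) {
  return { internalFields: serialized.internalFields.map((index) => table[index]) };
}

const readFileFn = { internalFields: [readFileNative] };
const serializedFn = snapshotCreate(readFileFn, externals);
const revivedFn = snapshotLoad(serializedFn, externals);
console.log(JSON.stringify(serializedFn), revivedFn.internalFields[0] === readFileNative);
```

The serialized form contains no addresses, only indices, so it stays valid across runs as long as the host supplies the table in the same order.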
Hi @bmeck, if I understand your design correctly I think it should work for recording basic JS->Native references in the snapshot. My paraphrasing of your design is that it uses the index in a fixed by the host array of My only thought here is that this works well when the snapshot is assumed to be at a very stable point (e.g., right after Node loads but before event loop starts) but it can be a bit brittle and difficult to debug if we start to want to use this in more complex scenarios. We had a similar issue with primitive objects from the core runtime (like the undefined object or builtin functions) where we cannot create them and need to give them "well known identities" in the snapshot. We ended up using simple string names, which is slightly lower performance but much easier to debug and more flexible, and I have been satisfied with the result. So, I feel like having a typedef-able setup for the identifier tokens, at least to allow for debugging, might be good. Also, what are you thinking in terms of the Native->JS references? We currently just use the pointer values cast to integer values (#defined as I'll plan to put some cycles into the JS->Native implementation next week. |
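The string-name alternative mentioned above can be sketched the same way as the index table. This is an illustrative model, not ChakraCore's implementation: the registry contents, the name `fs.readFile`, and the helpers `saveReference` / `restoreReference` are all hypothetical.

```javascript
// Hypothetical native stand-ins.
function readFileNative() { return 'native readFile'; }
function writeFileNative() { return 'native writeFile'; }

// Registry keyed by well-known string names instead of positions.
function makeRegistry(entries) {
  const byName = new Map(entries);
  const byValue = new Map(entries.map(([name, value]) => [value, name]));
  return { byName, byValue };
}

// Store the name of the external reference in the snapshot.
function saveReference(registry, value) {
  const name = registry.byValue.get(value);
  if (name === undefined) throw new Error('no well-known name for reference');
  return name;
}

// Look the name up again on restore; a missing entry is reported by name.
function restoreReference(registry, name) {
  if (!registry.byName.has(name)) {
    throw new Error(`unknown well-known identity: ${name}`);
  }
  return registry.byName.get(name);
}

const atCreate = makeRegistry([
  ['fs.readFile', readFileNative],
  ['fs.writeFile', writeFileNative],
]);
// The restoring host registers the same names in a different order.
const atLoad = makeRegistry([
  ['fs.writeFile', writeFileNative],
  ['fs.readFile', readFileNative],
]);

const savedName = saveReference(atCreate, readFileNative);
console.log(savedName, restoreReference(atLoad, savedName) === readFileNative);
```

Compared with indices, names survive reordering of the table and produce readable errors, which is the debugging benefit described in the comment; the trade-off is a lookup by string rather than by position.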
@mrkmarron I think the map could be fine as long as it doesn't explicitly point to a memory address once serialized. For v8, the general idea is to use Private symbols that are not visible to JS and attach them to the global scope:
Context global;
Function require = makeJSRequireFunction();
global->Set(Private::ForApi("require"), require); // ~= Symbol.for but not visible to JS
// ... take snapshot
// ... revive snapshot
Context global = snapshot.GetContext(0);
Function require = global->Get(Private::ForApi("require"));
This roughly equates to us manually making the map ourselves so that seems fine. Not having to do all the Get/Sets and being able to get an array would be nice. |
Relevant earlier discussion on #9473.
v8::SnapshotCreator is a means to capture a heap snapshot of JS, CodeGen, and C++ bindings and revive them w/o performing the loading/evaluation steps that led to that state. This issue discusses what tools like webpack, which run many times and have significant startup cost, would need in order to utilize snapshots.
CC: @hashseed @refack
intptr_t
intptr_ts
--make-snapshot and --from-snapshot
main() functions for snapshots (save to v8::Private::ForApi($main_symbol_name)).
vm.Context for snapshot
require.cache paths?
The v8 API might be able to have some changes made (LANDED)
Right now the v8 API would need a --make-snapshot CLI flag since v8::SnapshotCreator controls Isolate creation and node would need to use the created isolate.
Since all JS handles need to be closed when creating the snapshot, a main() function would need to be declared during snapshot creation after all possible preloading has occurred. The snapshot could then be taken when node exits if exiting normally (note, unref'd handles may still exist).
Some utility like WarmUpSnapshotDataBlob from v8, so that the JIT code is warm when loaded off disk, is also relevant.
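For readers arriving later: the `--make-snapshot` and `--from-snapshot` flags above are proposals from this issue, not shipped flags. Node.js releases that postdate this discussion expose a `v8.startupSnapshot` API that covers the "declare a main() during snapshot creation" idea. The sketch below assumes such a release; `registerMain` is a hypothetical wrapper, and it falls back to running the function directly when no snapshot is being built or the API is absent.

```javascript
const v8 = require('v8');

// Hypothetical wrapper: defer `main` into the snapshot when one is being
// built, otherwise run it immediately.
function registerMain(main) {
  const snap = v8.startupSnapshot; // undefined on older Node.js versions
  if (snap && snap.isBuildingSnapshot()) {
    // Runs when a process is later started from the snapshot blob.
    snap.setDeserializeMainFunction(main);
    return 'deferred';
  }
  main();
  return 'ran';
}

// Expensive preloading would happen here, before main is registered.
const preloaded = { startedAt: Date.now() };

let mainCalls = 0;
const outcome = registerMain(() => {
  mainCalls += 1;
  console.log('main running, preloaded at', preloaded.startedAt);
});
console.log(outcome);
```

In those later releases, a script like this would be passed to `node --snapshot-blob <file> --build-snapshot <entry>` to build the blob, and `node --snapshot-blob <file>` to start from it; run as an ordinary script, it takes the fallback path.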