Deserialization and persistent identity #58

Open
Jamesernator opened this issue Oct 31, 2021 · 27 comments

Comments

@Jamesernator

Jamesernator commented Oct 31, 2021

As specified each evaluation of a module block expression produces a new module block object. We also have the ability to serialize them and send them elsewhere.

However, what is not obviously specced or clarified is whether these "clones" result in the same module record. For example, suppose we sent the same module many times to a worker: would the resulting imports produce just a single module record?

I feel like the motivations of the proposal suggest it should. This would ensure we can send a module to a worker multiple times without reinstantiating it again and again (instead just using the cached module namespace object).

As a code example demonstrating the above situation, should this print "Instantiated!" once or 10 times? (using structuredClone/import() on same thread instead of Worker, but same logic applies)

// There is a single module object
const mod = module {
    console.log("Instantiated!");
    
    export default {};
}

const instance = await import(mod);

for (let i = 0; i < 9; i++) {
    // However it can be cloned multiple times
    const modClone = structuredClone(mod);
    // But should this produce the same module record?
    const cloneInstance = await import(modClone);
    // If so this will be true
    console.log(instance === cloneInstance && instance.default === cloneInstance.default);
}
@mhofman
Member

mhofman commented Oct 31, 2021

That's a very good point, which makes me think we have to be careful regarding garbage collection.

I believe the intent is to only cause a single instantiation for a given static record. Even if we make sure to create a new object representation for each static record when shared between agents / structured clone, the same module namespace object could be obtained from it.

Someone could technically use that namespace object as a weak target. Unless we say that only their observation in the local agent matters for their liveness definition, we risk entangling independent object graphs.

@surma
Member

surma commented Nov 2, 2021

I would expect 10 separate instantiations. My mental model is to have module blocks behave just like object literals. Cloning the same object 10 times will give you 10 unique copies of that object.

But I’m curious to hear more use-case driven arguments here.
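
The object-literal mental model can be checked with plain objects today. This is only a sketch: module blocks are not runnable syntax yet, so a plain object stands in for one, and a global `structuredClone` is assumed to be available (recent browsers, Node 17+).

```javascript
// A plain object standing in for a module block.
const original = { name: "effect" };

// Each structuredClone call returns a fresh object with its own identity.
const clones = Array.from({ length: 10 }, () => structuredClone(original));

// Count distinct identities among the original and its clones.
const distinctIdentities = new Set([original, ...clones]).size;
console.log(distinctIdentities); // 11

// The contents survive cloning even though the identity does not.
const sameContents = clones.every((clone) => clone.name === original.name);
console.log(sameContents); // true
```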

@kriskowal
Member

I also expect 10 separate module initializations. I would furthermore expect 10 separate module initializations even without the clone. If mod corresponds to a static module record, I expect import(mod) and compartment.import(mod) to both work. A static module record can be initialized in different compartments with different module specifiers, and if different compartments have different “import maps”, the linkage of import specifiers may be different. Because the static module record doesn’t capture or communicate a unique module specifier for the module, there is no way for compartment.import(mod) to communicate a cache key for the module namespace instance. Therefore, these instances must be ephemeral. Therefore, each import(mod) must produce a new initialized module namespace object.

@mhofman
Member

mhofman commented Nov 2, 2021

Just to clarify, what would the behavior be for the following modified example:

// There is a single module object
const mod = module {
    console.log("Instantiated!");
    
    export default {};
}

// Logging "Instantiated!"
const instance = await import(mod);

for (let i = 0; i < 9; i++) {
    // Not cloning, loading the same module definition
    // I'd expect no further logging
    const cloneInstance = await import(mod);
    // I'd expect the following to be true
    console.log(instance === cloneInstance && instance.default === cloneInstance.default);
}

@surma
Member

surma commented Nov 2, 2021

@mhofman Yep, I agree with your expectations in that code sample. Importing the same module block should return the same module instance.

@Jamesernator
Author

I would expect 10 separate instantiations. My mental model is to have module blocks behave just like object literals. Cloning the same object 10 times will give you 10 unique copies of that object.
But I’m curious to hear more use-case driven arguments here.

It comes down to patterns for using these things in workers, because sending to a worker corresponds to a structuredClone. This leads to a weird situation where import() is idempotent on the same thread, but if we send the module to another thread it could be evaluated over and over again.

i.e. As an example suppose we had some API that used a worker in the background for loading modules like this:

class Scene {
   async applyEffect(effectModule) {
       // send to worker to apply the effect
   }
}

And then we had a constant set of effects:

const effects = {
    balloons: module {
        export function renderEffect() {
        
        }
    },
    discoLights: module {
        export function renderEffect() {
        
        }
    },
}

If we were to do something like:

// When a dropdown changes the effect
effectSelectElement.addEventListener("change", async (event) => {
    const effectName = event.target.value;
    // Apply the effect
    await scene.applyEffect(effects[effectName]);
});

Then even applying the same effect multiple times results in multiple executions. However, if we were to implement this on the same thread:

// When a dropdown changes the effect
effectSelectElement.addEventListener("change", async (event) => {
    const effectName = event.target.value;
    const effect = await import(effects[effectName]);
    // Apply the effect somehow
});

Then the evaluations of the modules would be idempotent if the same module has already been evaluated.

In my opinion this is fairly confusing, as even though the set of effects is constant and never changes, multiple evaluations of the "same module" on one thread is different to multiple evaluations of the "same module" on another thread.
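
The asymmetry can be simulated without module blocks. In the sketch below everything is hypothetical: `importOnce` models a host module map keyed on the identity of the definition object, a plain object stands in for a module block, and `structuredClone` (assumed available as a global) stands in for sending to a worker.

```javascript
// Hypothetical stand-in for a module map: one "instance" per definition object.
const instances = new WeakMap();
let evaluations = 0;

function importOnce(definition) {
  if (!instances.has(definition)) {
    evaluations += 1; // models running the module body
    instances.set(definition, { default: {} });
  }
  return instances.get(definition);
}

const effect = { source: "export function renderEffect() {}" };

// Same thread: the same definition object is passed every time.
importOnce(effect);
importOnce(effect);
const sameThreadEvaluations = evaluations;
console.log(sameThreadEvaluations); // 1

// Cross thread: each send behaves like a clone, so the cache key differs.
importOnce(structuredClone(effect));
importOnce(structuredClone(effect));
const afterClonesEvaluations = evaluations;
console.log(afterClonesEvaluations); // 3
```

Under these assumptions the first two imports share one evaluation, while every cloned definition triggers a new one.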

And yes, I get that what you're saying is that the mental model should perhaps be that sending to a worker corresponds to a clone, but I don't really feel like this is the model I would've expected based on how import() works. I.e. if I send a string specifier to a worker, { moduleUrl: "https://some.mod/url" }, then repeated imports of that URL will be idempotent. But the module blocks feature, despite being idempotent on the same thread, is no longer idempotent cross-thread, which is quite a divergence from string specifiers.

This difference between string-specifiers and module-blocks leads to a situation where it is hard to design APIs that can take in modules and send them to workers without accidental reevaluation.

And perhaps this is a non-issue and engines will be extremely efficient at reevaluating the same module multiple times in a worker, although I am skeptical this is great to rely on in the presence of top-level-await where arbitrarily complex work might be involved to import a module.

@mhofman
Member

mhofman commented Nov 2, 2021

The mental model I have is of the SharedArrayBuffer, which has a unique backing data block, but may have multiple object instances connected to it, possibly in different agents. The difference with SAB is that the backing data block is not reified as an object, where here we have module namespace objects.
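
As a rough single-agent analogy for that model, two typed array views over one SharedArrayBuffer are distinct objects connected to the same backing data. The views are only stand-ins here: in the cross-agent case it would be the SharedArrayBuffer objects themselves that differ per agent.

```javascript
// One backing data block of 4 bytes.
const backing = new SharedArrayBuffer(4);

// Two separate object instances connected to the same block.
const viewA = new Int32Array(backing);
const viewB = new Int32Array(backing);

// A write through one view is visible through the other.
Atomics.store(viewA, 0, 42);
const seenThroughB = Atomics.load(viewB, 0);
console.log(seenThroughB); // 42

// The objects themselves have different identities.
const sameObject = viewA === viewB;
console.log(sameObject); // false
```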

I agree with @Jamesernator that the developer ergonomics would be better if a single instantiation occurred for a given module declaration, regardless of the identity of the JavaScript object that represents the module declaration in code. However as I also mentioned, we have to be careful that this doesn't introduce a shared identity that is observable across multiple agents, as that'd be a sure way to get into complex distributed garbage collection problems.

PS: I'm careful to differentiate realms and agents here, where the property of an agent is that it has its own completely independent object graph with its own local gc. None of these problems exist across realms in the same agent, as those object graphs are already entangled, even with ShadowRealm.

@mhofman
Member

mhofman commented Nov 17, 2021

I gave this issue more thought, and we should be fine as long as the program cannot observe the liveness of a module block shared with another agent.

This implies either of the following requirements must be enforced:

  • The identity of the module block value is not maintained through structure cloning, or
  • The liveness of the identity of the module block cannot be observed.

Let's assume we want to enable the developer ergonomics of sharing with another agent the same module block multiple times through postMessage, and that it should not result in multiple instantiation when imported.

The first requirement may be surprising to users, as the answers from @surma and @kriskowal seem to indicate. It may not be obvious that the same underlying static module record can have multiple representatives with different identities.

If instead we want a stable identity through structure clone, the second requirement would require that the module block liveness cannot be observed through a WeakRef. However any object should be accepted as a WeakRef target (cannot throw). Either we would have to consider the module block object an uncollectible root in the object graph, or we could make the module block value a primitive, which doesn't carry the same expectations.

There is currently a precedent in the spec for a primitive with unforgeable identity and unobservable liveness: Symbols. Those however cannot be used through structured cloning, I assume because of their unforgeable identity. I believe however it could be possible to roundtrip an identity bearing primitive to another agent and back, by using a WeakValueMap style association, e.g. using a UUID implicitly created when declaring the module block. As long as the primitive value cannot be used as a WeakRef target, there should be no problems of distributed garbage collection.
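
A user-land approximation of that round trip might look like the sketch below. All names are hypothetical and nothing here is part of the proposal. Objects stand in for the identity-bearing value (a real primitive could not be a WeakRef target, so an engine would keep this table internally), a counter stands in for the implicitly created UUID, and `structuredClone` stands in for the trip to another agent and back.

```javascript
let nextId = 0;
const representatives = new Map(); // id -> WeakRef to the local representative
const idOf = new WeakMap();        // local representative -> id

// Drop a table entry once its representative is gone (and was not replaced).
const cleanup = new FinalizationRegistry((id) => {
  if (representatives.get(id)?.deref() === undefined) representatives.delete(id);
});

function register(block, id) {
  idOf.set(block, id);
  representatives.set(id, new WeakRef(block));
  cleanup.register(block, id);
  return block;
}

// Models evaluating a module block expression.
function declareBlock(source) {
  return register(Object.freeze({ source }), `block-${nextId++}`);
}

// Only the id and the source text travel; the object identity does not.
function serialize(block) {
  return { id: idOf.get(block), source: block.source };
}

// Reuse the local representative when the id is already known.
function deserialize(message) {
  const known = representatives.get(message.id)?.deref();
  return known ?? register(Object.freeze({ source: message.source }), message.id);
}

const block = declareBlock("export default {};");
const wire = structuredClone(serialize(block)); // the "other agent" sees only this
const roundTripped = deserialize(wire);
console.log(roundTripped === block); // true
```

Because the table holds its values weakly, it does not by itself keep a representative alive; the distributed GC concern comes from letting programs observe that liveness, which this sketch does not address.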

Credits to @erights for the idea of module blocks as primitive values.

@Jamesernator
Author

Jamesernator commented Nov 18, 2021

If instead we want a stable identity through structure clone, the second requirement would require that the module block liveness cannot be observed through a WeakRef. However any object should be accepted as a WeakRef target (cannot throw). Either we would have to consider the module block object an uncollectible root in the object graph, or we could make the module block value a primitive, which doesn't carry the same expectations.

There is currently a precedent in the spec for a primitive with unforgeable identity and unobservable liveness: Symbols. Those however cannot be used through structured cloning, I assume because of their unforgeable identity. I believe however it could be possible to roundtrip an identity bearing primitive to another agent and back, by using a WeakValueMap style association, e.g. using a UUID implicitly created when declaring the module block. As long as the primitive value cannot be used as a WeakRef target, there should be no problems of distributed garbage collection.

So one of the pain points with symbols that is currently a stage-2 proposal to be changed is precisely the fact you cannot observe their liveness despite being unforgeable. That proposal seeks to allow symbols as entries in WeakMap, and targets of WeakRef/FinalizationRegistry.

Now because module blocks are serializable, some of the use cases for observable liveness become less of an issue (i.e. remote objects). However I feel like associating metadata with module blocks is likely to be fairly common, and given these blocks can be generated arbitrarily many times like symbols, it would be unfortunate if there were no way to associate such data without preventing garbage collection. (i.e. As primitives, they would only be storable in a Map making garbage collection of associated data difficult to impossible in general).

I do agree with the concerns about observing liveness, but this makes me wonder: even if we did make module blocks primitives, it would be useful to allow them as WeakMap keys but NOT as WeakRef/FinalizationRegistry targets. This would allow people to associate metadata with module blocks without causing memory leaks and without creating weak targets (as from what I understand WeakMap doesn't enable weak targets, only WeakRef/FinalizationRegistry do).

@ljharb
Member

ljharb commented Nov 18, 2021

I don’t think it would be good to violate the axiom that a WeakMap key, WeakSet member, and WeakRef target can all be the same kind of value.

@mhofman
Member

mhofman commented Nov 18, 2021

I don’t think it would be good to violate the axiom that a WeakMap key, WeakSet member, and WeakRef target can all be the same kind of value.

I agree, and regardless allowing usage as WeakMap keys is sufficient to cause the problem in the presence of WeakRef in the realm. Aka you can add the module as key, and a plain sentinel object as value, and simply observe the sentinel. So my earlier distinction was pointless.
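
The sentinel trick can be sketched as follows. A plain object stands in for the module block, and `watch` is a hypothetical helper. Actual collection timing is up to the engine, so the sketch only shows the wiring, not a collection.

```javascript
const metadata = new WeakMap();

// Attach a sentinel that is reachable only through the WeakMap entry,
// and hand back a WeakRef that observes it.
function watch(key) {
  const sentinel = {};
  metadata.set(key, sentinel);
  return new WeakRef(sentinel);
}

const moduleKey = {}; // stand-in for a module block used as a WeakMap key
const observer = watch(moduleKey);

// While the key is alive, the sentinel is alive too. Once the key becomes
// unreachable, observer.deref() may eventually return undefined, which
// reveals the key's liveness indirectly.
const aliveWhileKeyIsHeld = observer.deref() !== undefined;
console.log(aliveWhileKeyIsHeld); // true
```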

I don't believe there is a way to reconcile the ability to associate metadata with a module block, and block cross agent liveness observability. We have to either:

  • Have a stable identity between agents, but be unable to observe its liveness. That means no usage as WeakMap key or WeakRef target, which basically means a primitive.
  • Not have a stable identity through cloning. You could still have a local object representation as the value, and use it as WeakMap key or WeakRef target, but every time you send a module block through structure cloning, you'd get a different object, so that realistically limits your ability to associate data to the realm in which the module was initially defined.

The primitive approach allows you to work across the ShadowRealm callable boundary without any more spec features (e.g. a cloner for module blocks that supports shadow realms).

The last alternative is to give up on preventing shared cross agent identity, at least for the agent cluster use case. That means that unless engines implement a single gc per agent cluster, cross agent cycles wouldn't be collectible. That still leaves the problem of cross agent cluster communication, which I believe this proposal intends on supporting as well. There I really don't believe that a stable identity is possible (unless you go into full cooperative distributed GC). I honestly don't think the use case of associating metadata to a module definition is worth it.

In the case of a primitive allowing usage as weakmap key could be implementation defined, as long as we have a predicate to test whether a value can be used in weak collection (which we're planning on having if record/tuples have ObjectPlaceholder).

So one of the pain points with symbols that is currently a stage-2 proposal to be changed is precisely the fact you cannot observe their liveness despite being unforgeable. That proposal seeks to allow symbols as entries in WeakMap, and targets of WeakRef/FinalizationRegistry.

FYI if ObjectPlaceholder from the Record/Tuple proposal makes it through, I doubt there will be any motivation for symbols as weakmap keys.

@Jamesernator
Author

I agree, and regardless allowing usage as WeakMap keys is sufficient to cause the problem in the presence of WeakRef in the realm. Aka you can add the module as key, and a plain sentinel object as value, and simply observe the sentinel. So my earlier distinction was pointless.

That makes sense, thanks.

I honestly don't think the use case of associating metadata to a module definition is worth it.

Yeah it probably isn't over the other constraints, in most cases people should probably associate such metadata with the module namespace itself rather than the primitive, as the primitive could still refer to many module records (one per realm).

I have seen the odd use of associating metadata to specifiers before, but those are uncollectable as they're strings anyway, so in this regard module blocks being primitives would be no worse than the status quo for that.

@surma
Member

surma commented May 27, 2022

Trying to catch up on this...

Would the problem go away if a structured clone’d module block maintains identity?
Here’s a code example:

const block = module { /* ... */ };
console.assert(block === structuredClone(block)); // ?

But more interestingly, would this be possible/desirable?

const worker = new Worker("./a-worker-that-sends-the-same-message-back.js");
const block = module { /* ... */ };
worker.postMessage(block);
worker.addEventListener("message", ({data}) => {
  console.assert(data === block); // ???
});

@surma
Member

surma commented May 27, 2022

Thinking about this some more, I suspect preserving identity might be the right way to go. You can’t create “fresh copies” of modules right now, so I feel like Module Blocks should not behave differently here.

@mhofman
Member

mhofman commented May 27, 2022

@surma, that's the core of the issue. If module blocks preserve identity through structure clone across agents (with an agent being the GC boundary in current implementations), then they shouldn't be usable as WeakMap keys.

If they preserve identity and are usable as WeakMap keys, you will either have memory leaks, or require implementations to have distributed GC.

@syg
Collaborator

syg commented May 27, 2022

Preserving identity across worker boundaries implies that module block objects would be actually shared objects and come with the restrictions that entails, like no [[Prototype]], fixed shape (or not having any properties at all), etc. Main question for me is are those restrictions desirable in this use case?

I'm obviously biased in that I want a world where that's possible, so that seems fine to me to require that of implementations -- V8 and JSC are working on having such GCs already anyways AFAIK. I wouldn't treat having a GC that supports parallel mutators as some pie-in-the-sky thing.

@mhofman
Member

mhofman commented May 27, 2022

@syg afaik v8 and JSC are working on a GC for an agent cluster, but what about sending a module block over postMessage to another cluster (e.g. SharedWorker or ServiceWorker)?

@syg
Collaborator

syg commented May 27, 2022

@syg afaik v8 and JSC are working on a GC for an agent cluster, but what about sending a module block over postMessage to another cluster (e.g. SharedWorker or ServiceWorker)?

You can't share any memory outside of an agent cluster, so the identity preserving semantics would be disallowed, or it would be an actual copy. (What happens when you send an SAB over the agent cluster boundary?)

@mhofman
Member

mhofman commented May 27, 2022

Also I don't think the module block needs to be a property-less object. As long as the value is stable in a given realm, it could very well be different objects in different realms and agents. Only the round trip property is observable.

@mhofman
Member

mhofman commented May 27, 2022

You can't share any memory outside of an agent cluster, so the identity preserving semantics would be disallowed

Is it a reasonable expectation for developers that identity is preserved when sent to some targets but not others?

What happens when you send an SAB over the agent cluster boundary?

I actually don't remember. It would be awkward if it transparently became a copy, so I kinda would expect it to throw, which wouldn't make sense for module blocks.

@syg
Collaborator

syg commented May 27, 2022

Only the round trip property is observable.

True, but an implementation that only preserves the identity without doing actual sharing feels like a really weird cutout, and I would be against implementing such a thing.

Is it a reasonable expectation for developers that identity is preserved when sent to some targets but not others?

Definitely yes, because the "some targets but not others" distinction here is a process boundary, and it really behooves developers to understand what can be a process boundary and what cannot. Yes, it's the OS and security infra leaking through, but that's reality. Also this question only makes sense if the behavior were a transparent copy instead of an error, right?

@mhofman
Member

mhofman commented May 27, 2022

Ok let's take a more probable scenario. You have a web page with some out of process iframes and some in process.

I believe a developer should be able to send module blocks to both (so an error for out of process iframes is not appropriate). You're saying that a developer should understand which frames may be in a different process and won't round trip module blocks while preserving identity?

@mhofman
Member

mhofman commented May 27, 2022

Btw, none of this is a problem if the module block is a new primitive value that is unusable as WeakMap keys.

@syg
Collaborator

syg commented May 27, 2022

I believe a developer should be able to send module blocks to both (so an error for out of process iframes is not appropriate). You're saying that a developer should understand which frames may be in a different process and won't round trip module blocks while preserving identity?

Yes? This is why we made COOP/COEP, right? It's certainly not ergonomic, but the tradeoff has been made for better or for worse. I retract what I said about COOP/COEP, which is incorrect. Cross-origin iframes, which can either be in-process or out-of-process, already should not be able to access shared things because they already are in a different agent cluster. So my view here is that if module blocks were actually shared things because of the WeakMap use case, you can't use them with cross-origin iframes at all.

To be clear my position is, "if module blocks need to preserve identity to be usable as WeakMap keys across thread boundaries, then I prefer them to be actually shared things, and participate in the same restriction schemes as other shared things."

Btw, none of this is a problem if the module block is a new primitive value that is unusable as WeakMap keys.

That's certainly fine with me too as an implementer (and simpler). I have no horse in the race as to whether they should be usable in WeakMaps; I was responding to how I'd want the world to look if that was deemed important enough to try to enable.

@surma
Member

surma commented May 27, 2022

I am a bit outside my realm of expertise, so I apologise for fluffy and vague terminology in what follows:

I was not making any statements about the “temporary” value in the other realm. I was proposing that if I send a module block to (let’s say) a worker and the worker sends it back, that the value I receive is identical to the one I originally sent. For all I care, the copy that the worker receives could be a completely independent copy and we restore the identity during structuredDeserialize() or something.

Would that allow us to get the best of both worlds? I.e. keeping identity consistent within the same realm and allowing modules to be used in a WeakMap?

(Side note: It’s not a priority, so I am happy with either outcome here. I do think that using WeakMaps to store auxiliary data for types you don’t own is not that unusual, so it’d be nice to have.)

@mhofman
Member

mhofman commented May 27, 2022

@surma whether the value in the other agent is different or not is completely unobservable, and can be considered an implementation detail (it would be observable for same origin realms).

I suppose one question I have for you is if you expect this round trip property to hold for out of process iframes/popups (e.g. 3rd party origin which hasn't or cannot opt into COOP/COEP), SharedWorker, ServiceWorker, etc.

@Jamesernator
Author

Jamesernator commented Aug 7, 2022

I've been thinking about this again with regards to the new compartments/loaders proposal and I'm starting to think it matters less if multiple sends did correspond to multiple evaluations.

For APIs similar to blank worker, a (weak) map could simply be maintained on the original thread:

class Worker {
    // Internally in worker
    // This could even be a WeakMap along with a FinalizationRegistry to allow
    // collection of such objects across threads
    readonly #cachedModuleWrappers = new Map<ModuleSpecifier, WrappedModuleObject>();

    async addModule(module: Module | string): Promise<WrappedModuleObject> {
        if (this.#cachedModuleWrappers.has(module)) {
            return this.#cachedModuleWrappers.get(module);
        }
        // ....send module for evaluation to other thread
        // ....wait for evaluation
        // ....generate wrapper
        // ....store in cache
        // ....etc
        return moduleWrapper;
    }
}
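
A runnable variant of this caching idea, written in plain JavaScript, might look like the sketch below. `ModuleHost` and the injected `evaluate` callback are hypothetical names; the callback stands in for sending the module to the other thread and building a wrapper. Caching the promise rather than the resolved wrapper means concurrent calls for the same module also share one evaluation.

```javascript
class ModuleHost {
  // Maps a module (object or specifier string) to the promise of its wrapper.
  #cache = new Map();
  #evaluate;

  constructor(evaluate) {
    this.#evaluate = evaluate;
  }

  addModule(module) {
    let wrapper = this.#cache.get(module);
    if (wrapper === undefined) {
      // First time this module is seen: evaluate it once and remember the promise.
      wrapper = Promise.resolve(this.#evaluate(module));
      this.#cache.set(module, wrapper);
    }
    return wrapper;
  }
}

let sends = 0;
const host = new ModuleHost((module) => {
  sends += 1; // models one send to the other thread
  return { module };
});

const balloons = { source: "export function renderEffect() {}" }; // stand-in for a module block
const first = host.addModule(balloons);
const second = host.addModule(balloons);
console.log(first === second); // true
console.log(sends); // 1
```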

For cases like my own example, users could instead just do:

// Incidentally I often wish there was an object version of Promise.all
const effects = {
    balloons: await worker.addModule(module {
        export function renderEffect(): ImageData {
        
        }
    }),
    discoLights: await worker.addModule(module {
        export function renderEffect(): ImageData {
        
        }
    }),
};

effects.balloons.renderEffect(); // Promise { ImageData }

6 participants