Skip to content
This repository has been archived by the owner on Aug 29, 2021. It is now read-only.

Parent execution revision #74

Merged
merged 12 commits into from
May 20, 2019
Merged

Parent execution revision #74

merged 12 commits into from
May 20, 2019

Conversation

guybedford
Copy link
Collaborator

@guybedford guybedford commented May 7, 2019

This revises the parent execution ordering fix for #64 (previously #71), while also ensuring that cycles transition in a well-defined way.

This retains the parent chain execution semantics when an asynchronous dependency is added, without causing any execution ordering changes. For example with this approach, sync modules can provide polyfills that will execute before async dependencies in order. The spec diff has simplified down quite a lot in the process.

Consider for example:

main.js

import './dep.js';
console.log('main');

dep.js

console.log('dep');
Promise.resolve().then(() => {
  console.log('dep task');
});

the log when importing main is "dep", "main", "dep task".

If we add a dependency to dep.js that does not use top-level await, we retain the "dep", "main", "dep task" ordering.

But if we add an async dependency to dep.js:

dep.js

import 'async-dep';
console.log('dep');
Promise.resolve().then(() => {
  console.log('dep task');
});

then the ordering becomes "async-dep", "dep", "dep task", "main".

The problem with the above is that the timing of the parent modules relative to eachother is changed by introducing an async dependency.

This PR resolves that so that the execution semantics of the parent ordering and timings remains the same, to provide the output "async-dep", "dep", "main", "dep-task".

This is achieved with global completion handlers on module execution that manage the graph state transitions.

@guybedford guybedford force-pushed the parent-sync branch 2 times, most recently from ac05481 to bf31f31 Compare May 7, 2019 02:02
@guybedford
Copy link
Collaborator Author

guybedford commented May 7, 2019

I've gone ahead and polished up the implementation quite a bit... the spec had a few gaps caught between refactoring. Some implementation notes to describe the approach further:

  • I ended up implementing an "evaluating-async" state here finally to ensure the handlers can easily tell when a module is completed by "evaluated" well and truly meaning evaluated.
  • [[AsyncParentModules]] are graph references created during InnerModuleEvaluation, representing the back-references on the exact acyclic post-order tree of execution. [[AsyncParentModules]] only tracks the parents of async modules. When in cycles, weyon't track the unused back edge of the cycle. This is the complex part: for modules within a cycle, we do not track [[AsyncParentModules]] for dependency relations that originate from outside of the cycle. This is what allows cycles to remain strongly connected components. That this works out takes some thought, so I've included an example in the next comment detailing this a little more clearly.
  • [[PendingAsyncDependencies]] is a counter for the completion of the async deps of a module, and directly corresponds with [[AsyncParentModules]], including its association with the exact top-level initiator, and cycle ordering.
  • [[TopLevelCapability]] represents the complete evaluation of the given top-level job. This is only created whenever Evaluate() is called for a top-level execution. When a module is already executing asynchronously, it can be populated on that module, and the fulfillment handler for the module will ensure it is resolved. For already-evaluating cycles, this is never populated on inner cycle modules - only the root of the cycle. This is handled by a GetCycleRoot call. If the cycle is unevaluated, then by definition of calling Evaluate() on it, this module is the root of the cycle.
  • [[EvaluationError]] is now stored properly on async modules. It is propagated in all cases upward and synchronously.
  • When there is an error, the error is propogated to the [[AsyncParentModules]] first, calling their error handlers in turn, before any [[TopLevelCapability]] rejection is fired. This [[AsyncParentModules]] reporting will go straight up the direct singular path up the tree to the top-level module causing its failure transition synchronously (and async cycles will always fail synchronously as a connected component together, a feature as a consequence of this PR approach). The rejection order also remains true to the importer order. Errors for later references to the same async module in the graph are propoagated through the [[AsyncParentModules]] as well.

The above is the simplest approach for this logic flattening to get better deterministic guarantees around sync completions in the semantics that I could find after experimenting with various different approaches.

Look forward to review in due course, and happy to answer any questions further.

spec.html Outdated Show resolved Hide resolved
spec.html Outdated Show resolved Hide resolved
@littledan
Copy link
Member

littledan commented May 8, 2019

Thanks for this detailed refactoring. Seems like there are a lot of good improvements in this version. I really appreciate how the volume of spec text keeps decreasing. I like the name changes as well.

I am still wondering if it's possible to eliminate the [[Async]] internal slot, and to let the "evaluating-async" state just be "evaluated", as in the original #51, but it will take me some more time to think all this through.

@guybedford guybedford changed the title Parent sync execution revision Parent execution revision May 11, 2019
@guybedford
Copy link
Collaborator Author

guybedford commented May 11, 2019

I managed to add a further refactoring here to cut down on the number of fields and also to ensure evaluation errors always propagate synchronously through all subgraphs, effectively reducing the "memory footprint" of the algorithm.

I haven't gone into much detail yet on how better guarantees are provided around cycles with this PR, and with the current approach, cycle handling is pretty much the main complexity. So I wanted to just provide an example to hopefully clarify some of this logic.

Consider the following graph (excuse the terrible image):

image

Where A is the top-level module being imported, and the numbers next to each of the modules are the DFSIndex and DFSAncestorIndex. In this example, all modules except for E are in the same strongly connected cycle unit (DFSAncestorIndex 0). E is a dependency of this cycle. Assume every module is async for the purposes of this example since this example is about async cycles not the sync behaviour interactions which are the simpler ones :P

The execution order of this graph is D,B,E,C,A, where B, C and A will only execute after their depenencies have completed.

Synchronous Behaviour

Now usually if this were a sync graph and there was an execution error on any of the modules, say on D, we would have "stack" containing A,B,D, all three of which would immediately transition to an error state. C and E wouldn't actually be in the stack, and would thus remain in an "instantiating" state (since they haven't been visited yet), where instead they would throw if you tried to import them later on, since their dependencies have cached errors.

Asynchronous Behaviour

So what is the analog in the async cycle case?

Instead of relying on "stack", we populate AsyncParentModules for any async module in the graph that isn't executed immediately in InnerModuleEvaluation.

So B points to A. D points to both B and C. A has no AsyncParentModules as it is the cycle root for this top-level evaluation.

So if there is an error when asynchronously executing D, AsyncEvaluationRejected will propagate the error up through the AsyncParentModules, which is unique to this cycle. So we call AsyncEvaluationRejected on the modules in this order: D, B, A, C, immediately transitioning all those modules together as a strongly connected component to the failure state. This is very similar to the stack approach although slightly different in that we also transitioned C (although note that an error on B would not have transitioned C or D).

Top-level cycle dependencies

Where cycles get tricky is when we throw other top-level evaluation jobs into the mix.

If we had another module F dynamically imported after A, which imported say D, while the cycle modules are still asynchronously evaluating, what we don't want is for F to start executing on completion of D, even though the whole cycle has not finished yet. We want to ensure that F only executes after the entire cycle has finished executing.

The way we do this is by treating the first execution of the cycle as taking "ownership" of the cycle (and effectively owning the DFSIndex and DFSAncestorIndex values - these are never be changed by any future importers). Then when importing from a cycle that has already been started (so not "evaluating" or "instantiated"), we instead look to the root of the cycle for checking for evaluation errors and adding evaluation listeners.

So we try import D, which is currently in an "evaluating-async" state. We then walk up the AsyncParentModules of D until the DFSIndex matches the DFSAncestorIndex representing the root of the cycle at A using the GetCycleRoot abstract method. We then add F to the list of AsyncParentModules list of A that should get completion handlers applied on resolution or rejection of the cycle.

If D completes successfully, but then E gets evaluated causing an error, that error is propagated up to A, and then propagates up to F, all in a well-defined way avoiding any partial states of the graph.

Same-graph cycle dependencies

If instead of being deferred, F was another module in the same top-level evaluation job, say M were the top-level evaluation, where M imported both A and F, and F still imports D from within the cycle, then we have two cases - either F is imported first, in which case it takes ownership of the cycle and the cycle is no longer owned by A (so the graph gets reorganized above), or F is imported second, in which case the cycle is already in the "evaluating-async" state when F is traversed. In this case we treat F exactly the same as in the previous section - having it depend on the root of the cycle's completion. ie calling GetCycleRoot(D) to get to A, and then adding F to A's AsyncParentModules list for notification on resolution. So this kinda closes the loop on proving strongly connected components always transition together.

Hopefully that explains how these fields handle the complex edge cases quite elegantly. I've included lots of assertions around cycles in the algorithm to make these invariants clear.

@guybedford
Copy link
Collaborator Author

Here's an interesting open question about cycles and error handling.

Let's consider the a graph where A depends on B and C, B depends on A, C depends on D, and D depends on A.

So the execution order is B, D, C, A, where B and D execute immediately, C executes after D completes, and A executes after both B and C have completed.

Now if there was an asynchronous rejection for B, while D was still in progress, should the completion of D's execution still go ahead and execute C, or should D's execution completion check and see that the cycle has errored and stop execution entirely?

I get the feeling we should probably do the latter?

The spec in this PR doesn't currently include that but we can add that check in roughly one or two lines.

@littledan
Copy link
Member

I really like the simplifications in this patch version. Great job thinking through all the cases in https://github.com/tc39/proposal-top-level-await/pull/74/files#issuecomment-491547656 . After thinking it through a bit, I agree with your judgements there (especially the cycle root technique, which is tricky). @zenparsing previously asked about reentrancy for this algorithm, and I think you've done a good job there.

There are a couple issues with the way this is written up which I think could use some edits before landing:

  • When a sync module depends on an async one, and something else depends on that async module, I think the logic here breaks down. I think we should be using the parent dependency graph all the time, not just for [[Async]] modules, for this reason.
  • I'd like ExecuteModule to have a signature that can work for other module types. The current draft passes in the Promise capability into ExecuteModule, but this PR removes that and creates the capability inside. I'm wondering if we should do a third thing, and just expose abstract algorithms for reporting success and failure for module loading, which could be called either synchronously or asynchronously. Then, [[Async]] would actually be something in the Source Text Module Record, actually.
  • Seems like some cases are missing for resolving the [[TopLevelCapability]], resulting in a never-resolved Promise. For example, if Evaluate() is called while the module is already evaluated. I don't see a path for calling [[Resolve]] on synchronous modules, either (but presumably that'd go away per the previous point). I'm wondering if we should just make this Promise unconditionally, and maybe include a Note that it's not used anywhere except at the top level and can sometimes be optimized away.
  • Optional/later: I wonder if we should construct the graph pointing to the parents during the Link phase. I think this would let us get rid of the "evaluating-async" phase. On the other hand, it means that Link would pin things up to the graph to get executed, without waiting for the next phase to make that happen. I'm not sure what, if any, embedding environments might Link far before they Execute, in the context of multiple subgraphs coming in over time, though.
  • Optional/later: You wrote up some great text about motivation in https://github.com/tc39/proposal-top-level-await/pull/74/files#issuecomment-491547656 . Let's see if we can work this in in notes in the specification, or at least the explainer, so it's less cryptic to folks who come along later.

I'll be trying to fix these up.

@littledan
Copy link
Member

At a high level, I do like the idea of this change. It's taken me some time to understand why we wouldn't want to use Promises for the reactions, but I do see the logic in the ordering that you start with here. This kind of change seems completely consistent with the possibility of host environments inserting more pauses/yielding to the event loop/microtask checkpoints, which they could do by wrapping Source Text Module Records in something which adds that logic. And some of that logic could also be added from JavaScript by wrapping one module in another which uses top-level await! So I'm getting convinced that doing these reactions synchronously is the most reasonable, least opinionated sort of way to do this, and forms a good basis for opinionated constructs on top.

@littledan
Copy link
Member

littledan commented May 18, 2019

There's an editorial choice we could make in organizing this specification: Should we build the dependency graph up front (and have all evaluation be based on it), or should we built it on-line, just for the edges that are necessary due to async-ness? This patch goes with the former choice, but I was initially thinking of the latter. I'm not sure which one is conceptually simpler, or easier to understand for various people reading it.

@guybedford
Copy link
Collaborator Author

guybedford commented May 19, 2019

When a sync module depends on an async one, and something else depends on that async module, I think the logic here breaks down. I think we should be using the parent dependency graph all the time, not just for [[Async]] modules, for this reason.

We do do this in this spec, and that is done by checking the "evaluating-async" module status. When the sync module depends on the async module, its state is put into "evaluating-async". Similarly if any module depends on any evaluating async module, it attaches to the completion in the graph.

Note that the "evaluating-async" state is important exactly because this attachment only happens while the module is still executing. Once it is executed there is no attachment to the completion event, and "evaluating-async" moving to "evaluated" gives us this information.

I'd like ExecuteModule to have a signature that can work for other module types. The current draft passes in the Promise capability into ExecuteModule, but this PR removes that and creates the capability inside. I'm wondering if we should do a third thing, and just expose abstract algorithms for reporting success and failure for module loading, which could be called either synchronously or asynchronously.

The capability approach seemed the simplest and lowest-diff way to achieve this, but we could certainly look at other approaches. The capability approach was exactly used in the early 2014 draft of module algorithm briefly in ECMA262 by Jason Orendorff that specified the loader, and I remember finding it quite a useful pattern back then, but if you think there's a better way certainly let's do that.

Agreed it could be worthwhile to update this approach to ensure it works for other module types by lowering some of the loading information like AsyncParentModules and the other fields down to Cyclic Module Records or even Abstract Module Records to support this. If that seems odd, this information could even be its own "table" which represents the in-progress load state.

Happy to discuss this further / work on this as well, but I was just focusing on getting Source Text Modul Records right first!

Seems like some cases are missing for resolving the [[TopLevelCapability]], resulting in a never-resolved Promise. For example, if Evaluate() is called while the module is already evaluated. I don't see a path for calling [[Resolve]] on synchronous modules, either (but presumably that'd go away per the previous point). I'm wondering if we should just make this Promise unconditionally, and maybe include a Note that it's not used anywhere except at the top level and can sometimes be optimized away.

I wanted to avoid the need to create a promise for every single module, as that just seems like wasted memory overhead to me. Just having promises for top-level jobs to ensure cycle resolution matches is all we need really.

We could be populating a promise for every module, it's just wasted work that's all.

If a module is in the "evaluating-async" state (including sync parents of async modules, which synchronously transition to this state when Evaluate runs), then the attachment to the completion is handled entirely by AsyncParentModules, which also gives us a well-defined execution transition from one completion to all possible parents.

Optional/later: I wonder if we should construct the graph pointing to the parents during the Link phase. I think this would let us get rid of the "evaluating-async" phase. On the other hand, it means that Link would pin things up to the graph to get executed, without waiting for the next phase to make that happen. I'm not sure what, if any, embedding environments might Link far before they Execute, in the context of multiple subgraphs coming in over time, though.

Instantiate is effectively asynchronous as it happens while modules are fetching through HostResolveImportedModule I believe? As such, one slow module might hold up the tree, whereas another top-level evaluation job that shares a lower dependency could have already executed that dependency.

Optional/later: You wrote up some great text about motivation in https://github.com/tc39/proposal-top-level-await/pull/74/files#issuecomment-491547656 . Let's see if we can work this in in notes in the specification, or at least the explainer, so it's less cryptic to folks who come along later.

That would be great!

There's an editorial choice we could make in organizing this specification: Should we build the dependency graph up front (and have all evaluation be based on it), or should we built it on-line, just for the edges that are necessary due to async-ness? This patch goes with the former choice, but I was initially thinking of the latter. I'm not sure which one is conceptually simpler, or easier to understand for various people reading it.

This patch only builds up AsyncParentModules when there are pending executions, so is actually the second case you describe and not the former case. Edit: although, yes we process the graph state immediately and synchronously that populated this so I see what you mean in it being upfront. The main reasons for this are to ensure well-defined cycle handling, since the CycleRoot is relative and defined by the top-level parent. Without cycles we could certainly not need any ownership model for executions.

Finally, did you have any thoughts on ensuring we stop executions of further async modules in a cycle after any error in that cycles as in #74 (comment)? I'll gladly add a further commit to support this.

@guybedford
Copy link
Collaborator Author

Correction: instantiation appears to be a synchronous operation in the spec, under the assumption that another mechanism has already ensured the dependencies are available it seems, so that HostResolveImportedModule must be sync.

If this is the case, we could certainly move some logic into instantiate, but please confirm this first?

@littledan
Copy link
Member

I would be interested to see that patch. I can't form an opinion until I understand better the semantics you are thinking of.

Thanks for explaining the motivation for phrasing the patch as you did. As I explained above, there are several missing pieces of logic in the algorithms, and some logic should be moved from STMR to CMR to enable other module types, so I will try to write these up in a conservative way without those bigger changes.

@guybedford
Copy link
Collaborator Author

guybedford commented May 19, 2019

Ok, I've pushed a commit with support for stopping execution if any modules in the cycle have failed, which requires its own specific check to support.

I believe the approach here is fully algorithmically correct. I tried to clarify above how ExecuteModule promise handling and sync modules depending on async modules are all well-defined, so if there's any way I can answer further or discuss on chat I'd be happy to do that. The lowering would be great to see though, so thanks for working on this.

@littledan
Copy link
Member

That last change LGTM. I suppose we could simplify that microscopically further because been only ever need to check the cycle root's evaluation error, not our own one, right?

@littledan
Copy link
Member

littledan commented May 19, 2019

Some proposed edits at guybedford#3 . I think these are required for correctness. Find details in their commit messages.

littledan added a commit to littledan/html that referenced this pull request May 19, 2019
This reverts commit 8415c83.

tc39/proposal-top-level-await#74 restores the
property of Evaluate() that it always returns a Promise, saving some
complexity in HTML.
@guybedford
Copy link
Collaborator Author

@littledan great, I've commented on the PR there.

I suppose we could simplify that microscopically further because been only ever need to check the cycle root's evaluation error, not our own one, right?

Unfortunately not, consider the example from #74 (comment), where D has completed execution, and B and E are in progress. If there is a failure on B while E is still in progress, it gets reported up the AsyncParentModules to the root, but because E is a leaf, it will still want to move to executing C unless we explicitly check the cycle root A for any error from other branches.

@guybedford
Copy link
Collaborator Author

guybedford commented May 20, 2019

(So we need to check both E and A's completions because E doesn't participate in the cycle, it is just a dependency of it)

@littledan
Copy link
Member

Landing this and will land guybedford#3 immediately afterwards.

@littledan littledan merged commit b5a9799 into tc39:master May 20, 2019
Ms2ger pushed a commit to littledan/html that referenced this pull request Jun 12, 2019
This reverts commit 8415c83.

tc39/proposal-top-level-await#74 restores the
property of Evaluate() that it always returns a Promise, saving some
complexity in HTML.
Ms2ger pushed a commit to littledan/html that referenced this pull request Jun 24, 2019
This reverts commit 8415c83.

tc39/proposal-top-level-await#74 restores the
property of Evaluate() that it always returns a Promise, saving some
complexity in HTML.
Ms2ger pushed a commit to littledan/html that referenced this pull request Sep 20, 2019
This reverts commit 8415c83.

tc39/proposal-top-level-await#74 restores the
property of Evaluate() that it always returns a Promise, saving some
complexity in HTML.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants