Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

daemon: Formula graph must not be mutated concurrently #2086

Closed
12 tasks done
rekmarks opened this issue Feb 17, 2024 · 1 comment · Fixed by #2149
Closed
12 tasks done

daemon: Formula graph must not be mutated concurrently #2086

rekmarks opened this issue Feb 17, 2024 · 1 comment · Fixed by #2149
Assignees
Labels
bug Something isn't working

Comments

@rekmarks
Copy link
Contributor

rekmarks commented Feb 17, 2024

Value incarnations, removals, and cancellations (collectively, "formula graph mutations") must be synchronized per #2092. This will necessitate refactoring a number of methods in host.js and (to some extent) mail.js that use these methods. Wherever possible, we should replace existing incarnateFoo() methods with their incarnateNumberedFoo() equivalents (again see #2092 for details).

What is the Problem Being Solved?

In the course of implementing #2079, it was discovered that two commands delivered serially to the daemon were fulfilled out of order, leading to unexpected behavior. In particular, the final await in this test code hangs indefinitely:

await E(host).makeUnconfined(
  'worker',
  capletLocation,
  'NONE',
  'context-consumer',
);

// `result` is a promise that resolves with 'cancelled' once the
// caplet is cancelled.
const result = E(host).evaluate(
  'worker',
  'E(caplet).awaitCancellation()',
  ['caplet'],
  ['context-consumer'],
);
await E(host).cancel('context-consumer');
t.is(await result, 'cancelled');

The test's endo.log revealed the cause:

Making make-unconfined:1f440...
Making least-authority:abc51...
Cancelled:
* make-unconfined:1f440...
Making eval:c9ae8...
Making make-unconfined:1f440...

Despite receiving the evaluate() request first and the cancel() request second, the host processes these requests in reverse order. Only by adding a delay before the final await was the desired behavior achieved.

Further examination revealed the cause to be several awaits within the host's evaluate(), before the eval formula's number had been added to the formula graph. The async operations within evaluate() created a race with cancel() that the latter would always win. Therefore, the resulting eval formula was never cancelled.

Consumers should never have to use delays / timeouts to achieve the desired order of operations when commanding the daemon. The root cause of the problem is that the formula graph was mutated concurrently, i.e. individual mutations did not complete within a single turn of the event loop. We must ensure that this never happens.

Description of the Design

Note: This may be out of date. Refer to #2092 instead.

We must preserve the invariant that the formula graph is never mutated concurrently. In other words, formula graph mutations should be mutually exclusive with each other, and completed in the order received. For the purpose of serializing formula graph mutations, we only care about the in-memory graph; we do not need to wait for the results to be written to disk, nor do we need to wait for the formula values to be reified. The following operations are subject to the invariant:

  • Adding formulas to the graph (e.g. via endo eval <src>)
  • Removing formulas from the graph (as a consequence of e.g. endo remove)
    • This includes cancelling live values (e.g. via endo cancel <name>)

The simplest way to preserve the invariant would be to make all relevant functions synchronous. However, this may not always be possible. For instance, randomHex512 uses the callback signature of crypto.randomBytes to avoid blocking the event loop as it waits for the underlying primitives to collect enough entropy to generate a securely random number. This function is used pretty frequently, and the performance costs may therefore be prohibitive. Consequently, there may be an inescapably asynchronous component of formula graph mutation, around which we will probably put a mutex with a queue.

Finally, while discussing this problem with @kriskowal, we speculated that it may be helpful for consumers to be able to await formula graph mutation separately from the reification of associated values. For instance, if we have successfully preserved the invariant, but formula graph mutation remains async, it may make sense to make e.g. host.evaluate() return Promise<{ value: Promise<unknown> }>. The outer promise would resolve when formula graph mutation is completed, while the value property would resolve with the value when it becomes available. By wrapping the inner promise in an object, awaiting the outer promise wouldn't implicitly flatten the wrapped promises. This is an optional goal for the completion of this issue.

@rekmarks rekmarks added bug Something isn't working enhancement New feature or request labels Feb 17, 2024
@rekmarks rekmarks self-assigned this Feb 17, 2024
@rekmarks rekmarks removed the enhancement New feature or request label Feb 17, 2024
@rekmarks rekmarks changed the title daemon: Formula graph must not be mutated asynchronously daemon: Formula graph must not be mutated concurrently Feb 18, 2024
@kumavis

This comment was marked as resolved.

rekmarks added a commit that referenced this issue Feb 22, 2024
Fixes: #2021
Progresses: #2086
Ref: #2098

Synchronizes the host's `evaluate()` method by delegating all incarnations to the daemon via `incarnateEval()`. The latter is responsible for incarnating its dependents as necessary, and for the generation of their formula numbers. To facilitate this, the synchronous methods `incarnateLookupSync()` and `incarnateWorkerSync()` have been added. These methods synchronously mutate the formula graph, and return promises that resolve when the formulas have been written to disk. The result is that the formula graph is mutated within a single turn of the event loop.
rekmarks added a commit that referenced this issue Feb 22, 2024
Updates #2086 

Exposes the `context` object to confined and unconfined caplets by passing it as a second argument to their `make()` function.
rekmarks added a commit that referenced this issue Mar 8, 2024
Progresses: #2086

Synchronizes the host's makeGuest() and its dependencies. See #2086 for details.
rekmarks added a commit that referenced this issue Mar 8, 2024
Synchronizes the host's `makeUnconfined()` per #2086. Refactoring
`daemon.js` in support of this goal fixed one bug while revealing
another.

In particular, #2074 is progressed by enabling indirect
cancellation of caplets via their workers. The issue is not resolved
since indirect cancellation of caplets via their caplet dependencies
still does not work as intended. A new, failing regression test has been
added for this specific case.

The revealed bug is #2021, which we believed to be fixed by #2092.
Rather than fixing the bug, that PR concealed it by always creating a
new incarnation of `eval` formula workers, even if they already existed.
The regression test for #2021 has been marked as failing, and we will
have to find a different solution for it.
rekmarks added a commit that referenced this issue Mar 8, 2024
Synchronizes the host's `makeUnconfined()` per #2086. Refactoring
`daemon.js` in support of this goal fixed one bug while revealing
another.

In particular, #2074 is progressed by enabling indirect
cancellation of caplets via their workers. The issue is not resolved
since indirect cancellation of caplets via their caplet dependencies
still does not work as intended. A new, failing regression test has been
added for this specific case.

The revealed bug is #2021, which we believed to be fixed by #2092.
Rather than fixing the bug, that PR concealed it by always creating a
new incarnation of `eval` formula workers, even if they already existed.
The regression test for #2021 has been marked as failing, and we will
have to find a different solution for it.
rekmarks added a commit that referenced this issue Mar 8, 2024
Progresses: #2086 

Removes the host's `makeWorker()` and replaces its use everywhere with the equivalent `provideWorker()`. The latter is essentially like the former but with more validation.
rekmarks added a commit that referenced this issue Mar 8, 2024
Synchronizes the host's `makeUnconfined()` per #2086. Refactoring
`daemon.js` in support of this goal fixed one bug while revealing
another.

In particular, #2074 is progressed by enabling indirect
cancellation of caplets via their workers. The issue is not resolved
since indirect cancellation of caplets via their caplet dependencies
still does not work as intended. A new, failing regression test has been
added for this specific case.

The revealed bug is #2021, which we believed to be fixed by #2092.
Rather than fixing the bug, that PR concealed it by always creating a
new incarnation of `eval` formula workers, even if they already existed.
The regression test for #2021 has been marked as failing, and we will
have to find a different solution for it.
rekmarks added a commit that referenced this issue Mar 8, 2024
Synchronizes the host's `makeUnconfined()` per #2086. Refactoring
`daemon.js` in support of this goal fixed one bug while revealing
another.

In particular, #2074 is progressed by enabling indirect
cancellation of caplets via their workers. The issue is not resolved
since indirect cancellation of caplets via their caplet dependencies
still does not work as intended. A new, failing regression test has been
added for this specific case.

The revealed bug is #2021, which we believed to be fixed by #2092.
Rather than fixing the bug, that PR concealed it by always creating a
new incarnation of `eval` formula workers, even if they already existed.
The regression test for #2021 has been marked as failing, and we will
have to find a different solution for it.
rekmarks added a commit that referenced this issue Mar 9, 2024
Synchronizes the host's `makeUnconfined()` per #2086. Refactoring
`daemon.js` in support of this goal fixed one bug while revealing
another.

In particular, #2074 is progressed by enabling indirect
cancellation of caplets via their workers. The issue is not resolved
since indirect cancellation of caplets via their caplet dependencies
still does not work as intended. A new, failing regression test has been
added for this specific case.

The revealed bug is #2021, which we believed to be fixed by #2092.
Rather than fixing the bug, that PR concealed it by always creating a
new incarnation of `eval` formula workers, even if they already existed.
The regression test for #2021 has been marked as failing, and we will
have to find a different solution for it.
rekmarks added a commit that referenced this issue Mar 9, 2024
Progresses: #2086

Synchronizes the host's `makeUnconfined()` per #2086. Refactoring `daemon.js` in support of this goal fixed one bug while revealing another.

In particular, #2074 is progressed by enabling indirect cancellation of caplets via their workers. The issue is not resolved since indirect cancellation of caplets via their caplet dependencies still does not work as intended. A new, failing regression test has been added for this specific case.

The revealed bug is #2021, which we believed to be fixed by #2092. Rather than fixing the bug, that PR concealed it by always creating a new incarnation of `eval` formula workers, even if they already existed. The regression test for #2021 has been marked as failing, and we will have to find a different solution for it.
rekmarks added a commit that referenced this issue Mar 9, 2024
Progresses: #2086 

Synchronizes the host's `makeBundle()` following the pattern established by `makeUnconfined()`. Eliminates the host's
`provideWorkerFormulaIdentifier()`, which is everywhere replaced with the synchronous `prepareWorkerFormulaIdentifier()`. Performs some opportunistic refactors of `makeX` and `incarnateX` where `X === 'Bundle' | 'Unconfined'`.
rekmarks added a commit that referenced this issue Mar 14, 2024
Progresses: #2086 
Fixes: #2121 

Synchronizes `makeHost()` and its dependent functions. Also refactors the same and `makeGuest()` to extract their commonalities and restore the more succinct form they possessed before the synchronization effort began. As part of the refactor, #2121 was fixed.

From this point onward, we will start referring to endo "agents" instead of "parties".
rekmarks added a commit that referenced this issue Mar 15, 2024
Progresses: #2086 

Replaces the daemon's `incarnateLeastAuthority()` with the synchronous `incarnateNumberedLeastAuthority()`, thereby synchronizing it.
rekmarks added a commit that referenced this issue Mar 16, 2024
)

Progresses: #2086 

Merges the daemon's `storeReaderRef()` into `incarnateReadableBlob()` and synchronizes the implementation. Also converts `makeReadableBlob()` into `makeControllerForReadableBlob()` to align it with the family of functions that already exist for most other formulas.

Contrary to its predecessor, `incarnateReadableBlob()` returns the value for stored value, namely a `FarEndoReadable`. A unit test is added for the case of immediately using a stored value without naming it.
rekmarks added a commit that referenced this issue Mar 16, 2024
Progresses: #2086 

> coordinate, verb, "to combine in harmonious relation or action."

Synchronizes the host's `provideWorker()`. In addition, coordinates the implementations of the host's `provideX` methods such that they all:
1. Attempt to get the existing formula id for the provided name, if any.
2. If there is no existing formula id, incarnate and return a new value.

All type checks during step 1 have been removed. In other words, if you attempt to provide something under an existing name that resolves to a different value type, you're on your own. We may revisit this decision in the future.
rekmarks added a commit that referenced this issue Mar 16, 2024
Closes: #2086

Synchronizes the daemon's `incarnateEndoBootstrap()`, thereby concluding
the synchronization effort. It should now be impossible to concurrently
mutating the formula graph through the public API.
rekmarks added a commit that referenced this issue Mar 16, 2024
Closes: #2086

Synchronizes the daemon's `incarnateEndoBootstrap()`, thereby concluding the synchronization effort. It should now be impossible to concurrently mutating the formula graph through the public API.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants