Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

retry helper to durably watch idempotent promises #9541

Closed
Tracked by #9303
turadg opened this issue Jun 20, 2024 · 8 comments · May be fixed by #9608
Closed
Tracked by #9303

retry helper to durably watch idempotent promises #9541

turadg opened this issue Jun 20, 2024 · 8 comments · May be fixed by #9608
Assignees
Labels
enhancement New feature or request

Comments

@turadg
Copy link
Member

turadg commented Jun 20, 2024

What is the Problem Being Solved?

Vows that wrap promises risk having the promise sever. E.g. a remote call to agoricNames could sever upon upgrade. Fortunately, in that case simply retrying is safe.

Description of the Design

A helper that makes a remote call that will retry upon upgrade failure.

const lookupAsset = retry(privateArgs.agoricNames, 'lookup', 'vbankAsset');

Security Considerations

Scaling Considerations

Test Plan

Upgrade Considerations

@turadg turadg added the enhancement New feature or request label Jun 20, 2024
0xpatrickdev added a commit that referenced this issue Jun 21, 2024
 * Perform remote calls to agoricNames in membrane-friendly way. This is an
 * interim approach until #9541,
 * #9322, or
 * #9519
0xpatrickdev added a commit that referenced this issue Jun 21, 2024
 * Perform remote calls to agoricNames in membrane-friendly way. This is an
 * interim approach until #9541,
 * #9322, or
 * #9519
0xpatrickdev added a commit that referenced this issue Jun 25, 2024
 * Perform remote calls to agoricNames in membrane-friendly way. This is an
 * interim approach until #9541,
 * #9322, or
 * #9519
0xpatrickdev added a commit that referenced this issue Jun 25, 2024
 * Perform remote calls to agoricNames in membrane-friendly way. This is an
 * interim approach until #9541,
 * #9322, or
 * #9519
mergify bot added a commit that referenced this issue Jun 25, 2024
refs: #9449

## Description

 Perform remote calls to agoricNames in membrane-friendly way. This is an interim approach until #9541, #9322, or #9519.
mergify bot added a commit that referenced this issue Jul 9, 2024
refs: #9449

## Description

This makes ChainHub return durable vows. The underlying lookup functions are not durable but they made so by the `retriable` helper.

So this also introduces a stub version of #9541 pulled from @dtribble 's #9657

### Security Considerations
none

### Scaling Considerations
none

### Documentation Considerations
none

### Testing Considerations
CI

### Upgrade Considerations
not yet upgradable. Communicated by https://github.com/Agoric/agoric-sdk/blob/a267ba20db0eeb2082c287ca7b69e37d6f272eb9/packages/vow/src/tools.js#L37-L38
@turadg
Copy link
Member Author

turadg commented Jul 12, 2024

#9685 added the API for one but it doesn't support upgrade

@mhofman
Copy link
Member

mhofman commented Jul 12, 2024

I'm thinking that retriable should support not only disconnection reasons, but also error rejections whose message is "vat terminated". This can currently occur (#9582) when we send a message to a heap or virtual object in the remote vat which gets upgraded. Since one purpose of retriable is to handle operations breaking because of the upgrade of a remote vat that isn't resumable, it's possible that this situation could legitimately occur in the middle of a retriable flow. Unlike upgrade disconnection, we must limit ourselves to a single consecutive retry since we cannot differentiate the case of serial remote upgrades (no incarnation number field).

OTOH, retrySend might not need to support "vat terminated" errors because retrying the send will only help if the error is nested: it's not the target of the retriable send that was abandoned, but an object used by the remote, and retrying the operation will cause the remote to re-obtain a fresh object.

@erights
Copy link
Member

erights commented Jul 12, 2024

Would it be better for these to be rejected with UpgradeDisconnection as well?

@mhofman
Copy link
Member

mhofman commented Jul 12, 2024

possibly something similar, but we likely need the ability to differentiate between the target being abandoned and the pending operation on the target being disconnected. In #9582 we discuss in the future using a chain of error.cause to maintain more information on disconnection errors including stack traces.

@mhofman
Copy link
Member

mhofman commented Jul 25, 2024

Recent discussions and audit of use cases highlights the following:

  • retryable instead of retriable
  • retryable does not need to support checkpointing (doesn't need to provide any invocation specific state to the retried function). Only fully idempotent calls should be retried
  • retrySend is currently only needed for the host side, aka it doesn't need to compose with async-flow. That means E.retrySend might be the wrong place for it (but vowTools.retrySend is likely fine)

mergify bot added a commit that referenced this issue Aug 2, 2024
refs: #9541, #9322, #9519

## Description

add `getVBankAssetInfo()` method on agoric chain object.

DRAFT until

 - [x] how to add chain-specific methods to return type of `getChain()`?
 - [x] refactor uses of `makeResumableAgoricNamesHack` to use this

### Documentation Considerations

change is visible via `orchestration-api.d.ts`
cc @dtribble 

### Security Considerations

none?

### Scaling Considerations

returns O(n) data from O(1) args

### Testing Considerations

perhaps not independently tested

### Upgrade Considerations

not yet deployed
@dckc
Copy link
Member

dckc commented Aug 19, 2024

A wish: rather than moving the method name to a string argument:

const lookupAsset = retry(privateArgs.agoricNames, 'lookup', 'vbankAsset');

keep it in the method position, so that we could preserve types:

const lookupAsset = retryE(privateArgs.agoricNames).lookup('vbankAsset');

but that reminds me of heapVowE. Is it straightforward to make a durable form of heapVowE using makeE from @agoric/vows?

@mhofman
Copy link
Member

mhofman commented Aug 19, 2024

so that we could preserve types

I believe preserving types is an orthogonal concern. There may be a way to maintain typing with the current form.

The suggested retryE is really proxy sugar on top of the underlying retry functionality.

that reminds me of heapVowE. Is it straightforward to make a durable form of heapVowE using makeE from @agoric/vows?

I really do not understand this.

mergify bot pushed a commit that referenced this issue Sep 24, 2024
refs: #9541

## Description
Extracted the chain hub changes from #9785 

Changed `makeChainHub` to accept a zone instead of it implicitly creating a heap zone. A durable zone is required for `retriable`.

### Security Considerations
None

### Scaling Considerations
This store the map in durable storage instead of keeping it in the heap, which uses more persistent storage.

### Documentation Considerations
Not sure if any docs need to be updated besides the JS doc for `makeChainHub`

### Testing Considerations
Updated tests and examples

### Upgrade Considerations
This moves into durable storage some info that was heap only before, effectively commiting ourselves to the shape of the data in these maps.
mergify bot added a commit that referenced this issue Oct 1, 2024
refs: #9541

## Description

Adds a `retryable` helper which retries an idempotent function on upgrade disconnection reasons.

Update the upgrade disconnection `isRetryableReason` predicate to accept `vat terminated` errors created by liveslots when trying to message an ephemeral object disconnected by an upgrade (until #9582 is addressed)

Remove the stub `retriable` helper, and use the `retryable` implementation.

### Security Considerations

To avoid infinite loops, the `vat terminated` detection logic does not allow retrying such consecutive reasons.

### Scaling Considerations

None

### Documentation Considerations

JSDoc for `retryable`

### Testing Considerations

Full unit test and liveslots upgrade test coverage

### Upgrade Considerations

The current `retriable` implementation was just a stub that was advertised as not actually working with upgrades. This introduces the actual implementation, which can be picked up by contract authors by simply updating to the next NPM release containing this change. Going forward, defined retryable flows will need to be redefined on upgrade like any other durable exo defined by a contract.
@mhofman
Copy link
Member

mhofman commented Oct 4, 2024

retryable was done in #9785. #10227 will track a potential retrySend

@mhofman mhofman closed this as completed Oct 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants