Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VowTools retryable #9785

Merged
merged 6 commits into from
Oct 1, 2024
Merged

VowTools retryable #9785

merged 6 commits into from
Oct 1, 2024

Conversation

mhofman
Copy link
Member

@mhofman mhofman commented Jul 25, 2024

refs: #9541

Description

Adds a retryable helper which retries an idempotent function on upgrade disconnection reasons.

Update the upgrade disconnection isRetryableReason predicate to accept vat terminated errors created by liveslots when trying to message an ephemeral object disconnected by an upgrade (until #9582 is addressed)

Remove the stub retriable helper, and use the retryable implementation.

Security Considerations

To avoid infinite loops, the vat terminated detection logic does not allow retrying such consecutive reasons.

Scaling Considerations

None

Documentation Considerations

JSDoc for retryable

Testing Considerations

Full unit test and liveslots upgrade test coverage

Upgrade Considerations

The current retriable implementation was just a stub that was advertised as not actually working with upgrades. This introduces the actual implementation, which can be picked up by contract authors by simply updating to the next NPM release containing this change. Going forward, defined retryable flows will need to be redefined on upgrade like any other durable exo defined by a contract.

Copy link

codecov bot commented Jul 25, 2024

Codecov Report

Attention: Patch coverage is 56.27530% with 108 lines in your changes missing coverage. Please review.

Project coverage is 56.39%. Comparing base (173e1d8) to head (a36498a).
Report is 35 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #9785      +/-   ##
==========================================
+ Coverage   56.37%   56.39%   +0.02%     
==========================================
  Files         667      668       +1     
  Lines      207526   207871     +345     
  Branches      339      368      +29     
==========================================
+ Hits       116986   117227     +241     
- Misses      90422    90526     +104     
  Partials      118      118              
Files Coverage Δ
packages/internal/src/upgrade-api.js 100.00% <100.00%> (ø)
packages/orchestration/src/exos/chain-hub.js 98.53% <100.00%> (-0.01%) ⬇️
packages/orchestration/src/utils/start-helper.js 96.17% <100.00%> (+0.02%) ⬆️
packages/vow/src/tools.js 100.00% <100.00%> (+3.89%) ⬆️
packages/vow/src/retriable.js 50.45% <50.45%> (ø)

... and 5 files with indirect coverage changes

@mhofman mhofman requested a review from erights July 25, 2024 22:41
harden(isUpgradeDisconnection);

/**
* Returns whether a reason is a 'vat terminated' error generated when an object
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand this as an expedient measure, until we distinguish this rejection condition some other way. Either, like UpgradeDisconnection, use something other than an Error as a rejection to be tested. For an error, the only content that a program should make a semantic decision on is .name. Most emphatically, code should not make a semantic decision on an error .message.

So, an encapsulated temporary expedient measure is fine if clearly marked as such, and with a filed issue to fix it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

( @mhofman let me know the relevant issue is #9582 )

Please add a comment here with that link.

const error = harden(Error('vat terminated'));
const remoteError = fromCapData(toCapData(error));

t.true(isAbandonedError(remoteError));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. But to reiterate, this probably depends on stability beyond what I expect to propose for ocapn to mandate.

packages/orchestration/src/exos/chain-hub.js Outdated Show resolved Hide resolved
import { makeHeapZone } from '@agoric/base-zone/heap.js';
import { makeE, prepareVowTools as rawPrepareVowTools } from './src/index.js';

/** @type {import('./src/types.js').IsRetryableReason} */
const isRetryableReason = (reason, priorRetryValue) => {
if (
isUpgradeDisconnection(reason) &&
(!priorRetryValue ||
(!isUpgradeDisconnection(priorRetryValue) ||
Copy link
Member

@erights erights Jul 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the significance of this change? (including lines 24-28 below)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When shortening a vow, or checking whether to retry a retriable flow, we will now consider that we're not eligible for a retry when:

  • The current and previous reasons were both "upgrade disconnection" records, but only if different incarnation
  • The current and previous reasons were both "vat terminated" errors

In particular the following situations are still eligible for retries:

  • Going from a "vat terminated" error to an "upgrade disconnection" record, or vice versa
  • Going from an "upgrade disconnection" record to another, if the incarnation number changes

Before this change, only "upgrade disconnection" records were recorded as "previous retry reason" so we need to narrow down the check here.

As mentioned IMO we ultimately want to implement a .cause resolution and only root checks in disconnection records, as described in #9582

Copy link

cloudflare-workers-and-pages bot commented Sep 23, 2024

Deploying agoric-sdk with  Cloudflare Pages  Cloudflare Pages

Latest commit: 0237350
Status: ✅  Deploy successful!
Preview URL: https://8f3d1453.agoric-sdk.pages.dev
Branch Preview URL: https://mhofman-9541-retriable.agoric-sdk.pages.dev

View logs

@mhofman mhofman force-pushed the mhofman/9541-retriable branch 2 times, most recently from 20417e3 to ea795cd Compare September 23, 2024 05:01
mergify bot pushed a commit that referenced this pull request Sep 24, 2024
refs: #9541

## Description
Extracted the chain hub changes from #9785 

Changed `makeChainHub` to accept a zone instead of it implicitly creating a heap zone. A durable zone is required for `retriable`.

### Security Considerations
None

### Scaling Considerations
This store the map in durable storage instead of keeping it in the heap, which uses more persistent storage.

### Documentation Considerations
Not sure if any docs need to be updated besides the JS doc for `makeChainHub`

### Testing Considerations
Updated tests and examples

### Upgrade Considerations
This moves into durable storage some info that was heap only before, effectively commiting ourselves to the shape of the data in these maps.
mergify bot added a commit that referenced this pull request Sep 28, 2024
refs: #9303

## Description
This adds a test for upgrading send-anywhere. It doesn't yet pass without these in-flight PRs,
- #9736
- #9785

We've agreed to land this without those to reduce work-in-flight.

### Security Considerations
none

### Scaling Considerations
none

### Documentation Considerations
none

### Testing Considerations
The failing test has a link to the issue to make it pass.

### Upgrade Considerations
none
@mhofman mhofman force-pushed the mhofman/9541-retriable branch 2 times, most recently from 515a4e3 to 09980df Compare October 1, 2024 08:23
@mhofman mhofman requested review from michaelfig and erights October 1, 2024 08:33
@mhofman mhofman changed the title VowTools retriable VowTools retryable Oct 1, 2024
@mhofman mhofman marked this pull request as ready for review October 1, 2024 18:03
@mhofman mhofman requested a review from a team as a code owner October 1, 2024 18:03
Copy link
Member

@michaelfig michaelfig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this implementation; it is clear and direct. I didn't come across any issues that I think would prevent merging it.

LGTM!

@mhofman mhofman force-pushed the mhofman/9541-retriable branch 2 times, most recently from 34980e5 to f38e095 Compare October 1, 2024 21:22
@mhofman mhofman added the automerge:rebase Automatically rebase updates, then merge label Oct 1, 2024
@mhofman mhofman force-pushed the mhofman/9541-retriable branch from f38e095 to 0237350 Compare October 1, 2024 21:38
@@ -100,6 +100,27 @@ export type AsPromiseFunction<
watcherArgs?: C | undefined,
) => Promise<TResult1 | TResult2>;

export interface RetryableTool {
/**
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the jsdoc!


zone.exo('DurableVowTestWatcher', undefined, {
onFulfilled(value) {
t.is(value, 42, 'vow resolved with value 42');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider losing the t.plan to make the logic clearer. E.g. in the test closer define a promise kit and resolve it with the when(result). Then at the end of the test have await t.is(await resultP, 24), something like that.

Not a biggy but would make the logic easier to read and modify

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A promise kit wouldn't help because it doesn't guarantee the other handlers didn't trigger (as promise resolve logic ignores any subsequent resolution). Of course we could set a counter to check how many times the handlers got triggered, but at that point, isn't that what plan does?

Copy link
Member

@turadg turadg Oct 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because it doesn't guarantee the other handlers didn't trigger

i was thinking

Suggested change
t.is(value, 42, 'vow resolved with value 42');
if (value === 42) {
p.resolve('success');
}

But you're right that something else could resolve it with success even without success, so I suppose that t.plan() is safer

t.false(zone.isStorable(nonStorableArg), 'arg is actually non storable');

let resultV;
t.notThrows(() => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for these checks that help describe the expected behavior

@mergify mergify bot merged commit 69f8e4b into master Oct 1, 2024
79 of 80 checks passed
@mergify mergify bot deleted the mhofman/9541-retriable branch October 1, 2024 22:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
automerge:rebase Automatically rebase updates, then merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants