Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add runtime-wide try-state hooks #4631

Open
wants to merge 18 commits into
base: master
Choose a base branch
from

Conversation

ntn-x2
Copy link

@ntn-x2 ntn-x2 commented May 29, 2024

Fixes #235.

I tried different approaches, including the suggestion of adding a fn id() -> String; function on the trait TryState<BlockNumber> trait, but that was giving me issues when trying to derive TryState for tuples (what is the id() of a tuple of elements such as AllPalletsWithSystem)? Also, it feels less than ideal to me for a trait to implement both logic that can take a Select::Only parameter (on which you should apply filtering logic, that's why it's ignored in 100% of the cases that I found) and also returns an id() -> String in there.

So, I am proposing the following approach:

  • Executive takes a new generic with a default () implementation for custom TryState hooks, as done for the OnRuntimeUpgrade ones. Like the OnRuntimeUpgrade hooks, TryState hooks are executed before pallet-specific hooks.
  • I introduce two new traits, trait TryStateLogic<BlockNumber> which is responsible for the try-state logic itself, and the trait IdentifiableTryStateLogic<BlockNumber>: TryStateLogic<BlockNumber>, which returns a function that is used to match the specific test with one of the provided identifiers in TryStateSelect::Only(names). The two traits could be collapsed into one, if need be, as long as they expose the two functions declared here.
  • By default, the pallet macro also implements the two new traits, implementing TryStateLogic<BlockNumber> re-using the Hooks<BlockNumber logic for try_state() and IdentifiableTryStateLogic<BlockNumber> re-using the PalletInfoAccess logic for name().

Open points

  • I did not find a way to implement a trait for a variable number of tuples without the requirement for the elements of the tuple to also implement the trait, so I implement the TryState trait for the empty tuple (), the (T,) tuple and the (T1, T2) tuple, which covers the (CustomHook, AllPallets*) case. We might want to have a macro where the whole where clause is configurable, not only additional traits. But it would happen in a different PR.
  • I augmented the Executive unit tests to test that the TryState hooks are called in the right order. Because TryState hooks are not supposed to modify storage, I could not use the same approach used for OnRuntimeUpgrade hooks. Hence, I used a static variable that is reset to false with a new test ext, and set to true when the test custom TryState is invoked. I did not get any errors in the tests, but it could be that this leads to concurrency issues, in which case an alternative strategy is needed, perhaps including an additional boolean flag in what is returned by new_test_ext?

@ntn-x2 ntn-x2 requested a review from a team as a code owner May 29, 2024 13:26
@ntn-x2 ntn-x2 force-pushed the aa/executive-try-state branch from 9f658e1 to 26158c9 Compare May 30, 2024 06:38
@ntn-x2 ntn-x2 requested a review from a team as a code owner May 30, 2024 06:56
@liamaharon
Copy link
Contributor

bot fmt

@command-bot
Copy link

command-bot bot commented May 30, 2024

@liamaharon https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/6348179 was started for your command "$PIPELINE_SCRIPTS_DIR/commands/fmt/fmt.sh". Check out https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/pipelines?page=1&scope=all&username=group_605_bot to know what else is being executed currently.

Comment bot cancel 5-b73d9b63-067a-468a-a635-ea1f610d2c88 to cancel this command or bot cancel to cancel all commands in this pull request.

@command-bot
Copy link

command-bot bot commented May 30, 2024

@liamaharon Command "$PIPELINE_SCRIPTS_DIR/commands/fmt/fmt.sh" has finished. Result: https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/6348179 has finished. If any artifacts were generated, you can download them from https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/6348179/artifacts/download.

@ntn-x2 ntn-x2 force-pushed the aa/executive-try-state branch from 13c39a5 to 1551382 Compare June 10, 2024 12:55
@ntn-x2
Copy link
Author

ntn-x2 commented Jun 10, 2024

@liamaharon can I get a review from anyone? The PR has now been hanging for quite a while with 0 interactions with anyone from Parity.

@ntn-x2 ntn-x2 force-pushed the aa/executive-try-state branch 2 times, most recently from 3fbb357 to 47a0b80 Compare June 13, 2024 11:58
@ntn-x2 ntn-x2 force-pushed the aa/executive-try-state branch from 47a0b80 to 5df3125 Compare July 12, 2024 09:12
@ntn-x2
Copy link
Author

ntn-x2 commented Jul 12, 2024

@liamaharon can I get anyone to review this PR? In particular, I am faced with a problem: the try-runtime hooks are not supposed to make any storage changes. Hence, I did not find out any other way than using a static mut global variable, which is by itself very bad. Do you have any other way in mind to test the right order of hooks being called?

@ggwpez
Copy link
Member

ggwpez commented Jul 12, 2024

I will try to take a look today. Liam is currently offboarding 😢

Copy link
Member

@ggwpez ggwpez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Im not sure if i completely get the architecture here.
Can you please briefly put the new traits into the MR description maybe and what they should do? A graphic can also help, but maybe not worth it if we have to change it.

substrate/frame/executive/src/lib.rs Show resolved Hide resolved
substrate/frame/executive/src/lib.rs Show resolved Hide resolved
franciscoaguirre and others added 15 commits July 13, 2024 12:16
…tytech#4621)

Follow-up to the new `XcmDryRunApi` runtime API introduced in
paritytech#3872.

Taking an extrinsic means the frontend has to sign first to dry-run and
once again to submit.
This is bad UX which is solved by taking an `origin` and a `call`.
This also has the benefit of being able to dry-run as any account, since
it needs no signature.

This is a breaking change since I changed `dry_run_extrinsic` to
`dry_run_call`, however, this API is still only on testnets.
The crates are bumped accordingly.

As a part of this PR, I changed the name of the API from `XcmDryRunApi`
to just `DryRunApi`, since it can be used for general dry-running :)

Step towards paritytech#690.

Example of calling the API with PAPI, not the best code, just testing :)

```ts
// We just build a call, the arguments make it look very big though.
const call = localApi.tx.XcmPallet.transfer_assets({
  dest: XcmVersionedLocation.V4({ parents: 0, interior: XcmV4Junctions.X1(XcmV4Junction.Parachain(1000)) }),
  beneficiary: XcmVersionedLocation.V4({ parents: 0, interior: XcmV4Junctions.X1(XcmV4Junction.AccountId32({ network: undefined, id: Binary.fromBytes(encodeAccount(account.address)) })) }),
  weight_limit: XcmV3WeightLimit.Unlimited(),
  assets: XcmVersionedAssets.V4([{
    id: { parents: 0, interior: XcmV4Junctions.Here() },
    fun: XcmV3MultiassetFungibility.Fungible(1_000_000_000_000n) }
  ]),
  fee_asset_item: 0,
});
// We call the API passing in a signed origin 
const result = await localApi.apis.XcmDryRunApi.dry_run_call(
  WestendRuntimeOriginCaller.system(DispatchRawOrigin.Signed(account.address)),
  call.decodedCall
);
if (result.success && result.value.execution_result.success) {
  // We find the forwarded XCM we want. The first one going to AssetHub in this case.
  const xcmsToAssetHub = result.value.forwarded_xcms.find(([location, _]) => (
    location.type === "V4" &&
      location.value.parents === 0 &&
      location.value.interior.type === "X1"
      && location.value.interior.value.type === "Parachain"
      && location.value.interior.value.value === 1000
  ))!;

  // We can even find the delivery fees for that forwarded XCM.
  const deliveryFeesQuery = await localApi.apis.XcmPaymentApi.query_delivery_fees(xcmsToAssetHub[0], xcmsToAssetHub[1][0]);

  if (deliveryFeesQuery.success) {
    const amount = deliveryFeesQuery.value.type === "V4" && deliveryFeesQuery.value.value[0].fun.type === "Fungible" && deliveryFeesQuery.value.value[0].fun.value.valueOf() || 0n;
    // We store them in state somewhere.
    setDeliveryFees(formatAmount(BigInt(amount)));
  }
}
```

---------

Co-authored-by: Bastian Köcher <git@kchr.de>
@ntn-x2 ntn-x2 force-pushed the aa/executive-try-state branch from 5df3125 to 56e91c7 Compare July 13, 2024 10:17
@paritytech-review-bot paritytech-review-bot bot requested a review from a team July 13, 2024 10:17
@ntn-x2
Copy link
Author

ntn-x2 commented Jul 13, 2024

@ggwpez I actually re-checked the description I gave to the PR, and I am not sure what is unclear about it. Could you please tell me what's the part that you think should be improved?

I guess what I did not highlight strongly enough, probably, is that this PR does not aim to be a overhaul of the try-runtime functionality, but just a minimal changeset to make it work for runtime-wide tests. As @kianenigma mentioned in the issue this PR aims to solve, there seems to be a discussion somewhere within Parity about a potential overhaul of the try-runtime system, and this PR does not aim to do that. If some code maintainers believe that even the renaming of the trait and the introduction of a new trait that does not require pallets to have anything to do with the select argument, then I could even revert that change and simply introduce the IdentifiableTryStateLogic trait that allows us to filter try-state logic based on the list of names provided in the CLI.

Comment on lines 64 to 65
unsafe {
CANARY_FLAG = true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CANARY_FLAG::set(true);

in combination with parameter_types! pub static please. Cell would also work, but normally we use parameter types.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in a87c594.

for_tuples!( where #( Tuple: TryStateLogic<BlockNumber> )* );
fn matches_id(id: &[u8]) -> bool {
for_tuples!( #( if Tuple::matches_id(id) { return true; } )* );
return false;
Copy link
Member

@ggwpez ggwpez Jul 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That means that it returns true when one of the tuple elements returns true? But what if we only want to run that single tuple element instead of the whole tuple?

I think i am missing some critical info about this merge request, which is why i dont get what it was done this way.
Is it not enough to just add a TryState to the end of Executive so that it would look like this?

pub struct Executive<
	System,
	Block,
	Context,
	UnsignedValidator,
	AllPalletsWithSystem,
	OnRuntimeUpgrade = (),
	RuntimeTryState = (),
>(

Copy link
Author

@ntn-x2 ntn-x2 Jul 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps the context in #235 could help you understand better?

The goal of this PR is to achieve exactly what you are suggesting here. Having a generic to run try-state checks at the runtime level. The problem with the current structure (as also mentioned in the linked issue), is that the implementation on tuples is coupled with the PalletInfoAccess trait, as you can see here:

for_tuples!( where #( Tuple: crate::traits::PalletInfoAccess )* );

This means that in order to use tuple elements that are not pallet, such as runtime-wide logic, the coupling to such a trait must be removed. And because of some requirements the impl_for_tuples has about knowing the type and size of the types involved, it was not possible to use anything else than a function that returns true if the specified identifier matches what was provided by the user in the try-state command.

So now if you have a tuple of, for instance, (CustomLogic, AllPalletsWithSystem), the matching function will return true if CustomLogic matches, or if any pallet has the specified ID. It is assumed that no two components, be they pallets or else, have the same ID, as that would mean the latter of them would never be called.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this goes beyond the AllPalletsWithSystem usage. If even just CustomLogic is itself a tuple, then with the current design it is required to implement PalletInfoAccess, which of course makes no sense for anything else than a pallet.

Copy link
Member

@ggwpez ggwpez Jul 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goal of this PR is to achieve exactly what you are suggesting here. Having a generic to run try-state checks at the runtime level

Then the easiest way is to just make a new trait, since the TryState trait is aimed at pallets (should probably have been named PalletTryState, but now we are stuck with this for compatibility reasons).

pub trait RuntimeTryState {
    fn try_state(whatever...) -> Result<(), TryRuntimeError>;
}

Anyway, i dont really want to block this approach if it has clear advantages over ^, so going to ask @kianenigma what he thinks.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, let's wait for Kian's input, if it comes in a reasonable time 😇
As usual, I do not honestly care about HOW this feature is introduced, as long as it becomes possible. I thought that introducing a minimal set of changes would be the preferred solution. Having a new trait would then means that the AllPalletsWithSystem implements one trait, and the new generic would implement a different trait, and I tend not to like this idea 😁 Also, I don't understand why a pallet is required to make judgments about the TryState::Select parameter, which is always ignored anyway in implementations, when all they care is, at most, the BlockNumber. That design flaw I tried to address here, by removing the second argument for pallets implementations, and leave it only inside the executive try-runtime features, which is the only place where any logic should look at the logic to filter which tests to run. But this conversation would probably be out of scope for this PR, which aims to introduce the feature breaking as little things as possible. Also because, as Kian mentioned, there might have been already talks as to how overhaul the try-runtime stuff altogether.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've had a lot of iterations here, thank you both, esp @ntn-x2 for exploring solutions with us.

The systemic mistake I am realizing (noted for future) is that we should have first discussed the "ideal outcome" vs the approach. So let's do that.

what should we expect of TryStateSelect in the context of this PR? Should we be able to tweak RuntimeTryState with it?

If yes, the current approach is the right way. If not, Oliver's approach is much simpler and RuntimeTryState will always be executed.

I am leaning toward not allowing runtime level try-state logic to be configurable with TryStateSelect.

First reason: This API, as pointed out here, is already somewhat broken, and we are not even sure if it is "enough". In that, the point of it was to regulate how much execution time is given to try-state hooks. But, through experience, we have seen that the issue is that a single pallet has a try-state-bomb that takes forever. In an ideal sense, instead of TryStateSelect you would pass some Weight-equivalent amount, and this much weight was used in try-state. With PVM we can do this.

Second reason: If we allow TryStateSelect to properly configure runtime-try-state, the variants would be a lot. If you want "round-robin pallets + runtime", or "pallets x, y, z and runtime". These are hard to express.

Third Reason: Let's do the simpler approach. IFF someone needs to disable runtime level try-state, we have a blueprint of how to solve it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But, through experience, we have seen that the issue is that a single pallet has a try-state-bomb that takes forever. In an ideal sense, instead of TryStateSelect you would pass some Weight-equivalent amount, and this much weight was used in try-state. With PVM we can do this.

PVM will not come tomorrow and I also don't see why we should need this here. These try-state hooks are not being run as part of the on chain logic. How long they take is really not important at all. We need to ensure that they finish on time. If there is some kind of "try-state bomb", it probably means that we read too much state. Still the amount of state is not in the terabytes or similar, thus I would say that our current read performance needs to be optimized.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kianenigma sorry I think I did not fully understand what your suggestion here is. Are you suggesting to have a new trait which takes only a block number and that is required to be implemented by the new generic? That would mean that the TryStateSelect logic would only be applied to pallets, while all other runtime try-states would all always be executed. Is that the proposed way forward? Would the runtime logic still be called before the AllPallets logic?

@paritytech-review-bot paritytech-review-bot bot requested a review from a team July 15, 2024 07:43
@paritytech-cicd-pr
Copy link

The CI pipeline was cancelled due to failure one of the required jobs.
Job name: test-linux-stable 3/3
Logs: https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/6697932

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow runtime-wide TryState checks
7 participants