feat: add runtime-wide `try-state` hooks #4631

ntn-x2 · 2024-05-29T13:26:04Z

Fixes #235.

I tried different approaches, including the suggestion of adding a fn id() -> String; function on the trait TryState<BlockNumber> trait, but that was giving me issues when trying to derive TryState for tuples (what is the id() of a tuple of elements such as AllPalletsWithSystem)? Also, it feels less than ideal to me for a trait to implement both logic that can take a Select::Only parameter (on which you should apply filtering logic, that's why it's ignored in 100% of the cases that I found) and also returns an id() -> String in there.

So, I am proposing the following approach:

Executive takes a new generic with a default () implementation for custom TryState hooks, as done for the OnRuntimeUpgrade ones. Like the OnRuntimeUpgrade hooks, TryState hooks are executed before pallet-specific hooks.
I introduce two new traits, trait TryStateLogic<BlockNumber> which is responsible for the try-state logic itself, and the trait IdentifiableTryStateLogic<BlockNumber>: TryStateLogic<BlockNumber>, which returns a function that is used to match the specific test with one of the provided identifiers in TryStateSelect::Only(names). The two traits could be collapsed into one, if need be, as long as they expose the two functions declared here.
By default, the pallet macro also implements the two new traits, implementing TryStateLogic<BlockNumber> re-using the Hooks<BlockNumber logic for try_state() and IdentifiableTryStateLogic<BlockNumber> re-using the PalletInfoAccess logic for name().

Open points

I did not find a way to implement a trait for a variable number of tuples without the requirement for the elements of the tuple to also implement the trait, so I implement the TryState trait for the empty tuple (), the (T,) tuple and the (T1, T2) tuple, which covers the (CustomHook, AllPallets*) case. We might want to have a macro where the whole where clause is configurable, not only additional traits. But it would happen in a different PR.
I augmented the Executive unit tests to test that the TryState hooks are called in the right order. Because TryState hooks are not supposed to modify storage, I could not use the same approach used for OnRuntimeUpgrade hooks. Hence, I used a static variable that is reset to false with a new test ext, and set to true when the test custom TryState is invoked. I did not get any errors in the tests, but it could be that this leads to concurrency issues, in which case an alternative strategy is needed, perhaps including an additional boolean flag in what is returned by new_test_ext?

liamaharon · 2024-05-30T09:00:33Z

bot fmt

command-bot · 2024-05-30T09:00:39Z

@liamaharon https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/6348179 was started for your command "$PIPELINE_SCRIPTS_DIR/commands/fmt/fmt.sh". Check out https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/pipelines?page=1&scope=all&username=group_605_bot to know what else is being executed currently.

Comment bot cancel 5-b73d9b63-067a-468a-a635-ea1f610d2c88 to cancel this command or bot cancel to cancel all commands in this pull request.

command-bot · 2024-05-30T09:01:39Z

@liamaharon Command "$PIPELINE_SCRIPTS_DIR/commands/fmt/fmt.sh" has finished. Result: https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/6348179 has finished. If any artifacts were generated, you can download them from https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/6348179/artifacts/download.

ntn-x2 · 2024-06-10T12:56:05Z

@liamaharon can I get a review from anyone? The PR has now been hanging for quite a while with 0 interactions with anyone from Parity.

ntn-x2 · 2024-07-12T09:13:55Z

@liamaharon can I get anyone to review this PR? In particular, I am faced with a problem: the try-runtime hooks are not supposed to make any storage changes. Hence, I did not find out any other way than using a static mut global variable, which is by itself very bad. Do you have any other way in mind to test the right order of hooks being called?

ggwpez · 2024-07-12T09:15:26Z

I will try to take a look today. Liam is currently offboarding 😢

ggwpez

Im not sure if i completely get the architecture here.
Can you please briefly put the new traits into the MR description maybe and what they should do? A graphic can also help, but maybe not worth it if we have to change it.

substrate/frame/executive/src/lib.rs

substrate/frame/support/src/traits/try_runtime/mod.rs

…tytech#4621) Follow-up to the new `XcmDryRunApi` runtime API introduced in paritytech#3872. Taking an extrinsic means the frontend has to sign first to dry-run and once again to submit. This is bad UX which is solved by taking an `origin` and a `call`. This also has the benefit of being able to dry-run as any account, since it needs no signature. This is a breaking change since I changed `dry_run_extrinsic` to `dry_run_call`, however, this API is still only on testnets. The crates are bumped accordingly. As a part of this PR, I changed the name of the API from `XcmDryRunApi` to just `DryRunApi`, since it can be used for general dry-running :) Step towards paritytech#690. Example of calling the API with PAPI, not the best code, just testing :) ```ts // We just build a call, the arguments make it look very big though. const call = localApi.tx.XcmPallet.transfer_assets({ dest: XcmVersionedLocation.V4({ parents: 0, interior: XcmV4Junctions.X1(XcmV4Junction.Parachain(1000)) }), beneficiary: XcmVersionedLocation.V4({ parents: 0, interior: XcmV4Junctions.X1(XcmV4Junction.AccountId32({ network: undefined, id: Binary.fromBytes(encodeAccount(account.address)) })) }), weight_limit: XcmV3WeightLimit.Unlimited(), assets: XcmVersionedAssets.V4([{ id: { parents: 0, interior: XcmV4Junctions.Here() }, fun: XcmV3MultiassetFungibility.Fungible(1_000_000_000_000n) } ]), fee_asset_item: 0, }); // We call the API passing in a signed origin const result = await localApi.apis.XcmDryRunApi.dry_run_call( WestendRuntimeOriginCaller.system(DispatchRawOrigin.Signed(account.address)), call.decodedCall ); if (result.success && result.value.execution_result.success) { // We find the forwarded XCM we want. The first one going to AssetHub in this case. const xcmsToAssetHub = result.value.forwarded_xcms.find(([location, _]) => ( location.type === "V4" && location.value.parents === 0 && location.value.interior.type === "X1" && location.value.interior.value.type === "Parachain" && location.value.interior.value.value === 1000 ))!; // We can even find the delivery fees for that forwarded XCM. const deliveryFeesQuery = await localApi.apis.XcmPaymentApi.query_delivery_fees(xcmsToAssetHub[0], xcmsToAssetHub[1][0]); if (deliveryFeesQuery.success) { const amount = deliveryFeesQuery.value.type === "V4" && deliveryFeesQuery.value.value[0].fun.type === "Fungible" && deliveryFeesQuery.value.value[0].fun.value.valueOf() || 0n; // We store them in state somewhere. setDeliveryFees(formatAmount(BigInt(amount))); } } ``` --------- Co-authored-by: Bastian Köcher <git@kchr.de>

ntn-x2 · 2024-07-13T10:18:36Z

@ggwpez I actually re-checked the description I gave to the PR, and I am not sure what is unclear about it. Could you please tell me what's the part that you think should be improved?

I guess what I did not highlight strongly enough, probably, is that this PR does not aim to be a overhaul of the try-runtime functionality, but just a minimal changeset to make it work for runtime-wide tests. As @kianenigma mentioned in the issue this PR aims to solve, there seems to be a discussion somewhere within Parity about a potential overhaul of the try-runtime system, and this PR does not aim to do that. If some code maintainers believe that even the renaming of the trait and the introduction of a new trait that does not require pallets to have anything to do with the select argument, then I could even revert that change and simply introduce the IdentifiableTryStateLogic trait that allows us to filter try-state logic based on the list of names provided in the CLI.

ggwpez · 2024-07-14T23:36:23Z

substrate/frame/executive/src/tests.rs

+			unsafe {
+				CANARY_FLAG = true;


CANARY_FLAG::set(true);

in combination with parameter_types! pub static please. Cell would also work, but normally we use parameter types.

Addressed in a87c594.

ggwpez · 2024-07-14T23:46:23Z

substrate/frame/support/src/traits/try_runtime/mod.rs

+	for_tuples!( where #( Tuple: TryStateLogic<BlockNumber> )* );
+	fn matches_id(id: &[u8]) -> bool {
+		for_tuples!( #( if Tuple::matches_id(id) { return true; } )* );
+		return false;


That means that it returns true when one of the tuple elements returns true? But what if we only want to run that single tuple element instead of the whole tuple?

I think i am missing some critical info about this merge request, which is why i dont get what it was done this way.
Is it not enough to just add a TryState to the end of Executive so that it would look like this?

pub struct Executive< System, Block, Context, UnsignedValidator, AllPalletsWithSystem, OnRuntimeUpgrade = (), RuntimeTryState = (), >(

Perhaps the context in #235 could help you understand better?

The goal of this PR is to achieve exactly what you are suggesting here. Having a generic to run try-state checks at the runtime level. The problem with the current structure (as also mentioned in the linked issue), is that the implementation on tuples is coupled with the PalletInfoAccess trait, as you can see here:

polkadot-sdk/substrate/frame/support/src/traits/try_runtime/mod.rs

Line 159 in 4aa29a4

for_tuples!( where #( Tuple: crate::traits::PalletInfoAccess )* );

This means that in order to use tuple elements that are not pallet, such as runtime-wide logic, the coupling to such a trait must be removed. And because of some requirements the impl_for_tuples has about knowing the type and size of the types involved, it was not possible to use anything else than a function that returns true if the specified identifier matches what was provided by the user in the try-state command.

So now if you have a tuple of, for instance, (CustomLogic, AllPalletsWithSystem), the matching function will return true if CustomLogic matches, or if any pallet has the specified ID. It is assumed that no two components, be they pallets or else, have the same ID, as that would mean the latter of them would never be called.

And this goes beyond the AllPalletsWithSystem usage. If even just CustomLogic is itself a tuple, then with the current design it is required to implement PalletInfoAccess, which of course makes no sense for anything else than a pallet.

The goal of this PR is to achieve exactly what you are suggesting here. Having a generic to run try-state checks at the runtime level

Then the easiest way is to just make a new trait, since the TryState trait is aimed at pallets (should probably have been named PalletTryState, but now we are stuck with this for compatibility reasons).

pub trait RuntimeTryState { fn try_state(whatever...) -> Result<(), TryRuntimeError>; }

Anyway, i dont really want to block this approach if it has clear advantages over ^, so going to ask @kianenigma what he thinks.

Yes, let's wait for Kian's input, if it comes in a reasonable time 😇
As usual, I do not honestly care about HOW this feature is introduced, as long as it becomes possible. I thought that introducing a minimal set of changes would be the preferred solution. Having a new trait would then means that the AllPalletsWithSystem implements one trait, and the new generic would implement a different trait, and I tend not to like this idea 😁 Also, I don't understand why a pallet is required to make judgments about the TryState::Select parameter, which is always ignored anyway in implementations, when all they care is, at most, the BlockNumber. That design flaw I tried to address here, by removing the second argument for pallets implementations, and leave it only inside the executive try-runtime features, which is the only place where any logic should look at the logic to filter which tests to run. But this conversation would probably be out of scope for this PR, which aims to introduce the feature breaking as little things as possible. Also because, as Kian mentioned, there might have been already talks as to how overhaul the try-runtime stuff altogether.

We've had a lot of iterations here, thank you both, esp @ntn-x2 for exploring solutions with us.

The systemic mistake I am realizing (noted for future) is that we should have first discussed the "ideal outcome" vs the approach. So let's do that.

what should we expect of TryStateSelect in the context of this PR? Should we be able to tweak RuntimeTryState with it?

If yes, the current approach is the right way. If not, Oliver's approach is much simpler and RuntimeTryState will always be executed.

I am leaning toward not allowing runtime level try-state logic to be configurable with TryStateSelect.

First reason: This API, as pointed out here, is already somewhat broken, and we are not even sure if it is "enough". In that, the point of it was to regulate how much execution time is given to try-state hooks. But, through experience, we have seen that the issue is that a single pallet has a try-state-bomb that takes forever. In an ideal sense, instead of TryStateSelect you would pass some Weight-equivalent amount, and this much weight was used in try-state. With PVM we can do this.

Second reason: If we allow TryStateSelect to properly configure runtime-try-state, the variants would be a lot. If you want "round-robin pallets + runtime", or "pallets x, y, z and runtime". These are hard to express.

Third Reason: Let's do the simpler approach. IFF someone needs to disable runtime level try-state, we have a blueprint of how to solve it.

But, through experience, we have seen that the issue is that a single pallet has a try-state-bomb that takes forever. In an ideal sense, instead of TryStateSelect you would pass some Weight-equivalent amount, and this much weight was used in try-state. With PVM we can do this.

PVM will not come tomorrow and I also don't see why we should need this here. These try-state hooks are not being run as part of the on chain logic. How long they take is really not important at all. We need to ensure that they finish on time. If there is some kind of "try-state bomb", it probably means that we read too much state. Still the amount of state is not in the terabytes or similar, thus I would say that our current read performance needs to be optimized.

@kianenigma sorry I think I did not fully understand what your suggestion here is. Are you suggesting to have a new trait which takes only a block number and that is required to be implemented by the new generic? That would mean that the TryStateSelect logic would only be applied to pallets, while all other runtime try-states would all always be executed. Is that the proposed way forward? Would the runtime logic still be called before the AllPallets logic?

paritytech-cicd-pr · 2024-07-15T09:58:19Z

The CI pipeline was cancelled due to failure one of the required jobs.
Job name: test-linux-stable 3/3
Logs: https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/6697932

ntn-x2 requested a review from a team as a code owner May 29, 2024 13:26

ntn-x2 mentioned this pull request May 29, 2024

Allow runtime-wide TryState checks #235

Open

ntn-x2 force-pushed the aa/executive-try-state branch from 9f658e1 to 26158c9 Compare May 30, 2024 06:38

ntn-x2 requested a review from a team as a code owner May 30, 2024 06:56

ntn-x2 force-pushed the aa/executive-try-state branch from 13c39a5 to 1551382 Compare June 10, 2024 12:55

ntn-x2 force-pushed the aa/executive-try-state branch 2 times, most recently from 3fbb357 to 47a0b80 Compare June 13, 2024 11:58

ntn-x2 force-pushed the aa/executive-try-state branch from 47a0b80 to 5df3125 Compare July 12, 2024 09:12

ggwpez reviewed Jul 12, 2024

View reviewed changes

substrate/frame/executive/src/lib.rs Show resolved Hide resolved

substrate/frame/executive/src/lib.rs Show resolved Hide resolved

substrate/frame/support/src/traits/try_runtime/mod.rs Show resolved Hide resolved

franciscoaguirre and others added 15 commits July 13, 2024 12:16

Add new generic to Executive

53d131c

frame-executive code compiling

85b439a

frame-support compiling

da5e902

frame-executive compiling

871f87a

frame-support compiling

b962d31

frame-executive compiling as well

4760e8f

Add comment

4ddee80

Rename generics

5401db6

Final cleanup before tests

44a545a

Add tests for executive try-state

a8aad9e

Last cleanups

82a4100

Fix CI issues

4acd366

Fix docs

7160595

fmt

10da08b

ntn-x2 added 2 commits July 13, 2024 12:16

Post-rebase fixes

2f69a01

Address first PR comment

56e91c7

ntn-x2 force-pushed the aa/executive-try-state branch from 5df3125 to 56e91c7 Compare July 13, 2024 10:17

paritytech-review-bot bot requested a review from a team July 13, 2024 10:17

ggwpez reviewed Jul 14, 2024

View reviewed changes

Replace global static mut with static parameter_types!

a87c594

paritytech-review-bot bot requested a review from a team July 15, 2024 07:43

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add runtime-wide `try-state` hooks #4631

feat: add runtime-wide `try-state` hooks #4631

ntn-x2 commented May 29, 2024 •

edited

Loading

liamaharon commented May 30, 2024

command-bot bot commented May 30, 2024 •

edited

Loading

command-bot bot commented May 30, 2024

ntn-x2 commented Jun 10, 2024

ntn-x2 commented Jul 12, 2024 •

edited

Loading

ggwpez commented Jul 12, 2024

ggwpez left a comment

ntn-x2 commented Jul 13, 2024 •

edited

Loading

ggwpez Jul 14, 2024

ntn-x2 Jul 15, 2024

ggwpez Jul 14, 2024 •

edited

Loading

ntn-x2 Jul 15, 2024 •

edited

Loading

ntn-x2 Jul 15, 2024

ggwpez Jul 15, 2024 •

edited

Loading

ntn-x2 Jul 15, 2024

kianenigma Jul 15, 2024

bkchr Jul 15, 2024

ntn-x2 Jul 16, 2024

paritytech-cicd-pr commented Jul 15, 2024

feat: add runtime-wide try-state hooks #4631

Are you sure you want to change the base?

feat: add runtime-wide try-state hooks #4631

Conversation

ntn-x2 commented May 29, 2024 • edited Loading

Open points

liamaharon commented May 30, 2024

command-bot bot commented May 30, 2024 • edited Loading

command-bot bot commented May 30, 2024

ntn-x2 commented Jun 10, 2024

ntn-x2 commented Jul 12, 2024 • edited Loading

ggwpez commented Jul 12, 2024

ggwpez left a comment

Choose a reason for hiding this comment

ntn-x2 commented Jul 13, 2024 • edited Loading

ggwpez Jul 14, 2024

Choose a reason for hiding this comment

ntn-x2 Jul 15, 2024

Choose a reason for hiding this comment

ggwpez Jul 14, 2024 • edited Loading

Choose a reason for hiding this comment

ntn-x2 Jul 15, 2024 • edited Loading

Choose a reason for hiding this comment

ntn-x2 Jul 15, 2024

Choose a reason for hiding this comment

ggwpez Jul 15, 2024 • edited Loading

Choose a reason for hiding this comment

ntn-x2 Jul 15, 2024

Choose a reason for hiding this comment

kianenigma Jul 15, 2024

Choose a reason for hiding this comment

bkchr Jul 15, 2024

Choose a reason for hiding this comment

ntn-x2 Jul 16, 2024

Choose a reason for hiding this comment

paritytech-cicd-pr commented Jul 15, 2024

feat: add runtime-wide `try-state` hooks #4631

feat: add runtime-wide `try-state` hooks #4631

ntn-x2 commented May 29, 2024 •

edited

Loading

command-bot bot commented May 30, 2024 •

edited

Loading

ntn-x2 commented Jul 12, 2024 •

edited

Loading

ntn-x2 commented Jul 13, 2024 •

edited

Loading

ggwpez Jul 14, 2024 •

edited

Loading

ntn-x2 Jul 15, 2024 •

edited

Loading

ggwpez Jul 15, 2024 •

edited

Loading