Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update Wasmi to v0.32.0-beta.5 #2941

Closed
wants to merge 18 commits into from
Closed

Update Wasmi to v0.32.0-beta.5 #2941

wants to merge 18 commits into from

Conversation

Robbepop
Copy link
Contributor

@Robbepop Robbepop commented Jan 16, 2024

This updates pallet-contracts to use the new experimental Wasmi v0.32.0-beta.5 with the register-machine execution model.
Subsequently this also removes usage of the deprecated FuelConsumptionMode in pallet-contracts.
Please note that Wasmi v0.32.0-beta.5 is not yet production ready!

The most significant change was the handling of wasmi::Error since Wasmi error types have been refactored to no longer use wasmi::core::Trap and instead use a model more similar to Wasmtime with a unified Error type that entails everything.

cc @athei

Checked the following:

  • cargo check -p pallet-contracts works
  • cargo test -p pallet-contracts works

- cargo check -p pallet_contracts passes
- cargo test -p pallet_contracts passes
@Robbepop Robbepop requested a review from athei as a code owner January 16, 2024 11:28
@Robbepop Robbepop requested a review from a team January 16, 2024 11:28
@Robbepop Robbepop changed the title Update Wasmi to v0.32.0 beta.5 Update Wasmi to v0.32.0-beta.5 Jan 16, 2024
Copy link
Member

@athei athei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Can you run the benchmarks in here?

@Robbepop
Copy link
Contributor Author

bot bench substrate-pallet --pallet=pallet_contracts

@command-bot
Copy link

command-bot bot commented Jan 16, 2024

@Robbepop https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/4929394 was started for your command "$PIPELINE_SCRIPTS_DIR/commands/bench/bench.sh" --subcommand=pallet --runtime=dev --target_dir=substrate --pallet=pallet_contracts. Check out https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/pipelines?page=1&scope=all&username=group_605_bot to know what else is being executed currently.

Comment bot cancel 3-faf4006d-04ac-41b4-a811-ee4f4cc0a994 to cancel this command or bot cancel to cancel all commands in this pull request.

…=dev --target_dir=substrate --pallet=pallet_contracts
@command-bot
Copy link

command-bot bot commented Jan 16, 2024

@Robbepop Command "$PIPELINE_SCRIPTS_DIR/commands/bench/bench.sh" --subcommand=pallet --runtime=dev --target_dir=substrate --pallet=pallet_contracts has finished. Result: https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/4929394 has finished. If any artifacts were generated, you can download them from https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/4929394/artifacts/download.

@Robbepop
Copy link
Contributor Author

@athei Shall we also enable lazy compilation and validation as it is now supported by Wasmi?

@athei
Copy link
Member

athei commented Jan 16, 2024

@athei Shall we also enable lazy compilation and validation as it is now supported by Wasmi?

Yes. But please first update the instruktion benchmark with something that cannot be optimized away by wasmi so we can get a clearer picture:

// We make the assumption that pushing a constant and dropping a value takes roughly
// the same amount of time. We call this weight `w_base`.
// The weight that would result from the respective benchmark we call: `w_bench`.
//
// w_base = w_i{32,64}const = w_drop = w_bench / 2
#[pov_mode = Ignored]
instr_i64const {
let r in 0 .. INSTR_BENCHMARK_RUNS;
let mut sbox = Sandbox::from(&WasmModule::<T>::from(ModuleDefinition {
call_body: Some(body::repeated_dyn(r, vec![
RandomI64Repeated(1),
Regular(Instruction::Drop),
])),
.. Default::default()
}));
}: {
sbox.invoke();
}

Can you also post the subweight table with the weight here?

@Robbepop
Copy link
Contributor Author

Robbepop commented Jan 16, 2024

The rendered benchmarks from the last commit (without lazy compilation enabled):

% subweight compare commits --path-pattern "substrate/frame/contracts/src/weights.rs" --method asymptotic master 4dba773f6db237a4bcf2d75b02faa88ac888447f --change changed --threshold 5
+------------------------------------------+-----------------------------------------------+---------+---------+---------------+
| File                                     | Extrinsic                                     | Old     | New     | Change [%]    |
+==============================================================================================================================+
| substrate/frame/contracts/src/weights.rs | seal_debug_message                            | 1.06ms  | 1.14ms  | +7.21  |
|------------------------------------------+-----------------------------------------------+---------+---------+---------------|
| substrate/frame/contracts/src/weights.rs | seal_caller                                   | 1.33ms  | 1.42ms  | +6.29  |
|------------------------------------------+-----------------------------------------------+---------+---------+---------------|
| substrate/frame/contracts/src/weights.rs | seal_input                                    | 1.24ms  | 1.32ms  | +6.25  |
|------------------------------------------+-----------------------------------------------+---------+---------+---------------|
| substrate/frame/contracts/src/weights.rs | seal_own_code_hash                            | 1.48ms  | 1.56ms  | +5.77  |
|------------------------------------------+-----------------------------------------------+---------+---------+---------------|
| substrate/frame/contracts/src/weights.rs | seal_block_number                             | 1.32ms  | 1.40ms  | +5.55  |
|------------------------------------------+-----------------------------------------------+---------+---------+---------------|
| substrate/frame/contracts/src/weights.rs | seal_address                                  | 1.34ms  | 1.42ms  | +5.54  |
|------------------------------------------+-----------------------------------------------+---------+---------+---------------|
| substrate/frame/contracts/src/weights.rs | seal_gas_left                                 | 1.39ms  | 1.46ms  | +5.51  |
|------------------------------------------+-----------------------------------------------+---------+---------+---------------|
| substrate/frame/contracts/src/weights.rs | seal_now                                      | 1.33ms  | 1.40ms  | +5.40  |
|------------------------------------------+-----------------------------------------------+---------+---------+---------------|
| substrate/frame/contracts/src/weights.rs | seal_hash_sha2_256                            | 1.42ms  | 1.50ms  | +5.16  |
|------------------------------------------+-----------------------------------------------+---------+---------+---------------|
| substrate/frame/contracts/src/weights.rs | instantiate                                   | 5.30ms  | 4.98ms  | -6.12  |
|------------------------------------------+-----------------------------------------------+---------+---------+---------------|
| substrate/frame/contracts/src/weights.rs | seal_weight_to_fee                            | 3.14ms  | 2.73ms  | -13.07 |
|------------------------------------------+-----------------------------------------------+---------+---------+---------------|
| substrate/frame/contracts/src/weights.rs | seal_instantiate_per_transfer_input_salt_byte | 6.58ms  | 5.19ms  | -21.18 |
|------------------------------------------+-----------------------------------------------+---------+---------+---------------|
| substrate/frame/contracts/src/weights.rs | seal_sr25519_verify                           | 9.83ms  | 7.45ms  | -24.15 |
|------------------------------------------+-----------------------------------------------+---------+---------+---------------|
| substrate/frame/contracts/src/weights.rs | instantiate_with_code                         | 20.43ms | 14.53ms | -28.89 |
|------------------------------------------+-----------------------------------------------+---------+---------+---------------|
| substrate/frame/contracts/src/weights.rs | seal_random                                   | 4.10ms  | 2.91ms  | -29.09 |
|------------------------------------------+-----------------------------------------------+---------+---------+---------------|
| substrate/frame/contracts/src/weights.rs | call_with_code_per_byte                       | 5.90ms  | 3.95ms  | -32.97 |
|------------------------------------------+-----------------------------------------------+---------+---------+---------------|
| substrate/frame/contracts/src/weights.rs | upload_code                                   | 10.33ms | 6.92ms  | -33.02 |
|------------------------------------------+-----------------------------------------------+---------+---------+---------------|
| substrate/frame/contracts/src/weights.rs | seal_deposit_event                            | 6.85ms  | 4.06ms  | -40.68 |
|------------------------------------------+-----------------------------------------------+---------+---------+---------------|
| substrate/frame/contracts/src/weights.rs | instr_i64const                                | 51.14us | 2.15us  | -95.79 |
+------------------------------------------+-----------------------------------------------+---------+---------+---------------+

As image with colors:

image

Note to others: The -95% in the instr_i64const test case is indeed a 20x improvement, however, the test case is not really useful to measure the expected performance improvements since it is ridiculously simple for the new Wasmi (register) to optimize its contents away into a no-op.

@Robbepop
Copy link
Contributor Author

Yes. But please first update the instruktion benchmark with something that cannot be optimized away by wasmi so we can get a clearer picture:

@athei This needs to be done in another PR to take effect for this PR I assume?

@athei
Copy link
Member

athei commented Jan 16, 2024

It is interesting that everything regarding compilation (e.g call_with_code_per_byte) gets around 33% less expensive. I thought register machine is slightly slower to compile. But then again I don't see you activating it here. Or is register machine the default in this version of wasmi?

@Robbepop
Copy link
Contributor Author

It is interesting that everything regarding compilation (e.g call_with_code_per_byte) gets around 33% less expensive. I thought register machine is slightly slower to compile. But then again I don't see you activating it here. Or is register machine the default in this version of wasmi?

I also am a bit confused about this. The register-machine is the default in this version and is used in the benchmarks since the stack-machine got removed entirely. I refactored the Wasm translation part quite significantly, so maybe that is an explanation, although I would have not even remotely hoped for such significant improvements as a result to be honest.

@athei
Copy link
Member

athei commented Jan 16, 2024

Maybe also a case of our benchmark being too simplistic? We try to nest blocks as this is assumed to be the hardest to compile. This how how we generate a big contract and then measure the time to compile it:

/// Creates a wasm module of `target_bytes` size. Used to benchmark the performance of
/// `instantiate_with_code` for different sizes of wasm modules. The generated module maximizes
/// instrumentation runtime by nesting blocks as deeply as possible given the byte budget.
/// `code_location`: Whether to place the code into `deploy` or `call`.
pub fn sized(target_bytes: u32, code_location: Location) -> Self {
use self::elements::Instruction::{End, I32Const, If, Return};
// Base size of a contract is 63 bytes and each expansion adds 6 bytes.
// We do one expansion less to account for the code section and function body
// size fields inside the binary wasm module representation which are leb128 encoded
// and therefore grow in size when the contract grows. We are not allowed to overshoot
// because of the maximum code size that is enforced by `instantiate_with_code`.
let expansions = (target_bytes.saturating_sub(63) / 6).saturating_sub(1);
const EXPANSION: [Instruction; 4] = [I32Const(0), If(BlockType::NoResult), Return, End];
let mut module =
ModuleDefinition { memory: Some(ImportedMemory::max::<T>()), ..Default::default() };
let body = Some(body::repeated(expansions, &EXPANSION));
match code_location {
Location::Call => module.call_body = body,
Location::Deploy => module.deploy_body = body,
}
module.into()
}

@Robbepop
Copy link
Contributor Author

Maybe also a case of our benchmark being too simplistic? We try to nest blocks as this is assumed to be the hardest to compile. This how how we generate a big contract and then measure the time to compile it:

/// Creates a wasm module of `target_bytes` size. Used to benchmark the performance of
/// `instantiate_with_code` for different sizes of wasm modules. The generated module maximizes
/// instrumentation runtime by nesting blocks as deeply as possible given the byte budget.
/// `code_location`: Whether to place the code into `deploy` or `call`.
pub fn sized(target_bytes: u32, code_location: Location) -> Self {
use self::elements::Instruction::{End, I32Const, If, Return};
// Base size of a contract is 63 bytes and each expansion adds 6 bytes.
// We do one expansion less to account for the code section and function body
// size fields inside the binary wasm module representation which are leb128 encoded
// and therefore grow in size when the contract grows. We are not allowed to overshoot
// because of the maximum code size that is enforced by `instantiate_with_code`.
let expansions = (target_bytes.saturating_sub(63) / 6).saturating_sub(1);
const EXPANSION: [Instruction; 4] = [I32Const(0), If(BlockType::NoResult), Return, End];
let mut module =
ModuleDefinition { memory: Some(ImportedMemory::max::<T>()), ..Default::default() };
let body = Some(body::repeated(expansions, &EXPANSION));
match code_location {
Location::Call => module.call_body = body,
Location::Deploy => module.deploy_body = body,
}
module.into()
}

Seeing the code snippet that is generated: It uses i32.const as conditionals for if, am I seeing that correctly? If yes, then that could be an explanation as to why we see an improvement in compile times since Wasmi (register) aggressively optimizes those unreachable branches away thus, not doing anything for a significant portion of the input code.

@athei
Copy link
Member

athei commented Jan 16, 2024

Yes you seeing this correctly. The fix should be easy though: Replace the constant by a memory load. It will still immediately return at runtime but the wasmi can't optimize it away.

@Robbepop
Copy link
Contributor Author

Yes you seeing this correctly. The fix should be easy though: Replace the constant by a memory load. It will still immediately return at runtime but the wasmi can't optimize it away.

Any of global.get or local.get work too and are probably simpler to achieve.

@athei
Copy link
Member

athei commented Jan 16, 2024

Can you update that benchmark, too?

@Robbepop
Copy link
Contributor Author

Robbepop commented Jan 16, 2024

Can you update that benchmark, too?

The question that still stands is if this adjustments of benchmarks is required in a separate PR as to properly run the new or adjusted test cases on both master and the PR when comparing the results.

@athei
Copy link
Member

athei commented Jan 16, 2024

Yes I would suggest to do it in a separate PR that we merge before this one for better comparability.

@Robbepop Robbepop mentioned this pull request Jan 17, 2024
github-merge-queue bot pushed a commit that referenced this pull request Jan 19, 2024
In #2941 we found out
that the new Wasmi (register) is very effective at optimizing away
certain benchmark bytecode constructs in a way that created an unfair
advantage over Wasmi (stack) which yielded our former benchmarks to be
ineffective at properly measuring the performance impact.

This PR adjusts both affected benchmarks to fix the stated problems.
Affected are
- `instr_i64const` -> `instr_i64add`: Renamed since it now measures the
performance impact of the Wasm `i64.add` instruction with locals as
inputs and outputs. This makes it impossible for Wasmi (register) to
aggressively optimize away the entire function body (as it previously
did) but still provides a way for Wasmi (register) to shine with its
register based execution model.
- `call_with_code_per_byte`: Now uses `local.get` instead of `i32.const`
for the `if` condition which prevents Wasmi (register) to aggressively
optimizing away whole parts of the `if` creating an unfair advantage.

cc @athei

---------

Co-authored-by: command-bot <>
Co-authored-by: Alexander Theißen <alex.theissen@me.com>
Co-authored-by: Ignacio Palacios <ignacio.palacios.santos@gmail.com>
@Robbepop
Copy link
Contributor Author

bot bench substrate-pallet --pallet=pallet_contracts

@command-bot
Copy link

command-bot bot commented Jan 19, 2024

@Robbepop https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/4977631 was started for your command "$PIPELINE_SCRIPTS_DIR/commands/bench/bench.sh" --subcommand=pallet --runtime=dev --target_dir=substrate --pallet=pallet_contracts. Check out https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/pipelines?page=1&scope=all&username=group_605_bot to know what else is being executed currently.

Comment bot cancel 7-fe427cd7-ea7b-4689-95bb-f351035c37b4 to cancel this command or bot cancel to cancel all commands in this pull request.

…=dev --target_dir=substrate --pallet=pallet_contracts
@command-bot

This comment was marked as outdated.

@command-bot

This comment was marked as outdated.

@Robbepop
Copy link
Contributor Author

Robbepop commented Jan 20, 2024

I think it is safe to say that the benchmarks for instr_i64_load_store are using a new Wasm engine for every benchmark thus not properly respecting the actual lazy compilation but only the execution which now does part of the compilation and thus regressing so badly. Also I have the feeling that the used test Wasm blob has exactly one function (the one under test) which additionally is bad for lazy compilation since the effectiveness of lazy compilation is because only the required parts of a Wasm module are actually used. However, this is again a worst-case and thus it probably makes sense to be used in such a benchmark.

The call_with_code_per_byte benchmark improved by roughly 4.5x.

The upload_code benchmark does not improve since we are using eager compilation for it so that it validates all of the Wasm.

Benchmark results:
image

Copy link
Member

@athei athei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes sense that the seal_* host function benchmarks regress. Even when using lazy compilation it will still compile on execute. Reason is that we just call the high level call function as benchmark. This will always load from storage, compile, execute. So it always benchmarks that compilation, too.

Lazy compilation even makes it worse. This is because all the code is actually executed and I assume doing it in one go is more efficient than using the lazy compilation. Even though it shouldn't make a big difference since all the code is in once function.

We need to change the benchmarks to compile the code in its setup so that when executing the contract it is only execution.

substrate/frame/contracts/src/benchmarking/sandbox.rs Outdated Show resolved Hide resolved
substrate/frame/contracts/src/wasm/prepare.rs Outdated Show resolved Hide resolved
substrate/frame/contracts/src/wasm/prepare.rs Outdated Show resolved Hide resolved
@Robbepop
Copy link
Contributor Author

bot bench substrate-pallet --pallet=pallet_contracts

@command-bot
Copy link

command-bot bot commented Jan 22, 2024

@Robbepop https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/4994107 was started for your command "$PIPELINE_SCRIPTS_DIR/commands/bench/bench.sh" --subcommand=pallet --runtime=dev --target_dir=substrate --pallet=pallet_contracts. Check out https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/pipelines?page=1&scope=all&username=group_605_bot to know what else is being executed currently.

Comment bot cancel 2-3092f491-d224-42ba-aeff-4c3e7e8cbbc2 to cancel this command or bot cancel to cancel all commands in this pull request.

Copy link
Member

@athei athei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good. Only thing is the regression in weight for the host functions because we (wrongly) been including the compilation time into their benchmarks.

So I think we need two PRs merged to master before progressing with this one:

  1. Removal of the double compilation in the other comment.
  2. Rework of the host function benchmarks to only measure execution time.

substrate/frame/contracts/src/wasm/prepare.rs Outdated Show resolved Hide resolved
substrate/frame/contracts/src/wasm/prepare.rs Outdated Show resolved Hide resolved
…=dev --target_dir=substrate --pallet=pallet_contracts
@command-bot

This comment was marked as outdated.

@Robbepop
Copy link
Contributor Author

subweight compare commits --path-pattern "substrate/frame/contracts/src/weights.rs" --method asymptotic master d2b8896df4b77d3cd06e32ba72fad5c217620c15 --change changed --threshold 10

image

As mandated by code review. However, this shouldn't change any benchmark results since it is just setup code.
@paritytech-cicd-pr
Copy link

The CI pipeline was cancelled due to failure one of the required jobs.
Job name: test-linux-stable 3/3
Logs: https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/4997443

@Robbepop
Copy link
Contributor Author

The CI pipeline was cancelled due to failure one of the required jobs. Job name: test-linux-stable 3/3 Logs: https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/4997443

The test probably failed because of the removal of FuelConsumptionMode in wasmi which now makes it more likely that required fuel and consumed fuel are actually equal. The fix is to simply correct the assert condition to allow for both values to be equal.

Copy link
Member

@athei athei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to find out why those changes broke the test.

@@ -3078,7 +3078,7 @@ fn gas_estimation_call_runtime() {
// contract encodes the result of the dispatch runtime
let outcome = u32::decode(&mut result.result.unwrap().data.as_ref()).unwrap();
assert_eq!(outcome, 0);
assert!(result.gas_required.ref_time() > result.gas_consumed.ref_time());
assert!(result.gas_required.ref_time() >= result.gas_consumed.ref_time());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assertion is not incorrect. The comment a bit further up explains this. It is the point of the test to make sure that the pre-charging works. In this case when calling into the runtime. We use a function that has a much bigger max weight than actual weight to exercise that.

Why this broke from your changes is indeed strange.

@Robbepop Robbepop closed this by deleting the head repository Feb 12, 2024
@pgherveou
Copy link
Contributor

We need to change the benchmarks to compile the code in its setup so that when executing the contract it is only execution.

Is that an issue ? We compute the schedule by doing bench(1) - bench(0) between these two iterations the weight of the compilation phase should be roughly the same (slightly bigger for bench(1)) but that should not have a big impact on the Schedule.

We should probably look at the Schedule result instead

bgallois pushed a commit to duniter/duniter-polkadot-sdk that referenced this pull request Mar 25, 2024
In paritytech#2941 we found out
that the new Wasmi (register) is very effective at optimizing away
certain benchmark bytecode constructs in a way that created an unfair
advantage over Wasmi (stack) which yielded our former benchmarks to be
ineffective at properly measuring the performance impact.

This PR adjusts both affected benchmarks to fix the stated problems.
Affected are
- `instr_i64const` -> `instr_i64add`: Renamed since it now measures the
performance impact of the Wasm `i64.add` instruction with locals as
inputs and outputs. This makes it impossible for Wasmi (register) to
aggressively optimize away the entire function body (as it previously
did) but still provides a way for Wasmi (register) to shine with its
register based execution model.
- `call_with_code_per_byte`: Now uses `local.get` instead of `i32.const`
for the `if` condition which prevents Wasmi (register) to aggressively
optimizing away whole parts of the `if` creating an unfair advantage.

cc @athei

---------

Co-authored-by: command-bot <>
Co-authored-by: Alexander Theißen <alex.theissen@me.com>
Co-authored-by: Ignacio Palacios <ignacio.palacios.santos@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants