/cmd followups #5533
Conversation
/cmd bench --runtime asset-hub-westend --pallet pallet_balances --continue-on-fail --clean

Command "bench --runtime asset-hub-westend --pallet pallet_balances --continue-on-fail --clean" has started 🚀 See logs here

Command "bench --runtime asset-hub-westend --pallet pallet_balances --continue-on-fail --clean" has finished ✅ See logs here

Subweight results:
New to this code, so please take this approval with that in mind. Maybe worth having @ggwpez's eyes on this.
How did it happen that this PR changed the benchmarked weights? From what I see in the output generated by the benchmark, I'm not even sure it was run on the reference hardware.
1. As for reference hardware or not: the weights were generated using a new GitHub runner (vs. the GitLab runner).
2. To test old vs. new, I think we can run the same pallet somewhere via the old and the new command bot.
Let's see here: #6095
@s0me0ne-unkn0wn you're right, it looks like there's a consistent and significant difference, thanks for flagging!

frame-omni-bencher + GitHub runner (parity-weights) -> cbb16b4

@alvicsam could you please check and share more details on what could lead to that consistent difference between the two runners?
The hardware is the same:
We could also add run |
I tested GH runner + frame-omni-bencher (logs) vs GH runner + old CLI (logs). I added and ran:

- GitHub self-hosted runner VM -> Logs
- GitLab runner VM -> Pipeline Logs

So we chatted about that with @alvicsam, and we think it might be worth just recalculating the new baselines with the new VMs. WDYT @ggwpez @s0me0ne-unkn0wn?
While I don't have better ideas myself, I'm still not sure it's the best solution.

The first reason is the sTPS benchmark. Having 40% more reference time per transaction means we pack 40% fewer transactions into a block and thus get 40% less TPS. Technically it doesn't make much difference, but from the marketing perspective, a 40% drop in the TPS value of the whole network out of the blue is something that can make a lot of people unhappy.

The second reason is our attitude towards the end users, namely validator operators. We guarantee that someone with the given hardware spec can run a validator, and that's why we measure reference times on reference hardware. Increasing reference times by 40% means we either were undershooting before, which is terrible, or are overshooting now, which is not great either. Our end-user-facing metrics should be as consistent as possible.

Last but not least, I'm just curious what the hell is going on there ;) I understand that different VMs with the same amount of resources will behave differently performance-wise, but a 40% difference is well over the threshold of what can be explained solely by VM architecture differences, given that they both show very similar machine benchmarks.

That said, I feel like more justification is needed to re-run the benchmarks and increase all the weights. CC @eskimor
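For concreteness, here is a minimal sketch of that proportionality. The budget and per-transfer weights below are made-up illustrative numbers, not values from any real runtime:

```rust
/// How many transfers fit into the portion of the block's ref_time budget
/// available to normal extrinsics. Purely illustrative arithmetic.
fn transfers_per_block(normal_ref_time_budget: u64, transfer_ref_time: u64) -> u64 {
    normal_ref_time_budget / transfer_ref_time
}

fn main() {
    let budget: u64 = 1_500_000_000_000; // hypothetical per-block ref_time budget
    let baseline = transfers_per_block(budget, 50_000_000); // old measured weight
    let slower = transfers_per_block(budget, 70_000_000);   // same weight, +40%
    // Throughput scales inversely with per-transaction ref_time:
    println!("{baseline} -> {slower} transfers per block");
}
```

With these made-up numbers, 30,000 transfers fit per block at the old weight and about 21,428 at the 40%-higher one, and block TPS falls by the same factor.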
What runtime are you using for sTPS testing @s0me0ne-unkn0wn? The Polkadot and Kusama runtimes have their own weights in the runtimes repo. Any weights we generate here are for testnets only; nobody should use them in prod, but generate their own anyway.

But I think if this Docker-in-Docker-in-Kubernetes setup is 40% slower than an actual server, then we should not use it. Nobody would reasonably run a validator like this either, so it's not really "reference hardware" anymore. It seems the machine benchmarks don't really show this.
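For context, this is roughly the shape of that choice in a runtime's configuration. A sketch only: `Runtime` is a placeholder, the other associated types are elided, and the Option B path merely illustrates how per-runtime weight modules are usually referenced:

```rust
// Option A: the stock weights shipped with the pallet itself, which is what
// testnet runtimes often rely on:
impl pallet_balances::Config for Runtime {
    // ... other associated types elided ...
    type WeightInfo = pallet_balances::weights::SubstrateWeight<Runtime>;
}

// Option B: weights benchmarked specifically for this runtime, e.g. the ones
// maintained in the runtimes repo:
//
//     type WeightInfo = crate::weights::pallet_balances::WeightInfo<Runtime>;
```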
We've always used the Rococo runtime for that and consciously used the Substrate weights without re-weighting. Substrate weights are the reference. I mean, if your runtime has …

Besides that, while relay chain runtimes do indeed have their own weights, parachains often do not. And even if they did, having a hundred different weights for different parachains wouldn't help much in measuring parachain throughput. So, for parachain throughput measurements, we also use Substrate weights as a reference.
Exactly! That's the point. A reference should be, well, a reference, that is, something we don't change easily.
If you look only at the balances pallet, then I guess yes; generally, though, no. The weight of each pallet depends on the runtime config: the callbacks that are configured and the general pallet config can cause hugely different weights in production cases. The weights in the runtimes repo are more reliable if you want to do some testing.

Either way, for getting the maximal number of TPS, the weights are not really important. You should just increase the block weight limit until the relay block time increases or the PVF times out.
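A rough sketch of that knob, assuming the usual frame_support/frame_system APIs in polkadot-sdk; the 4-second budget and 75% ratio below are arbitrary test values, not recommendations:

```rust
use frame_support::{
    parameter_types,
    weights::{constants::WEIGHT_REF_TIME_PER_SECOND, Weight},
};
use frame_system::limits::BlockWeights;
use sp_runtime::Perbill;

parameter_types! {
    // Throughput-test sketch: widen the block's ref_time budget (here 4s of
    // compute instead of the typical 2s) and keep raising it until the relay
    // block time degrades or the PVF times out, as described above.
    pub TestBlockWeights: BlockWeights = BlockWeights::with_sensible_defaults(
        Weight::from_parts(4 * WEIGHT_REF_TIME_PER_SECOND, u64::MAX),
        Perbill::from_percent(75), // share reserved for normal extrinsics
    );
}
```

The resulting `TestBlockWeights` would then be wired into `frame_system::Config` in place of the runtime's normal block weight limits for the duration of the test.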
I understand that, but we don't want maximum numbers, we want fair numbers. Like, if someone asks, "If I start a parachain on Polkadot, what TPS will I get?" and we answer, "You'll get 5550 balance transfers per block, given that you're not using weird data types in your setup", that's fair. Yes, we could maximize even that (e.g. parameterize the balance transfer benchmark and account for the shortest execution path, where we transfer from a pre-funded to a pre-existing account and thus get a lot more TPS in the sTPS test), but that wouldn't be that fair anymore. So I believe we're just trying to strike the right balance between being precise and being fair.
Closes: #5545
git pull -> https://github.com/paritytech/polkadot-sdk/actions/runs/10644887539/job/29510118915