unmergeable: benchmarking #21207
Conversation
Nice analysis! Do you want to create a PR against EIP-2046 documenting this in an appendix and including the new proposed figure? It would also be important to reproduce these results in at least one other client.
That seems to mean that the precompile CALL cost itself is almost zero, and everything that takes time is popping items from the stack, right?
I've briefly looked at the code of STATICCALL here and it does quite a few things indeed, but it should be ok to assume that an in-memory DB is fast enough to simulate the happy path and quickly jump to the precompile. Can you dump the code w/o STATICCALL to hex? I'll try to poke the Rust implementation. I wonder if evmone will be much faster, but this is beyond my expertise. Also for the modified identity precompile.
Oh, it just occurred to me: your code is actually 6 pushes + staticcall (or 6 pushes + 6 pops), so the staticcall cost that you have proposed should be divided by two, no?
I don't understand your reasoning there... care to elaborate?
Also, I've found what appears to be an error in Go. I'll report it upstream and see what they say (about those allocs).
No, I think you misunderstand what the analysis shows. I can elaborate further some other time; I've been digging into golang/go#39514 the entire evening. However, if we can somehow work around whatever causes that to happen, I'm pretty sure we'll see the number go down from
Thanks!
A bit too early, IMO.
Yes, that would be very interesting! That's why I cc:d so many on ACD. It hit me now that maybe I missed cc:ing the parity folks -- cc @vorot93 @sorpaas
Oh, what I meant is that each simple piece of code crunches through the 100M gas over N loop iterations, but those iterations include not only the staticcall, but also extra code around it. The conclusion seems to be that staticcall is equal to 6 POPs in Geth, which would be 12 gas.
Ah, no, ignore the N iterations. That's just the bench framework's way to wind up with a stable timing. We simply want both loops to burn through 100M gas in roughly the same time, regardless of which ops are executed. The calling loop will execute fewer iterations than the popping loop, but that's fine.
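The fixed-budget design described above can be sketched in Go. The per-iteration costs below are made-up placeholders (loopOverhead is not geth's actual loop cost); the point is only that each program runs until the shared budget is exhausted, so the pricier loop runs fewer iterations:

```go
package main

import "fmt"

// iterations returns how many loop iterations fit in a fixed gas budget.
// The benchmark doesn't fix the iteration count; each program simply
// runs until the shared budget (e.g. 100M gas) is exhausted.
func iterations(gasBudget, gasPerIteration uint64) uint64 {
	return gasBudget / gasPerIteration
}

func main() {
	const budget = 100_000_000 // 100M gas, as in the benchmarks here

	// Hypothetical per-iteration costs: shared loop machinery (pushes,
	// a jump, a jumpdest) plus either a STATICCALL at the proposed
	// 40 gas, or 6 POPs at 2 gas each.
	const loopOverhead = 30
	callIters := iterations(budget, loopOverhead+40)
	popIters := iterations(budget, loopOverhead+6*2)

	// The calling loop executes fewer iterations than the popping loop.
	fmt.Println(callIters, popIters)
}
```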
If these benchmark results show that a function with a STATICCALL to a precompile that does nothing (modified identity) and a function that instead does just 6 POPs take roughly the same time to execute, then STATICCALL = 6 POPs = 12 gas?
Not quite. The benchmark result represents:
Now, in the benchmark you quoted, the cost for
What's important is not any absolute value of gas per second on particular hardware. That's totally irrelevant. The only thing that matters is that operations are balanced. If one operation is unbalanced, then a block filled with that particular operation would cause slow blocks, which could be a DoS vulnerability. That's exactly what happened during the Shanghai attacks.
name                                  old time/op    new time/op    delta
SimpleLoop/identity-precompile-10M-6   117ms ±75%      43ms ± 1%  -63.09%  (p=0.008 n=5+5)
SimpleLoop/loop-10M-6                 79.6ms ± 4%    70.5ms ± 1%  -11.42%  (p=0.008 n=5+5)

name                                  old alloc/op   new alloc/op   delta
SimpleLoop/identity-precompile-10M-6  24.4MB ± 0%     4.9MB ± 0%  -79.94%  (p=0.008 n=5+5)
SimpleLoop/loop-10M-6                 13.2kB ± 0%    13.2kB ± 0%     ~     (p=0.357 n=5+5)

name                                  old allocs/op  new allocs/op  delta
SimpleLoop/identity-precompile-10M-6    382k ± 0%      153k ± 0%  -59.99%  (p=0.000 n=5+4)
SimpleLoop/loop-10M-6                   40.0 ± 0%      40.0 ± 0%     ~     (all equal)
name                                  old time/op    new time/op    delta
SimpleLoop/identity-precompile-10M-6  43.3ms ± 1%    42.4ms ± 7%     ~     (p=0.151 n=5+5)
SimpleLoop/loop-10M-6                 70.5ms ± 1%    76.7ms ± 1%   +8.67%  (p=0.008 n=5+5)

name                                  old alloc/op   new alloc/op   delta
SimpleLoop/identity-precompile-10M-6  4.90MB ± 0%    2.46MB ± 0%  -49.83%  (p=0.008 n=5+5)
SimpleLoop/loop-10M-6                 13.2kB ± 0%    13.2kB ± 1%     ~     (p=0.571 n=5+5)

name                                  old allocs/op  new allocs/op  delta
SimpleLoop/identity-precompile-10M-6    153k ± 0%       76k ± 0%  -49.98%  (p=0.029 n=4+4)
SimpleLoop/loop-10M-6                   40.0 ± 0%      40.0 ± 0%     ~     (all equal)
name                                  old time/op    new time/op    delta
SimpleLoop/identity-precompile-10M-6  42.4ms ± 7%    37.5ms ± 6%  -11.68%  (p=0.008 n=5+5)
SimpleLoop/loop-10M-6                 76.7ms ± 1%    69.1ms ± 1%   -9.82%  (p=0.008 n=5+5)

name                                  old alloc/op   new alloc/op   delta
SimpleLoop/identity-precompile-10M-6  2.46MB ± 0%    0.02MB ± 0%  -99.35%  (p=0.008 n=5+5)
SimpleLoop/loop-10M-6                 13.2kB ± 1%    13.2kB ± 0%     ~     (p=0.143 n=5+5)

name                                  old allocs/op  new allocs/op  delta
SimpleLoop/identity-precompile-10M-6  76.4k ± 0%      0.1k ± 0%      ~     (p=0.079 n=4+5)
SimpleLoop/loop-10M-6                   40.0 ± 0%      40.0 ± 0%     ~     (all equal)
I have pushed some changes now, which work their way around the alloc problems. With those in place, it appears that
Note, though, that the pushed changes are not complete. They need to be applied to the other call variants too, so at this point they're not quite ready to merge to master.
to       = AccountRef(addr)
snapshot = evm.StateDB.Snapshot()
)
snapshot := evm.StateDB.Snapshot()
As the snapshot is not used in RunPrecompiledContract below, does it affect the performance?
Yes, a little bit, but it's pretty cheap.
Didn't dig deeper, but for similar tests in OpenEthereum
Now I need to understand how to interpret them, but roughly it looks like it may be below 40 gas for OpenEthereum. Important: the identity precompile is not modified!
Cool! Could you link to your code? If you haven't modified identity, I think you're measuring a cost of 56 gas instead of 40. So give 26 a spin too.
I mean 24
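The correction above amounts to simple arithmetic: the unmodified identity precompile charges its own fee on top of the STATICCALL cost itself. A sketch, taking the thread's 15 + 1 = 16 gas figure at face value:

```go
package main

import "fmt"

// effectiveCallCost is what a benchmark against the *unmodified*
// identity precompile actually measures: the STATICCALL cost under
// test plus the precompile's own fee (16 gas per this thread).
func effectiveCallCost(staticcallCost, precompileFee uint64) uint64 {
	return staticcallCost + precompileFee
}

func main() {
	// Proposed base of 40 measures as 56; to measure an effective 40,
	// the base cost under test has to be lowered to 24.
	fmt.Println(effectiveCallCost(40, 16))
	fmt.Println(effectiveCallCost(24, 16))
}
```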
Also, you might want to give it a warm-up run before benchmarking, to load the state cache and instantiate everything. My benchmark is the pure run loop, without the extra time for initializing state etc.
The code is here. I didn't test other values because you can already get 40 gas per STATICCALL optimistically if you use the "berlin" schedule. The way the benchmarking is written, the VM is created before the measurement, and the benchmarking framework takes care of averaging and filtering outliers.
@holiman I think that @shamatar was right here. I think the better approach would be to set a counter to X, loop that many times, and then compare the timings. This would give you the actual gas cost of the static call: you would set the gas cost of the static call to the number that makes these two benchmarks equal, and then subtract 12 (the cost of 6 POPs) from that value.
by the way - the number that I am getting is something like 350 gas for static call now, but I need to dig a bit deeper
looking at the allocs on our side to bring it down - it seems that our code invokes the empty-account zero-transfer touch logic on precompile calls, which slows it down (and a few other things)
I actually did every benchmark with both 100M and 10M, to verify that the results are correlated. I can run with 1Bn gas too; I don't expect any difference. I don't understand what's wrong with my reasoning, could you elaborate on what you think is off?
To avoid any doubt about what the benchmark measures, can we just place code that runs through the loop 1M times and measure how long it takes, irrespective of the gas consumed? Then we can add POPs to make the running time equal and price STATICCALL that way.
The benchmark as it's written right now will stop either after N iterations with STATICCALL or M iterations with POPs. If the prices are proper then N * cost = M * cost_1 = time * gas/second, so the invariants should be preserved and you can calculate the costs if you can read N or M somehow.
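The invariant above can be rearranged to solve for the STATICCALL cost; a sketch with made-up iteration counts and overhead (not measured values):

```go
package main

import "fmt"

// If both loops burn the same gas budget in the same wall time, then
// N*(overhead+staticcallCost) = M*(overhead+popCost). Solving for the
// STATICCALL cost, given observed iteration counts N and M:
func impliedStaticcallCost(n, m, overhead, popCost float64) float64 {
	return (m*(overhead+popCost))/n - overhead
}

func main() {
	// Hypothetical numbers: 30 gas of loop machinery per iteration,
	// 6 POPs costing 12 gas, and the pop loop running 1.4x as many
	// iterations as the call loop.
	cost := impliedStaticcallCost(1_000_000, 1_400_000, 30, 12)
	fmt.Printf("implied STATICCALL cost: %.1f gas\n", cost)
}
```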
I don't think that's a good approach - then you'll price staticcall using POP (and PUSH or DUP) as the gold standard, and only those. My approach smears it out a bit, so we compare against the set of all instructions in the loop: some PUSHes, DUPs, a JUMP + JUMPDEST.
A dummy cycle can be made out of many opcodes, but if the most popular opcodes in the EVM (as it seems to me) are not well priced, then that's a separate, large problem.
Here's with
The inversion between 10B and 100M is interesting
so in Nethermind we are at around 150 gas with the current code; we can go to 41 gas with some significant changes (choosing a different path for precompile calls). @holiman are you reporting the account touch event in your benchmark?
also, I think we picked a test where no memory operations are executed, because the loop with STATICCALL passes a 0-length memory access and a zero-length result write
How does the EVM handle memory extensions, and at what point?
memory extension is not that important (it is priced separately)
Every single opcode which handles memory needs to check if a memory offset + length exceeds the current memory size; if it does, the expansion needs to be charged for.
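For reference, a sketch of that check using the yellow-paper memory cost formula, Cmem(a) = 3a + a²/512 (a in 32-byte words); expansion charges only the difference between the new and old size:

```go
package main

import "fmt"

// memoryGas is the yellow-paper total memory cost for a memory of the
// given size in 32-byte words: 3*words + words^2/512.
func memoryGas(words uint64) uint64 {
	return 3*words + words*words/512
}

// expansionGas charges the difference when offset+length would grow
// memory past its current size, and nothing otherwise. A zero-length
// access, as in the STATICCALL benchmark here, never expands memory.
func expansionGas(oldWords, newWords uint64) uint64 {
	if newWords <= oldWords {
		return 0 // access fits in current memory: nothing to charge
	}
	return memoryGas(newWords) - memoryGas(oldWords)
}

func main() {
	fmt.Println(expansionGas(0, 0)) // zero-length access costs nothing
	fmt.Println(expansionGas(0, 1)) // first 32 bytes cost 3 gas
}
```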
yeah, I think the inversion of results between 10B and 100M may be due to the problem that we are trying to explain
In the benchmarks I've made in the repo there is one extra memory copy at the end.
one can fill with PUSH-POP pairs - 6 of them is 30 gas on top of the 12 from the 6 POPs
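That filler arithmetic, spelled out (PUSH1 = 3 gas, POP = 2 gas):

```go
package main

import "fmt"

// fillerGas computes the cost of n PUSH-POP pairs used as loop filler,
// as suggested above: each pair is one PUSH1 (3 gas) plus one POP (2 gas).
func fillerGas(pairs uint64) uint64 {
	const pushGas, popGas = 3, 2
	return pairs * (pushGas + popGas)
}

func main() {
	fmt.Println(fillerGas(6))       // 6 pairs: 30 gas of filler
	fmt.Println(fillerGas(6) + 6*2) // plus the 12 gas from the 6 POPs
}
```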
Well, we first have to decide whether the current pricing of POPs and PUSHes is proper (i.e. whether it reflects the time spent executing them). If it is, then we should definitely set a loop size instead of a gas limit and look at the execution time. If not (POP and PUSH prices are off and no one cares), then Martin's approach to benchmarking is the only way, but then we have a bigger problem anyway :)
not really - even with mispriced PUSH and POP you get at least the same number of loops when you add them to balance the chosen cost of STATICCALL
If their price does not reflect the time, we cannot conclude that "equal running time for N loop iterations => equal pricing".
This is not actually a PR, but an analysis of the cost of calling precompiles, prompted by EIP-2046, which lowers the cost of calling precompiles.

It's based on benchmarking the following two loops, which both loop until OOG:

prog1: code which calls the identity precompile with zero arguments.
prog2: code which doesn't do the call, but instead just pops the arguments off the stack.

Current

With the current cost of 700, prog1 is obviously a lot faster, burning through 100M gas in 120ms as opposed to 730ms.

With the proposed 40 gas, it's actually a factor 2 too slow. This means that we lowered the gas too much!

Testing with 100 gives a pretty close match.

Note, though, that this includes a cost of 15 + 1 for the call to identity -- the cost of the input. So, let's try with that cost set to zero, and the precompile doing nothing. This makes the match even better.

So for go-ethereum, the most 'correct' value, with the current implementation, seems to be a cost of 100 for calling a precompile.