-
Notifications
You must be signed in to change notification settings - Fork 20.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
core/tracing: state journal wrapper #30441
base: master
Are you sure you want to change the base?
Conversation
core/tracing/hooks.go
Outdated
NonceReadHook = func(addr common.Address, nonce uint64) | ||
|
||
// CodeReadHook is called when EVM reads the code of an account. | ||
CodeReadHook = func(addr common.Address, code []byte) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Open question: should we add codeHash here to be consistent with OnCodeChange
?
Ah seems like the journal has a crasher:
|
core/tracing/CHANGELOG.md
Outdated
|
||
### New methods | ||
|
||
- `OnReorg(reverted []*types.Block)`: This hook is called when a reorg is detected. The `reverted` slice contains the blocks that are no longer part of the canonical chain. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here types block is very very heavy. You should at most pass headers and allow chain access to pull the blocks on demand (chain access in someconstructor, ha)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On second thought what is the issue? it is a slice so passed by reference and the memory can be freed as soon as OnReorg
processing is done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ugh, this is annoying. So reorg in the blockchain at some point in the past used to collect blocks. Turned out that sometimes it became insanely heavy and we've switched so it operates on headers. I guess later someone refactored it back to operate on blocks again. This is an issue when you do setHead or any similar operation; of even if finality fails for a while and you have blocks reorging back and forth. It's very very bad to pull all the block in from disk IMO.
CC @holiman @rjl493456442 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. I don't particularly recall switching from headers to blocks....
core/tracing/hooks.go
Outdated
@@ -194,6 +221,30 @@ type Hooks struct { | |||
OnCodeChange CodeChangeHook | |||
OnStorageChange StorageChangeHook | |||
OnLog LogHook | |||
// State reads | |||
OnBalanceRead BalanceReadHook | |||
OnNonceRead NonceReadHook |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question from triage: how exactly is OnNonceRead used?
I have pulled in the latest master changes and moved the state read hooks over to the hooked statedb. One thing to know about the state read hooks: They will not give you the full prestate by themselves. You will need also the previous values emitted as part of state change hooks. This is because e.g. statedb.AddBalance does not do statedb.GetBalance internally. |
Copying from the chat with @nebojsa94:
|
core/tracing/hooks.go
Outdated
} | ||
|
||
// Copy creates a new Hooks instance with all implemented hooks copied from the original. | ||
func (h *Hooks) Copy() *Hooks { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this method needed? It's not obvious to me what the side-effects are. Typically, the hooks might be closures, and the closures are still referenced as if they were not copied.
func TestHooks(t *testing.T) {
counter := 0
a := &Hooks{
OnClose: func() {
counter++
},
}
a.OnClose()
t.Logf("counter is %d", counter)
a.Copy().OnClose()
t.Logf("counter is %d", counter)
}
outputs:
hooks_test.go:13: counter is 1
hooks_test.go:15: counter is 2
I'm curious why you'd ever need to use this Copy
method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it because you want to copy all, but not have to specify all manually?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typically, the hooks might be closures, and the closures are still referenced as if they were not copied.
Right you are correct I had not foreseen that. After thinking a bit, it feels like for my use-case that is fine. Essentially the clone will replace some of the methods to add some pre-processing logic. The rest are supposed to execute as in the original tracer.
I have un-exported the Copy method to avoid people to shoot themselves in the foot and added a comment to clarify this point.
Edit: right what I want is to add the journal in front of the tracer and process some of the hooks first before proxying back to tracer. And this without having to iterate the list of all hooks which I find very error-prone. I have already had to fix bugs because of missing some hook in there.
@@ -172,6 +173,9 @@ type ( | |||
|
|||
// LogHook is called when a log is emitted. | |||
LogHook = func(log *types.Log) | |||
|
|||
// BlockHashReadHook is called when EVM reads the blockhash of a block. | |||
BlockHashReadHook = func(blockNumber uint64, hash common.Hash) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use-case is to have access to the headers of hashes that are accessed by the EVM. Alternative would be if we added a GetHeaderByHash method somewhere. But getting the hash from OnOpcode is also tricky since the hash will be put on the stack after OnOpcode is invoked.
type journal struct { | ||
entries []entry | ||
hooks *Hooks | ||
lastCreator *common.Address // Account that initiated the last contract creation | ||
|
||
validRevisions []revision | ||
nextRevisionId int | ||
revIds []int | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the linearJournal
in my PR #30660 is IMO a better base to start from. It does away with validrevisions and revIds, instead it just as a list of indexes, revisions
, which point to an entry.
type linearJournal struct {
entries []journalEntry // Current changes tracked by the linearJournal
dirties map[common.Address]int // Dirty accounts and the number of changes
revisions []int // sequence of indexes to points in time designating snapshots
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have looked at #30660 and agree it is a better way to do journaling for tracers. The key point for me there is that there will be only 1 revert hook emitted as opposed to one for each change to a state element.
Given that #30660 seems to be still in flux I like to wait on it to be merged and implement it for tracers in a future PR as an improvement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These points were discussed at standup:
- It looks like we will need to change the model a bit with the set-based journal as it operates on accounts, requiring also a new hook like
OnAccountReverted
and the inconsistencies there around emitting state changes on the field level and the reverts being on the account level. - The point was raised by @holiman that we are exposing a behaviour to tracers that will be hard to revert. The behaviour in question is the reverse of every state change (i.e. if Balance: A->B->C, we emit Balance: C->B->A on revert instead of just Balance: C->A).
On the second point I'd like to add my perspective: This journal is simply a wrapper around the tracers. We are not changing tracing interface semantics at all. Users can copy this file and run it themselves right now. And we are exposing every state modification, and I believe the reverse of it is the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like we will need to change the model a bit with the set-based journal
I don't get it. The two PRs, the two journals are unrelated. You have opted to copy-paste the legacy linear journal-implementation. I think the new linear journal-implementation is better/simpler.
You could also have chosen to copy-paste the set-based journal-implementation. I don't care which you choose, really, but I don't see any point in picking one now and switching later. If you want another, pick that one from the get-go ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Users can copy this file and run it themselves right now.
They can't perform WrapWithJournal
from "user-space," can they? Isn't that what makes this a big "blessed" ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They can't perform WrapWithJournal from "user-space," can they? Isn't that what makes this a big "blessed" ?
Right exact copy wouldn't work. They'd have to implement WrapWithJournal locally, but it's totally possible.
I don't care which you choose, really, but I don't see any point in picking one now and switching later.
I think the current journal is good, it is consistent with the existing interface, and it has been running in geth for quite a while.
validRevisions []revision | ||
nextRevisionId int | ||
revIds []int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see why you need to track this. Why not just maintain a list of entries, and you hand out the id
which is the current length of the entries?
|
||
func (j *journal) OnNonceChange(addr common.Address, prev, new uint64) { | ||
// When a contract is created, the nonce of the creator is incremented. | ||
// This change is not reverted when the creation fails. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, what? Doesn't that depend on ... things.. ? Or is it always the case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For evm-creates (as opposed to create-tx where the flow is different), the new scope begins here: https://github.com/ethereum/go-ethereum/blob/master/core/vm/evm.go#L425 , and the nonce-bump happens a few lines later: https://github.com/ethereum/go-ethereum/blob/master/core/vm/evm.go#L442
However, the actual snapshot is taken even further below: https://github.com/ethereum/go-ethereum/blob/master/core/vm/evm.go#L479 .
So that's the mismatch you see. What the tracing percieves is
SCOPE_START{
NONCE ++
OTHER_STUFF
}
But the reality is
NONCE++
SCOPE_START{
OTHER_STUFF
}
So when the scope reverts, you see this weird inconsistency that somehow nonce is a special snowflake.
The proper fix would be to move the captureBegin
-invocation down, so it happens close to where we take the snapshot. The nonce-bump belongs to the parent scope.
type journal struct { | ||
entries []entry | ||
hooks *Hooks | ||
lastCreator *common.Address // Account that initiated the last contract creation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a hack. I don't see how this can be accurately updated going forward and backward along the entries. I mean, an inner scope will overwrite the outer lastCreator
, and when the inner scope is reverted, the lastCreator
will not be set back correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or if we're inside a creation, and inside the constructor we call ripemd to calculate a signature: we lost lastCreator.
Implements #30356