Skip to content
This repository has been archived by the owner on Apr 4, 2024. It is now read-only.

Problem: iterator on deeply nested cache contexts is extremely slow #617

Merged
merged 6 commits into from
Oct 5, 2021
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
21 changes: 21 additions & 0 deletions x/evm/keeper/context_stack.go
Original file line number Diff line number Diff line change
Expand Up @@ -60,6 +60,27 @@ func (cs *ContextStack) Commit() {
cs.cachedContexts = []cachedContext{}
}

// CommitToRevision flatter multiple layers of cache contexts into one layer,
// to improve efficiency of db operations.
func (cs *ContextStack) CommitToRevision(target int) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add a unit test for this function

if target < 0 || target >= len(cs.cachedContexts) {
panic(fmt.Errorf("snapshot index %d out of bound [%d..%d)", target, 0, len(cs.cachedContexts)))
}

targetCtx := cs.cachedContexts[target].ctx
// commit in order from top to bottom
for i := len(cs.cachedContexts) - 1; i > target; i-- {
// keep all the cosmos events
targetCtx.EventManager().EmitEvents(cs.cachedContexts[i].ctx.EventManager().Events())
if cs.cachedContexts[i].commit == nil {
panic(fmt.Sprintf("commit function at index %d should not be nil", i))
} else {
cs.cachedContexts[i].commit()
}
}
cs.cachedContexts = cs.cachedContexts[0 : target+1]
}

// Snapshot pushes a new cached context to the stack,
// and returns the index of it.
func (cs *ContextStack) Snapshot() int {
Expand Down
3 changes: 3 additions & 0 deletions x/evm/keeper/state_transition.go
Original file line number Diff line number Diff line change
Expand Up @@ -191,6 +191,9 @@ func (k *Keeper) ApplyTransaction(tx *ethtypes.Transaction) (*types.MsgEthereumT
return nil, stacktrace.Propagate(err, "failed to apply ethereum core message")
}

// flatten the cache contexts to improve efficiency of following db operations
k.ctxStack.CommitToRevision(revision)
Comment on lines +201 to +202
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how does this relate to k.CommitCachedContexts()? why is this called here instead of inside the !res.Failed condition? I need more context to understand the rationale of this change

Copy link
Contributor

@thomas-nguy thomas-nguy Oct 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For more context, we discovered a bug in our testnet which make the network to halt.

Some uniswap call triggers a lot of internal call. And trying to replay the tx will end up in an infinite loop there
https://github.com/tharsis/ethermint/blob/main/x/evm/keeper/keeper.go#L181
https://github.com/cosmos/cosmos-sdk/blob/master/store/cachekv/mergeiterator.go#L206

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change here doesn't provide context on the following questions:

  • How does flattening improve efficiency? Do you have benchmark results?
  • Is it just to commit changes and avoid the infinite loop?
  • What's the threshold (i.e max number) of internal calls that the EVM supports without crashing?
  • Can we add a test to check the max number of internal calls supported?
  • Do we still need to call the CommitCachedContexts() if the changes are being committed regardless if it fails or not? Can this logic commit an invalid result or affect the revert logic as well?

Since the underlying issue is from the SDK, you should open an issue there to get it fixed in the long term

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add relevant comments and tests based on the questions above

Copy link
Contributor

@leejw51crypto leejw51crypto Oct 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Screenshot from 2021-10-03 19-34-25

Keeper, ApplyTransaction,
logs := k.GetTxLogsTransient(txHash)

iter := store.Iterator(txHash.Bytes(), end) <- exact point
defer iter.Close()

blocking point
it hangs forever here,

// Commit commits all the cached contexts from top to bottom in order and clears the stack by setting an empty slice of cache contexts.
func (cs *ContextStack) Commit() {
	// commit in order from top to bottom
	for i := len(cs.cachedContexts) - 1; i >= 0; i-- {
		// keep all the cosmos events
		cs.initialCtx.EventManager().EmitEvents(cs.cachedContexts[i].ctx.EventManager().Events())
		if cs.cachedContexts[i].commit == nil {
			panic(fmt.Sprintf("commit function at index %d should not be nil", i))
		} else {
			cs.cachedContexts[i].commit()
		}
	}
	cs.cachedContexts = []cachedContext{}
}

Commit is recursive function

Copy link
Contributor

@leejw51crypto leejw51crypto Oct 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So in this pr, before calling Commit, calls cached contexts Commit first linearly.


k.IncreaseTxIndexTransient()

res.Hash = txHash.Hex()
Expand Down