Skip to content

Conversation

mattac21
Copy link
Contributor

@mattac21 mattac21 commented Sep 25, 2025

Description

When load testing the EVM mempool with BlockSTM, I encountered the following panic (on the main branch):

GetBlock called for invalid block number - this indicates a reorg attempt block_number=1165 module=Blockchain
panic: GetBlock should never be called on a Cosmos chain due to instant finality - this indicates a reorg is being attempted
goroutine 459284 [running]:
github.com/cosmos/evm/mempool.(*Blockchain).GetBlock(0xc004108488, {0x1f, 0xa6, 0xe4, 0x89, 0x81, 0x52, 0x12, 0x8e, 0x83, ...}, ...)
	github.com/cosmos/evm@v0.2.0/mempool/blockchain.go:162 +0x365
github.com/cosmos/evm/mempool/txpool/legacypool.(*LegacyPool).reset(0xc001e1a1e0, 0xc0017ff408, 0xc0616b8288)
	github.com/cosmos/evm@v0.2.0/mempool/txpool/legacypool/reset_production.go:28 +0x225
github.com/cosmos/evm/mempool/txpool/legacypool.(*LegacyPool).runReorg(0xc001e1a1e0, 0xc002b0a1c0, 0xc03e190450, 0x0, 0xc02f9961e0)
	github.com/cosmos/evm@v0.2.0/mempool/txpool/legacypool/legacypool.go:1285 +0x185
created by github.com/cosmos/evm/mempool/txpool/legacypool.(*LegacyPool).scheduleReorgLoop in goroutine 188
	github.com/cosmos/evm@v0.2.0/mempool/txpool/legacypool/legacypool.go:1214 +0x18a

Upon further investigation, legacypool.Reset will panic if it is called with newHead=oldHead+2. Based on comments, reset thinks that this is a reorg, making the assumption that reset be never be called with a skipped header in between oldHead and newHead. However this is possible for Cosmos chains as demonstrated by the test that reproduces the behavior.

In the test we use the legacypool.BroadcastFn to simulate slow processing during runReorg, however it could be anything slows that function down. If at the same time during this processing, multiple new headers are pushed to newHeadCh and processed in the txpool.loop(), then the next time legacypool.Reset is called, it will be called where newHead.ParentHash != oldhead.Hash, causing a panic since GetBlock will be called on a Cosmos chain, which is not expected.

Since this function would only panic on this condition (and we are already replacing the function during tests), I simply removed the contents of the inner if statement that would panic if we encountered it with this state. From my understanding this is a valid state and there is no need to panic here.

Closes: #XXXX


Author Checklist

All items are required. Please add a note to the item if the item is not applicable and
please add links to any relevant follow up issues.

I have...

  • tackled an existing issue or discussed with a team member
  • left instructions on how to review the changes
  • targeted the main branch

@mattac21 mattac21 changed the title Fix: Subpool Reset Panic fix: Subpool Reset Panic Sep 25, 2025
@mattac21 mattac21 changed the title fix: Subpool Reset Panic fix(mempool): legacy subpool panic on skipped header during reset Sep 26, 2025
@mattac21 mattac21 marked this pull request as ready for review September 26, 2025 16:38
reinject = lost
}
}
// this is a strange reorg check from geth, it is possible for cosmos
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you ref this PR in this comment so we can easily link back to it if we need to?

Copy link
Contributor

@aljo242 aljo242 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with small nit

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants