eth/catalyst: move block commit into its own go-routine to avoid deadlock #29657
Conversation
Looking into this lockup a bit, I was wondering about this code:

```go
// If the beacon chain is ran by a simulator, then transaction insertion,
// block insertion and block production will happen without any timing
// delay between them. This will cause flaky simulator executions due to
// the transaction pool running its internal reset operation on a back-
// ground thread. To avoid the racey behavior - in simulator mode - the
// pool will be explicitly blocked on its reset before continuing to the
// block production below.
if simulatorMode {
	if err := api.eth.TxPool().Sync(); err != nil {
		log.Error("Failed to sync transaction pool", "err", err)
		return valid(nil), engine.InvalidPayloadAttributes.With(err)
	}
}
```

Seems to me that adding the spinoff goroutines in this PR is basically a hack around this particular thing that is intentionally put there. I may be wrong, haven't fully grokked the entire situation yet.
I'm not sure there's a good way around this. It is clear that we need the call […]. Afaict, the two options we have here are: […]
@jwasinger please look at this again
Spent the day looking into this and trying to come up with some solution that doesn't involve manually calling txpool […]. Also, I looked at trying to create some way to intercept the […]. Right now, the solution in this PR is the best I can think of. I would like to improve it further by moving the […].
Force-pushed (…ck sealing loop, and out of fcu) from 7118ea0 to a52979d
```go
newTxs   = make(chan core.NewTxsEvent)
sub      = a.sim.eth.TxPool().SubscribeTransactions(newTxs, true)
commitMu = sync.Mutex{}
```
Just fyi, the canonical way to initialize a variable with its zero value is just to declare it, i.e. you can write

```go
var (
	commitMu sync.Mutex
)
```

Initializing with a zero literal is bad style.
```go
}
go func() {
	commitMu.Lock()
	defer commitMu.Unlock()
```
Not sure if it matters, but this construction with the lock here will not preserve the receive order of the select clauses. If both `<-newTxs` and `<-a.sim.withdrawals.pending` become ready to receive around the same time, they will both spawn new goroutines in quick succession. These goroutines will then race for the lock, and it isn't guaranteed that the goroutine that was spawned first will also get the lock first.
It should be fine because eventually each go-routine will unblock and commit, making sure every pending new txs/withdrawals notification will trigger a commit.
There is a problem with the logic (that is also present in master): we only call […] once per notification, even when there are more pending txs than fit into the block space.

I looked into fixing this by altering the commit logic to commit in a loop, until the miner returns an empty payload. However, this is nontrivial because at the point where we are building a payload, we have already read withdrawals from their channel, and we have to somehow re-include them. Tbh, I'm not sure why the spamming test case included in this PR doesn't trigger this... Will try to brainstorm how to fix the issue more today.
Yeah sounds like something that should be fixed.
closing in favor of #30264
…30264) closes #29475, replaces #29657, #30104

Fixes two issues. First is a deadlock where the txpool attempts to reorg, but can't complete because there are no readers left for the new txs subscription. Second, resolves a problem with on-demand mode where txs may be left pending when there are more pending txs than block space.

Co-authored-by: Martin Holst Swende <martin@swende.se>
Closes #29475. The provided test case captures the issue and fails without the fix.