You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Sometimes Tx simulation failures due to account sequence mismatch causes hermes to unnecessary delay the relaying.
Background:
hermes simulates every transaction by calling the simulateTx (application gRPC). If successfully simulated, it uses the gas to determine the Tx fee and then
it broadcast_tx_sync-s the transaction (tendermint RPC), i.e. sent to tendermint mempool, which calls the application checkTx.
In both cases the verification happens on a copy of the application state, aka checkTx state.
the checkTx state is also updated by the application after a block is being committed (deliverTx is run for all Tx-es in a block)
Here are the details of the issue:
hermes simulates and broadcast_tx_sync-s N transactions with following sequence numbers: [1,.., K, K+1, ...,N]
check Tx state last account sequence is N.
tendermint creates a block with [1,..,K]. This involves calling Commit() which:
updates the application by calling deliverTx for [1,..,K] and then recreating the check Tx state with latest application state:
hermes tries to simulate a transaction with sequence number N+1 sometime before recheckTxs() above finishes. It gets the error:
"account sequence mismatch, expected K+x, got N+1: incorrect account sequence", with x being the number of Tx-es that have been rechecked
hermes recaches to K+x and tries to simulate again a bit later only to get: "account sequence mismatch, expected N+1, got K+x: incorrect account sequence"
then recaches again to N+1 and the simulation is usually successful as the recheckTxs() above is finished already.
I changed the code locally to use default gas on simulation failure and things look much better. This is because the mempool lock is grabbed during Commit() so broadcast_tx_sync for N+1 in the middle of recheckTxs() will wait for checkTx on [K+1, .., N]
In summary the difference is:
simulateTx gRPC works on a checkTx state that can be concurrently updated by the recheckTxs()
broadcast_tx_sync RPC will be executed and the call to checkTx done only afterrecheckTxs() has finished.
Also checked the Go relayer and the simulation is retried a few times in the hope of catching and getting successful at the end of recheckTxs().
Version
all
Steps to Reproduce
create two chains (used gaia v7.0.0 that uses tendermint v0.34.14)
create channel
send 1000 ft-transfer-s
start hermes and check logs
Acceptance Criteria
Packet relaying should continue in this case, i.e. deal differently with the account mismatch error when got > expected. Specifically in this case, either use the default gas or retry the simulation (probably the latter). (TBD - asking SDK for a simulateTx flag that skips the account sequence check. But if this is made available it will probably be in later releases)
Note that for got < expected we still need to recache as this is an indication that some other entity has used the same account/ wallet.
For Admin Use
Not duplicate issue
Appropriate labels applied
Appropriate milestone (priority) applied
Appropriate contributors tagged
Contributor assigned/self-assigned
The text was updated successfully, but these errors were encountered:
Summary of Bug
Sometimes Tx simulation failures due to account sequence mismatch causes hermes to unnecessary delay the relaying.
Background:
simulateTx
(application gRPC). If successfully simulated, it uses the gas to determine the Tx fee and thenbroadcast_tx_sync
-s the transaction (tendermint RPC), i.e. sent to tendermint mempool, which calls the applicationcheckTx
.deliverTx
is run for all Tx-es in a block)Here are the details of the issue:
broadcast_tx_sync
-sN
transactions with following sequence numbers:[1,.., K, K+1, ...,N]
N
.[1,..,K]
. This involves callingCommit()
which:deliverTx
for[1,..,K]
and then recreating the check Tx state with latest application state:K
.recheckTxs()
which callscheckTx
for[K+1,..,N]
K+x
.N+1
sometime beforerecheckTxs()
above finishes. It gets the error:"account sequence mismatch, expected K+x, got N+1: incorrect account sequence"
, with x being the number of Tx-es that have been recheckedK+x
and tries to simulate again a bit later only to get:"account sequence mismatch, expected N+1, got K+x: incorrect account sequence"
N+1
and the simulation is usually successful as therecheckTxs()
above is finished already.I changed the code locally to use default gas on simulation failure and things look much better. This is because the mempool lock is grabbed during
Commit()
sobroadcast_tx_sync
forN+1
in the middle ofrecheckTxs()
will wait forcheckTx
on[K+1, .., N]
In summary the difference is:
simulateTx
gRPC works on a checkTx state that can be concurrently updated by therecheckTxs()
broadcast_tx_sync
RPC will be executed and the call tocheckTx
done only afterrecheckTxs()
has finished.Also checked the Go relayer and the simulation is retried a few times in the hope of catching and getting successful at the end of
recheckTxs()
.Version
all
Steps to Reproduce
v0.34.14
)Acceptance Criteria
Packet relaying should continue in this case, i.e. deal differently with the account mismatch error when
got > expected
. Specifically in this case, either use the default gas or retry the simulation (probably the latter).(TBD - asking SDK for a simulateTx flag that skips the account sequence check. But if this is made available it will probably be in later releases)
Note that for
got < expected
we still need to recache as this is an indication that some other entity has used the same account/ wallet.For Admin Use
The text was updated successfully, but these errors were encountered: