-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extend Nethermind error messages #7999
Conversation
I see that you haven't updated any CHANGELOG files. Would it make sense to do so? |
79aba0f
to
902c7e9
Compare
902c7e9
to
b59bc07
Compare
core/chains/evm/client/errors.go
Outdated
@@ -167,13 +167,16 @@ var klaytn = ClientErrors{ | |||
// Nethermind | |||
// All errors: https://github.com/NethermindEth/nethermind/blob/master/src/Nethermind/Nethermind.TxPool/AcceptTxResult.cs | |||
// All filters: https://github.com/NethermindEth/nethermind/tree/9b68ec048c65f4b44fb863164c0dec3f7780d820/src/Nethermind/Nethermind.TxPool/Filters | |||
var nethermindFatal = regexp.MustCompile(`(: |^)(SenderIsContract|Invalid|Int256Overflow|FailedToResolveSender|GasLimitExceeded)$`) | |||
var nethermindFatal = regexp.MustCompile(`(: |^)(SenderIsContract|Invalid(, transaction Hash is null)?|Int256Overflow|FailedToResolveSender|GasLimitExceeded(, Gas limit: \d+, gas limit of rejected tx: \d+)?|NonceGap(, Future nonce. Expected nonce: \d+)?)$`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NonceGap seems very interesting!
I haven't seen such an error being returned before.
I think we need special handling for this. If we treat this as fatal, bad things can happen in our txm.
For example, we send multiple Txs for the same FromAddresses one after another, so lets say we send the following:
Tx1: nonce 1
Tx2: nonce 2
It is possible that a node sees Tx2 before Tx1. In that case, this is not a fatal error, and we should just retry.
For now, I think we should treat NonceGap as a retryable error.
But this is a very critical issue we need to handle here, or else can cause major problems in some nodes.
@samsondav @jmank88 for a second look
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For context, other eth clients, when they see a nonce gap, just accept the Tx, and keep it in their mempool, and wait till they see the missing nonce too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yeah I guess we normally rely on the queue/pending node and RPC config to be aligned so this is avoided all together... 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean you could configure an RPC endpoint to not return such NonceGap errors?
In any case, we cannot always control RPCs, so should also handle it in code correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean that when properly configured, we should never get so far ahead of ourselves that this happens, but yes it would be better to support more cases.
Our docs describe how to align ETH_MAX_QUEUED_TRANSACTIONS
/ETH_MAX_IN_FLIGHT_TRANSACTIONS
with the RPC node's AccountSlots
/GlobalSlots
/AccountQueue
/GlobaQueue
:
https://docs.chain.link/chainlink-nodes/evm-performance-configuration
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the misconfiguration case still possible too? What error is returned for the first tx after the pool is full?
I am thinking now that some of the edge cases we are worried about here would likely only occur on batch (re)broadcasts, since we send to active primaries serially and should halt as soon as the pool is full... 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah so I think that fatal is probably correct given that this is only applied to the code doing the serial send to the active primary. IIUC the non-fatal, out-of-order case can only happen on other nodes, where we don't react to errors the same way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should not be possible for Chainlink to send txes with a nonce gap. I can't say for certain though that there isn't some combination of re-orgs, resending etc where it might not be possible. If it does happen, we woul dexpect to be able to send the higher nonce then "fill in" the gap with lower nonces.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Example here:
This orders by eth_tx.id, then nonce. Is it possible you could send nonces out of order? I think so. We would need to review all such cases (perhaps this is the only one?).
It should be safe to reverse those ordering terms btw. Might be worth including that along with this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO it's a bug in Nethermind. go-ethereum supports out-of-order transmissions
core/chains/evm/client/errors.go
Outdated
@@ -167,13 +167,16 @@ var klaytn = ClientErrors{ | |||
// Nethermind | |||
// All errors: https://github.com/NethermindEth/nethermind/blob/master/src/Nethermind/Nethermind.TxPool/AcceptTxResult.cs | |||
// All filters: https://github.com/NethermindEth/nethermind/tree/9b68ec048c65f4b44fb863164c0dec3f7780d820/src/Nethermind/Nethermind.TxPool/Filters | |||
var nethermindFatal = regexp.MustCompile(`(: |^)(SenderIsContract|Invalid|Int256Overflow|FailedToResolveSender|GasLimitExceeded)$`) | |||
var nethermindFatal = regexp.MustCompile(`(: |^)(SenderIsContract|Invalid(, transaction Hash is null)?|Int256Overflow|FailedToResolveSender|GasLimitExceeded(, Gas limit: \d+, gas limit of rejected tx: \d+)?|NonceGap(, Future nonce. Expected nonce: \d+)?)$`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of: SenderIsContract|Invalid(, transaction Hash is null)?
Does it make sense to just have: SenderIsContract|Invalid?
By trying to match the bracket part, are we getting anything extra?
Happy to have you hear from others too on their opinion.
Same comment on all cases below which have the bracketed() part added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have hesitated to make the matches too open ended in the past, specifically prefixes that match any suffix. So we usually enumerate the possibilities and use the trailing $
, like here. That internal one looks a little more complicated though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, if you look at the Nethermind PR, and search for this string "transaction Hash is null", (this file: src/Nethermind/Nethermind.TxPool/Filters/NullHashTxFilter.cs), it is clear that the way they return all fatal errors, is to use the prefix, and add (, reason text xyz).
So seems redundant and confusing to add the specific message in our matching string.
Note that even with this internal string, we still are keeping that open ended match. So adding this detailed string doesn't give us any added protection against side effects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not redundant. It is a deliberate design choice to match a limited set instead of an open ended one. I can't say whether it is the correct/best/optimal choice, but it is significant and shouldn't be dismissed hastily.
Note that even with this internal string, we still are keeping that open ended match. So adding this detailed string doesn't give us any added protection against side effects.
I don't know what you mean. The internal case of Invalid(, transaction Hash is null)?
is actually not as interesting as I first thought - just another instance of being explicit about the match. We only accept the two literals Invalid
or Invalid, transaction Hash is null
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a conscious choice, for the reasons I listed here:
https://chainlink-core.slack.com/archives/C0117GGJB6Y/p1669215075401729?thread_ts=1669126977.871589&cid=C0117GGJB6Y
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Specially the NonceGap issue needs further discussion, since this. is not an error type which any eth client previously returned. I think right behavior is to retry in those cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comments
b59bc07
to
2ce6edd
Compare
TerminallyUnderpriced: regexp.MustCompile(`(: |^)(FeeTooLow|FeeTooLowToCompete)$`), | ||
TerminallyUnderpriced: regexp.MustCompile(`(: |^)(FeeTooLow(, MaxFeePerGas too low. MaxFeePerGas: \d+, BaseFee: \d+, MaxPriorityFeePerGas:\d+, Block number: \d+|` + | ||
`, EffectivePriorityFeePerGas too low \d+ < \d+, BaseFee: \d+|` + | ||
`, FeePerGas needs to be higher than \d+ to be added to the TxPool. Affordable FeePerGas of rejected tx: \d+.)?|` + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this is how we are handling all errors, but I just don't like this level of hard matching. Any small string changes to this detailed error message on nethermind will again break us.
Perhaps sometime later we can think of a refactor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is making it looser can accidentally match stuff it isn't supposed to.
e.g. "known transaction" also matches "unknown transaction". This has bitten us in the past hence the extreme conservatism now.
@@ -36,6 +36,9 @@ func (s *SendError) CauseStr() string { | |||
|
|||
const ( | |||
NonceTooLow = iota | |||
// Nethermind specific error. Nethermind throws a NonceGap error when the tx nonce is greater than current_nonce + tx_count_in_mempool, instead of keeping the tx in mempool. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually seems like we already have a bunch of places in this file where "nonce too high" was treated previously as a fatal error.
Just search for the string "nonce too high" in this file.
I see gethFatal, arbitrumFatal, and erigonFatal being added in the last 6 months in different PRs.
Maybe not in this PR, but let us fix those in a follow-up PR.
https://app.shortcut.com/chainlinklabs/story/59049/nonce-too-high-this-error-is-incorrectly-being-treated-as-fatal-in-evm-code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But first we would have to verify that the other clients work the same way as Nethermind does. IIRC Geth doesn't use the nonce gap mechanism Nethermind does, so it might mean something else in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. @simsonraj is working on the other chains/clients as part of a separate ticket.
2ce6edd
to
284680b
Compare
c718613
to
f7593f9
Compare
f7593f9
to
5b40be4
Compare
// This can happen if previous transactions haven't reached the client yet. | ||
// The correct thing to do is assume success for now and let the eth_confirmer retry until | ||
// the nonce gap gets filled by the previous transactions. | ||
lgr.Criticalw("Transaction has a nonce gap.", "err", sendError.Error()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The log level should be Warn here, and not critical, since this error could just show up due to network latencies.
@@ -556,6 +556,16 @@ func (eb *EthBroadcaster) handleInProgressEthTx(ctx context.Context, etx EthTx, | |||
return errors.Wrap(sendError, "this error type only handled for L2s"), false | |||
} | |||
|
|||
if sendError.IsNonceTooHighError() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please also add a unit-test for this case, as part of the test "TestEthBroadcaster_ProcessUnstartedEthTxs_Errors"?
I see there are already other cases which have tests in that function.
5b40be4
to
cd91bf6
Compare
cd91bf6
to
89e63fb
Compare
TerminallyUnderpriced: regexp.MustCompile(`(: |^)(FeeTooLow|FeeTooLowToCompete)$`), | ||
TerminallyUnderpriced: regexp.MustCompile(`(: |^)(FeeTooLow(, MaxFeePerGas too low. MaxFeePerGas: \d+, BaseFee: \d+, MaxPriorityFeePerGas:\d+, Block number: \d+|` + | ||
`, EffectivePriorityFeePerGas too low \d+ < \d+, BaseFee: \d+|` + | ||
`, FeePerGas needs to be higher than \d+ to be added to the TxPool. Affordable FeePerGas of rejected tx: \d+.)?|` + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is making it looser can accidentally match stuff it isn't supposed to.
e.g. "known transaction" also matches "unknown transaction". This has bitten us in the past hence the extreme conservatism now.
On this PR(NethermindEth/nethermind#3626) Nethermind extended its error messages. This PR adds the new regexes and test cases.