Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Maintain local copy of the nonce for each used address #538

Merged
merged 3 commits into from
Feb 7, 2018

Conversation

dshulyak
Copy link
Contributor

@dshulyak dshulyak commented Jan 6, 2018

If multiple concurrent transactions are sent from the same address
nonce for some of them will be the same. It will result in the
"known transaction error" and such transaction will be dropped
from a queue.

There are 2 alternatives to solving such problem:

  • adding an auto-retry to transaction queue, which is very complex
    and error-prone, this way we can actually retry transactions
    that are already processed by the Ethereum
  • maintain nonce locally

The second approach is a straightforward. We keep asking for a nonce
from upstream, but if our local nonce is higher we will stick to it.
Our local nonce is updated only if transaction succeeds, so there is
no way to send out of order transaction.

Blocked by: #537

@divan
Copy link
Contributor

divan commented Jan 9, 2018

@dshulyak thanks! Two questions:

  1. What's the core reason for this nonce collision? GetTransactionCount implementation is concurrency unfriendly, and I wonder what's the reason for that. Maybe a proper fix will be on the go-ethereum side?

  2. RunUpdate() function looks overcomplicated and less readable than straightforward approach with safe local nonce retrieve/increment. Or am I missing something and you see another RunUpdate() use cases?

@dshulyak
Copy link
Contributor Author

dshulyak commented Jan 9, 2018

@divan GetTransactionCount doesn't include transaction queue when it counts current nonce for an account. So, if we will request GetTransactionCount when the queue has at least one tx from the same account - we will get multiple tx's with the same nonce.

Yes, there is ethereum/go-ethereum#15794 and turns out there is a similar method in parity client https://github.com/paritytech/parity/wiki/JSONRPC-parity-module#parity_nextnonce

So, maybe it is not critical and can wait for a next release of geth

@holiman
Copy link

holiman commented Jan 23, 2018

If multiple concurrent transactions are sent from the same address
nonce for some of them will be the same. It will result in the
"known transaction error" and such transaction will be dropped
from a queue.

That's not how it works at all. Multiple concurrent transactions from one address gets different (incremental) nonces, if you submit via eth/personal sendTransaction.

If you are you talking about the case where you use getTransactionCount and thus supply the nonce "yourself" by setting it on the caller-side? I don't see a compelling reason to change that usecase, since geth can do much better guarantees about concurrency when being in charge of both setting the nonce and submitting the transaction (as it does in sendTransaction).

Edit: Sorry, I slipped in here from ethereum/go-ethereum#15794, and didn't even notice it was a PR on a separate repo :)

@dshulyak
Copy link
Contributor Author

dshulyak commented Jan 23, 2018

@holiman thanks for a comment :) , this is exactly about the case when we are using getTransactionCount concurrently. we do this because we are signing tx locally and then sending it with eth_sendrawtransaction.

i think that for such use case the best solution would be to add jsonrpc call similar to -
https://github.com/paritytech/parity/wiki/JSONRPC-parity-module#parity_composetransaction

does it make sense to propose similar extension for geth jsonrpc?

GitHub
parity - Fast, light, robust Ethereum implementation.

@holiman
Copy link

holiman commented Jan 23, 2018

I don't really see how it could work. Because in the gap between composeTransaction and the actual submission of the transaction (which is non-neglible time, since it needs to be signed), another call to composeTransaction will use the same nonce.

And if you want to fix that, you'll have to add a second (third?) nonce-counter which is used to maintain a nonce-counter which somehow reserves and increments nonces used only for compose. It will get messy fast.

If you have this usecase, I'd think the best approach would be to maintain nonces purely on the client-side, using getTransactionCount initially to 'bootstrap' it at startup, then not use it again.

@dshulyak
Copy link
Contributor Author

dshulyak commented Jan 23, 2018

@holiman i forgot about another part of this problem. rpc call getTransactionCount doesn't include transaction queue when it gets a nonce. You can find more info about it here ethereum/go-ethereum#15794

It will work only if two conditions will be met:

  1. We will have to ensure that on the client side there is no concurrent requests from the same address, in a way similar to https://github.com/ethereum/go-ethereum/blob/master/internal/ethapi/api.go#L351-L352, inside of this lock we will do: compose tx -> sign locally -> send raw tx.

  2. On the side of geth we will get nonce the same way it is done for sendTransaction https://github.com/ethereum/go-ethereum/blob/master/internal/ethapi/api.go#L1098

Additionally, compose tx rpc call reduces number of network round trips required to send a transaction, currently we have to: estimate gas, get a gas price, get a nonce and then send tx. This is not critical but would be nice.

GitHub
go-ethereum - Official Go implementation of the Ethereum protocol
GitHub
go-ethereum - Official Go implementation of the Ethereum protocol

@holiman
Copy link

holiman commented Jan 23, 2018

@holiman i forgot about another part of this problem. rpc call getTransactionCount doesn't include transaction queue when it gets a nonce. You can find more info about it here ethereum/go-ethereum#15794

But what I'm saying is that even if it did, it still wouldn't solve the problem. Because even after you get this value, there wll be a gap while you're signing it and haven't submitted it yet, when another tx would get the same nonce.

@holiman
Copy link

holiman commented Jan 23, 2018

So what I'd do, if I were you, would be to:

  • Have a singleton/whatever which deals out nonces. This should be multithread-safe, and your only source of nonces.
  • Get gasprice once every block or so, maximum.
  • Whenever you need to make a tx,
    • Create a tx, fill with gasPrice and relevant nonce. if you don't have it yet, call getTransactionCount, then remember it locally.
    • Call estimateGas for the tx, set gas
    • Sign tx
    • Submit tx

@dshulyak
Copy link
Contributor Author

So what I'd do, if I were you, would be to:

in general this is exactly what I am trying to achieve with this PR

But what I'm saying is that even if it did, it still wouldn't solve the problem. Because even after you get this value, there wll be a gap while you're signing it and haven't submitted it yet, when another tx would get the same nonce.

what i tried to say earlier is that we can avoid doing any compose tx calls before tx from previous compose call is sent to geth

@holiman
Copy link

holiman commented Jan 23, 2018

what i tried to say earlier is that we can avoid doing any compose tx calls before tx from previous compose call is sent to geth

Yes, that would work then. But still it would be problematic (from a user-expectation perspective) to have to have the bug reports over and over again, and tell users "compose works 95%, unless you do it multithreaded, when you'll get nonce-collisions. Please use sendTransaction, which works 100%". So I don't personally think it's a really good addition, but that's just my 5 c.

Copy link
Contributor

@mandrigin mandrigin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yay! We got rid of one patch! 🎉

Please, make sure that go-ethereum is updated with that.

Looks good, a few nitpicks.

if err != nil {
m.txDone(tx, hash, err)
return hash, err
}
// Send the transaction finally.
hash, err = m.completeTransaction(tx, account, password)
hash, err = m.completeTransaction(config, account, tx)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: I'd still put tx on a first place.

log.Info("complete transaction", "id", queuedTx.ID)
// nonce should be incremented only if tx completed without error
// if upstream node returned nonce higher than ours we will stick to it
localNonce := m.addrLock.LockNonceFor(queuedTx.Args.From)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would happen if it is impossible to lock nonce? a deadlock? an error?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will block until the previous completeTransaction() for this address is finished.

// node, after each call local nonce will be incremented
// then, we return higher nonce, as if another node was used to send 2 transactions
// upstream nonce will be equal to 5, we update our local counter to 5+1
func (s *TxQueueTestSuite) TestLocalNone() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: TestLocalNonce

Copy link
Contributor

@adambabik adambabik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! The whole idea looks good but I would work on the AddrLocker interface. Maybe we can expose only Lock() and Unlock() methods? It's a bit strange that it says LockNonce as we actually lock an address and additionally its local cached nonce.

l.lock(address).Lock()
return l.localNonce[address]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Race condition? l.localNonce[address] is not guarded with a mutex.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

l.lock(address).Lock() ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right, my bad. In this case, it's safe.

}

// lock returns the lock of the given address.
func (l *AddrLocker) lock(address common.Address) *sync.Mutex {
l.mu.Lock()
defer l.mu.Unlock()
if l.localNonce == nil {
l.localNonce = map[common.Address]uint64{}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason why make is not used as it is three lines below?

// same nonce until the lock is released. The mutex prevents the (an identical nonce) from
// being read again during the time that the first transaction is being signed.
func (l *AddrLocker) LockAddr(address common.Address) {
func (l *AddrLocker) LockNonceFor(address common.Address) uint64 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe LockAddrAndReturnLocalNonce()? As it's a bit hard to figure out if this method is just a getter or has some side-effects.

Or maybe just go with Lock()? I have some trouble with an idea of locking nonce as we actually lock an address and additionally keep track of its local nonce.

func (l *AddrLocker) UnlockAddr(address common.Address) {
// UnlockNonceFor unlocks the mutex of the given account and updates nonce.
func (l *AddrLocker) UnlockNonceFor(address common.Address, nonce uint64) {
l.localNonce[address] = nonce
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Race condition?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not necessarily, it depends on the usage

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So like an ordinary race condition :D AddrLocker should be immune to bad usage in my opinion. Ideally, it could verify if a given address is actually locked. If not, an error should be returned.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

verifying locked address in golang is not that easy, it requires one more synchronization mechanism like Atomic.CompareAndSwap

log.Info("finally completed transaction", "id", tx.ID, "hash", hash, "err", err)
m.txDone(tx, hash, err)
return hash, err
}

func (m *Manager) validateAccount(tx *common.QueuedTx) (*common.SelectedExtKey, error) {
func (m *Manager) validateAccount(config *params.NodeConfig, tx *common.QueuedTx, password string) (*common.SelectedExtKey, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't get config from Manager?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mainly to avoid multiple calls in mocks

// UnlockAddr unlocks the mutex of the given account.
func (l *AddrLocker) UnlockAddr(address common.Address) {
// UnlockNonceFor unlocks the mutex of the given account and updates nonce.
func (l *AddrLocker) UnlockNonceFor(address common.Address, nonce uint64) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here maybe we can also go with Unlock() as we again kind of locking an address, not a nonce.

log.Info("complete transaction", "id", queuedTx.ID)
// nonce should be incremented only if tx completed without error
// if upstream node returned nonce higher than ours we will stick to it
localNonce := m.addrLock.LockNonceFor(queuedTx.Args.From)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will block until the previous completeTransaction() for this address is finished.

localNonce := m.addrLock.LockNonceFor(queuedTx.Args.From)
var nonce uint64
defer func() {
if err == nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is ok but usually, a special case happens when err != nil.

rst := manager.WaitForTransaction(tx)
// simple sanity checks
s.NoError(err)
s.Equal(rst.Error, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s.NoError(rst.Error) maybe?

// node, after each call local nonce will be incremented
// then, we return higher nonce, as if another node was used to send 2 transactions
// upstream nonce will be equal to 5, we update our local counter to 5+1
func (s *TxQueueTestSuite) TestLocalNone() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That looks good! I would also add a fail case where it is verified that nonce was not incremented.

@dshulyak
Copy link
Contributor Author

dshulyak commented Feb 5, 2018

@adambabik I don't like this interface either. Would it make sense If I will simply store a map on a transactionManager, guarded by AddrLocker?

@dshulyak dshulyak force-pushed the localnonce branch 2 times, most recently from 649af41 to f0e1429 Compare February 5, 2018 16:36
@dshulyak
Copy link
Contributor Author

dshulyak commented Feb 5, 2018

I removed changes to AddrLocker, now it looks simpler. Will do a rebase tmrw

Copy link
Contributor

@adambabik adambabik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much cleaner interface with nonces put in txqueue_manager.go!


func (m *Manager) completeTransaction(config *params.NodeConfig, selectedAccount *common.SelectedExtKey, queuedTx *common.QueuedTx) (hash gethcommon.Hash, err error) {
log.Info("complete transaction", "id", queuedTx.ID)
// nonce should be incremented only if tx completed without error
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we maybe put this comment inside defer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, will do it with rebase

notify bool
completionTimeout time.Duration

addrLock *AddrLocker
localNonce map[gethcommon.Address]uint64
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is actually not thread safe now

If multiple concurrent transactions are sent from the same address
nonce for some of them will be the same. It will result in the
"known transaction error" and such transaction will be dropped
from a queue.

There is 2 alternatives in solving such problem:
- adding an auto-retry to transaction queue, which is very complex
  and error prone, this way we can actually retry transactions
  that are already processed by ethereum
- maintain nonce locally

Second approach is a straightforward. We keep asking for a nonce
from upstream, but if our local nonce is higher we steak to it.
Our local nonce is updated only if transaction succeeds, so there is
no way to send out of order transaction.

Signed-off-by: Dmitry Shulyak <yashulyak@gmail.com>
@dshulyak
Copy link
Contributor Author

dshulyak commented Feb 6, 2018

PR in status-go/go-ethereum status-im/go-ethereum#41

@mandrigin
Copy link
Contributor

@dshulyak merged
please do dep ensure --update github.com/ethereum/go-ethereum in this branch and commit the changes.

@dshulyak
Copy link
Contributor Author

dshulyak commented Feb 6, 2018

@mandrigin thanks, updated

@mandrigin
Copy link
Contributor

@divan it would be cool if you can take a look at this PR...

- [`0008-tx-pool-nonce.patch`](./0008-tx-pool-nonce.patch) - On GetTransactionCount request with PendingBlockNumber get the nonce from transaction pool
- [`0010-geth-17-fix-npe-in-filter-system.patch`](./0010-geth-17-fix-npe-in-filter-system.patch) - Temp patch for 1.7.x to fix a NPE in the filter system

- [`0010-geth-17-fix-npe-in-filter-system.patch`](./0010-geth-17-fix-npe-in-filter-system.patch) - Temp patch for 1.7.x to fix a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To fix what? 😅 What is npe?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

null pointer exception

}
// 5 transactions in total
for i := 0; i < txCount+2; i++ {
s.setupStatusBackend(account, password, nil)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit strange we call s.setupStatusBackend(account, password, nil) five times with the same args. Maybe we can clarify this in the comment: setup expectations for 5 transactions in total?

Copy link
Contributor

@divan divan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@divan divan merged commit d0ef64a into status-im:develop Feb 7, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants