Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A potential dead lock of BSC client which result in block syncing halt #58

Closed
unclezoro opened this issue Dec 27, 2020 · 0 comments
Closed

Comments

@unclezoro
Copy link
Collaborator

unclezoro commented Dec 27, 2020

System information

Geth version: 1.9.13
OS & Version: Linux

Expected behaviour

Client sync block normal

Actual behaviour

Client failed to sync block.

Steps to reproduce the behaviour

First change the code to make it easier to reproduce:

func (sub *Subscription) Unsubscribe() {
	sub.unsubOnce.Do(func() {
	uninstallLoop:
		for {
			select {
			case sub.es.uninstall <- sub.f:
                                // newly added line
				time.Sleep(3 * time.Second)
				break uninstallLoop
                        case ...
			}
		}
		<-sub.Err()
	})
}

Do request for multi times:

curl -X POST --data '{"jsonrpc":"2.0","method":"eth_newPendingTransactionFilter","params":[],"id":73}'

Backtrace

Here is the pprof of go routine:
goroutine.txt

  1. The lock here is not released:
    image

According to the pprof, the code is blocked on /server/bsc/event/feed.go:170.

1 @ 0x447b10 0x4573db 0x457eda 0x4b5630 0x6e90a4 0xcf0a5f 0xcf3800 0xcf2544 0xf68a60 0xf3aba5 0x475d61
#       0x457ed9        reflect.rselect+0x389                                                           /usr/lib/golang/src/runtime/select.go:542
#       0x4b562f        reflect.Select+0x16f                                                            /usr/lib/golang/src/reflect/value.go:2229
#       0x6e90a3        github.com/ethereum/go-ethereum/event.(*Feed).Send+0x593                        /server/bsc/event/feed.go:170
#       0xcf0a5e        github.com/ethereum/go-ethereum/core.(*BlockChain).writeBlockWithState+0xe2e    /server/bsc/core/blockchain.go:1470
#       0xcf37ff        github.com/ethereum/go-ethereum/core.(*BlockChain).insertChain+0x118f           /server/bsc/core/blockchain.go:1757
  1. The subscription may consume the channel so that block above select.
    As we checked, new NewPendingTransactionFilter calls are pending on fetching filtersMu lock.
#	0x4580a6	sync.runtime_SemacquireMutex+0x46									/usr/lib/golang/src/runtime/sema.go:71
#	0x49408b	sync.(*Mutex).lockSlow+0xfb										/usr/lib/golang/src/sync/mutex.go:138
#	0xf13256	sync.(*Mutex).Lock+0x296										/usr/lib/golang/src/sync/mutex.go:81
#	0xf130e1	github.com/ethereum/go-ethereum/eth/filters.(*PublicFilterAPI).NewPendingTransactionFilter.func1+0x121	/server/bsc/eth/filters/api.go:119
  1. The routine that holds filtersMu lock is timeoutLoop:
#	0xf142b3	github.com/ethereum/go-ethereum/eth/filters.(*Subscription).Unsubscribe.func1+0xf3	/server/bsc/eth/filters/filter_system.go:164
#	0x4944d2	sync.(*Once).doSlow+0xe2								/usr/lib/golang/src/sync/once.go:66
#	0xf0f704	sync.(*Once).Do+0x64									/usr/lib/golang/src/sync/once.go:57
#	0xf0f6bd	github.com/ethereum/go-ethereum/eth/filters.(*Subscription).Unsubscribe+0x1d		/server/bsc/eth/filters/filter_system.go:157
#	0xf09d47	github.com/ethereum/go-ethereum/eth/filters.(*PublicFilterAPI).timeoutLoop+0xf7		/server/bsc/eth/filters/api.go:88

The code is:

func (sub *Subscription) Unsubscribe() {
	sub.unsubOnce.Do(func() {
	uninstallLoop:
		for {
			// write uninstall request and consume logs/hashes. This prevents
			// the eventLoop broadcast method to deadlock when writing to the
			// filter event channel while the subscription loop is waiting for
			// this method to return (and thus not reading these events).
			select {
			case sub.es.uninstall <- sub.f:
				break uninstallLoop
			case <-sub.f.logs:
			case <-sub.f.hashes:
			case <-sub.f.headers:
			}
		}

		// wait for filter to be uninstalled in work loop before returning
		// this ensures that the manager won't use the event channel which
		// will probably be closed by the client asap after this method returns.
		<-sub.Err()
	})
}

Obvious it failed to do sub.es.uninstall <- sub.f.

  1. eventLoop is responsible for consuming the sub.es.uninstall channel.
    The routine is busy doing handleTxsEvent:
#	0xf1128b	github.com/ethereum/go-ethereum/eth/filters.(*EventSystem).handleTxsEvent+0x20b	/server/bsc/eth/filters/filter_system.go:350
#	0xf12f90	github.com/ethereum/go-ethereum/eth/filters.(*EventSystem).eventLoop+0xa60	/server/bsc/eth/filters/filter_system.go:461

It will push hashes to f.hashes channel, and now it is blocking by that channel.

func (es *EventSystem) handleTxsEvent(filters filterIndex, ev core.NewTxsEvent) {
	hashes := make([]common.Hash, 0, len(ev.Txs))
	for _, tx := range ev.Txs {
		hashes = append(hashes, tx.Hash())
	}
	for _, f := range filters[PendingTransactionsSubscription] {
		f.hashes <- hashes
	}
}
  1. What happened to f.hashes channel?
    That is because, in the above step 3, the Unsubscribe may not consume all the hashes and break out so that step4 stucked.

The deadlock dependency is:

Routine A:NewPendingTransactionFilter want to lock filtersMu to consume hashes;
Routine B: eventLoop is waiting Routine A to consume hashes so that it can push new hash to channel.
Routine C: Unsubscribe is holding lock filtersMu, but it is waiting for Routine B to consume uninstall channel.

@unclezoro unclezoro changed the title bsc dataseed sync stuck A potential dead lock of BSC client which result in block syncing halt Dec 30, 2020
pythonberg1997 pushed a commit to pythonberg1997/bsc that referenced this issue Mar 29, 2023
modify fork choice to prefer higher justified block number
galaio pushed a commit to galaio/bsc that referenced this issue Jul 31, 2024
galaio pushed a commit to galaio/bsc that referenced this issue Jul 31, 2024
buddh0 added a commit that referenced this issue Dec 9, 2024
* workflows: fix golangci-lint

* fix accounts

* fix core

* fix eth

* fix internal

* fix miner

* vix version

* use `make lint` to run cilint

* fix version

* fix cmd
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant