
eth/protocols/snap: snap sync testing #22179

Merged: 22 commits, merged into ethereum:master on Jan 25, 2021
Conversation

@holiman (Contributor) commented on Jan 15, 2021:

This is a work in progress: the tests currently just hang, which isn't ideal. However, they do enable some fairly deep testing of the snap protocol, and the reason I rebased it again and am working on it is to see whether it can be used to trigger #22172.

Also, in general, I think it would be good to have these kinds of fairly high-level protocol tests.

@holiman (Contributor, Author) commented on Jan 17, 2021:

I think this shows a few errors, or maybe let's call them "discussion points".

  1. If cancel is invoked, the Sync does not exit with an error, but just exits with nil, which seems strange, since the sync didn't complete successfully.
  2. If a remote peer delivers storage tries in very small increments, the requestor eventually stalls. There are still storage tasks, but the snippet below causes them to not be retrieved, for some reason:
		// Skip tasks that are already retrieving (or done with) all small states
		if len(task.SubTasks) == 0 && len(task.stateTasks) == 0 {
			continue
		}

  3. The current implementation, when delivering storage keys, treats the limit a bit oddly: the server aborts only after a key has been added that goes above the limit. If a server instead respects the limit exactly, the recipient complains that there are more elements available (see the sketch below).
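
For the third point, here is a minimal, self-contained sketch of the two serving behaviours being compared (illustrative only, not the actual server code; the function names are made up):

	package main

	import (
		"bytes"
		"fmt"
	)

	// serveWithOverflow mimics the current server behaviour: keys are appended
	// and the loop only aborts after one key has been added that goes above
	// the requested limit.
	func serveWithOverflow(hashes [][]byte, limit []byte) [][]byte {
		var out [][]byte
		for _, h := range hashes {
			out = append(out, h)
			if bytes.Compare(h, limit) > 0 {
				break // the key past the limit is still included
			}
		}
		return out
	}

	// serveStrict stops before crossing the limit; the recipient then assumes
	// more elements are available and complains.
	func serveStrict(hashes [][]byte, limit []byte) [][]byte {
		var out [][]byte
		for _, h := range hashes {
			if bytes.Compare(h, limit) > 0 {
				break
			}
			out = append(out, h)
		}
		return out
	}

	func main() {
		hashes := [][]byte{{0x10}, {0x7f}, {0x90}}
		limit := []byte{0x7f}
		fmt.Println(len(serveWithOverflow(hashes, limit))) // 3: one key past the limit included
		fmt.Println(len(serveStrict(hashes, limit)))       // 2: stops exactly at the limit
	}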

@holiman (Contributor, Author) commented on Jan 19, 2021:

I pushed another fix; now there's only one remaining stall, in TestSyncWithStorageAndOneCappedPeer, where one peer delivers storage items, but only in chunks of at most 500 bytes at a time.

@holiman (Contributor, Author) commented on Jan 19, 2021:

@karalabe, in 01e4846 I added revertals to storage requests when they fail due to bad proofs. The test cases in this PR hit both of those clauses individually, which is why they were added.

However, the same scenario applies to all types of requests: code, trie nodes, etc. Should we add revertals for all the other cases where a response is invalid? And if so, maybe we should do it in a more generic fashion, instead of adding them one by one?
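
To make the question concrete, a rough sketch of what a more generic shape could look like (all names here are hypothetical, not the ones in sync.go): a single revert path keyed on the request kind, instead of one revert helper per request type.

	package main

	import "fmt"

	// kind identifies the request type; the names below are hypothetical.
	type kind int

	const (
		accountReq kind = iota
		storageReq
		bytecodeReq
		trienodeHealReq
		bytecodeHealReq
	)

	// request is a stripped-down stand-in for the per-type request structs.
	type request struct {
		kind kind
		id   uint64
	}

	// syncer is a stand-in for the real Syncer.
	type syncer struct {
		reverted []request
	}

	// revert puts a request's work back into the retrieval queues regardless
	// of its type, instead of having a separate revert helper per kind.
	func (s *syncer) revert(req request) {
		// In the real code this would re-schedule the task ranges / hashes
		// and drop the in-flight marker for req.id.
		s.reverted = append(s.reverted, req)
		fmt.Printf("reverted request %d (kind %d)\n", req.id, req.kind)
	}

	func main() {
		s := new(syncer)
		// Any invalid response, whatever its type, takes the same path.
		s.revert(request{kind: storageReq, id: 1})
		s.revert(request{kind: bytecodeReq, id: 2})
	}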

@holiman (Contributor, Author) commented on Jan 19, 2021:

Another question: in most places in sync.go, we update the bloom filter at the same time we write batch data to the database (the pattern is sketched after the lists below).

We do it in

  • processStorageResponse
  • processBytecodeResponse
  • forwardAccountTask

We do not do it in

  • processTrienodeHealResponse
  • processBytecodeHealResponse
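
For reference, a simplified sketch of the pattern in question, using stand-in types rather than the real batch and bloom types: the value goes into the write batch and its key goes into the bloom filter in the same place; the heal-response paths currently do only the first half.

	package main

	import "fmt"

	// batch and bloom are stand-ins for the real database batch and sync bloom.
	type batch map[string][]byte

	type bloom struct{ keys map[string]struct{} }

	func (b *bloom) Add(key []byte) { b.keys[string(key)] = struct{}{} }

	// writeWithBloom shows the pattern used in processStorageResponse and
	// friends: write the value to the batch and add its key to the bloom
	// filter together.
	func writeWithBloom(db batch, bf *bloom, key, value []byte) {
		db[string(key)] = value
		bf.Add(key)
	}

	func main() {
		db := batch{}
		bf := &bloom{keys: make(map[string]struct{})}
		writeWithBloom(db, bf, []byte("node-hash"), []byte("rlp-data"))
		fmt.Println(len(db), len(bf.keys)) // 1 1
	}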

	size := common.StorageSize(len(hashes) * common.HashLength)
	for _, account := range accounts {
		size += common.StorageSize(len(account))
	}
	for _, node := range proof {
		size += common.StorageSize(len(node))
	}
-	logger := peer.logger.New("reqid", id)
+	logger := peer.Log().New("reqid", peer.ID())
A Member commented on this change:
This is wrong, pls revert. We need the request id (in id), not the peer's id.

@holiman (Contributor, Author) commented on Jan 20, 2021:

Last remaining blocker:

$ go test . -run TriePanic
TRACE[01-20|15:53:31.702] Fetching range of accounts               id=nice-a reqid=4037200794235010051 root="c2661f…b0463a" origin="000000…000000" limit="0fffff…ffffff" bytes=512.00KiB
TRACE[01-20|15:53:31.702] Delivering range of accounts             id=nice-a reqid=4037200794235010051 hashes=1 accounts=1 proofs=1 bytes=210.00B
panic: interface conversion: trie.node is nil, not *trie.fullNode

goroutine 20 [running]:
github.com/ethereum/go-ethereum/trie.unsetInternal(0xa5c2a0, 0xc00007ed70, 0xc00002b9a0, 0x41, 0x41, 0xc00002b9f0, 0x41, 0x41, 0x20, 0xa5c1a0)
	/home/user/go/src/github.com/ethereum/go-ethereum/trie/proof.go:302 +0xd65
github.com/ethereum/go-ethereum/trie.VerifyRangeProof(0x6657bd231f66c2, 0x723c83101df021d9, 0xea1a61864a975b5e, 0x3a46b00c81ac76fa, 0xc00014a630, 0x20, 0x20, 0xc000352400, 0x20, 0x20, ...)
	/home/user/go/src/github.com/ethereum/go-ethereum/trie/proof.go:563 +0x7e5
github.com/ethereum/go-ethereum/eth/protocols/snap.(*Syncer).OnAccounts(0xc000147b00, 0xa62960, 0xc00016a3f0, 0x380704bb7b4d7c03, 0xc0003522a0, 0x1, 0x1, 0xc00034c140, 0x1, 0x1, ...)
	/home/user/go/src/github.com/ethereum/go-ethereum/eth/protocols/snap/sync.go:2033 +0x876
github.com/ethereum/go-ethereum/eth/protocols/snap.defaultAccountRequestHandler(0xc00016a3f0, 0x380704bb7b4d7c03, 0x6657bd231f66c2, 0x723c83101df021d9, 0xea1a61864a975b5e, 0x3a46b00c81ac76fa, 0x0, 0x0, 0x0, 0x0, ...)
	/home/user/go/src/github.com/ethereum/go-ethereum/eth/protocols/snap/sync_test.go:220 +0x11b
created by github.com/ethereum/go-ethereum/eth/protocols/snap.(*testPeer).RequestAccountRange
	/home/user/go/src/github.com/ethereum/go-ethereum/eth/protocols/snap/sync_test.go:162 +0x365

... for now. And there's still the open question about the bloom filter above.

@holiman (Contributor, Author) commented on Jan 20, 2021:

@rjl493456442 do you have any good ideas about what the correct fix for the prover would be? The case is basically a very, very small trie, consisting of a shortnode as root, if I understand correctly.
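
For reference, the degenerate case is easy to construct; a rough sketch using the trie API roughly as it looked around the time of this PR (signatures have changed in later releases). A trie with a single entry collapses to a shortNode at the root rather than a fullNode, which is the shape that unsetInternal trips over in the trace above.

	package main

	import (
		"fmt"

		"github.com/ethereum/go-ethereum/common"
		"github.com/ethereum/go-ethereum/ethdb/memorydb"
		"github.com/ethereum/go-ethereum/trie"
	)

	func main() {
		// A trie with a single entry collapses to a shortNode at the root
		// instead of a fullNode.
		triedb := trie.NewDatabase(memorydb.New())
		tr, _ := trie.New(common.Hash{}, triedb)
		key := common.HexToHash("0xdeadbeef").Bytes()
		tr.Update(key, []byte("account-rlp")) // value is an arbitrary placeholder

		// A range proof built over this trie is what ends up in the panicking
		// unsetInternal path during VerifyRangeProof.
		proof := memorydb.New()
		if err := tr.Prove(key, 0, proof); err != nil {
			fmt.Println("prove failed:", err)
		}
		fmt.Println("root:", tr.Hash())
	}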

@@ -622,6 +629,7 @@ func (s *Syncer) loadSyncStatus() {
 		log.Debug("Scheduled account sync task", "from", task.Next, "last", task.Last)
 	}
 	s.tasks = progress.Tasks
+	s.snapped = len(s.tasks) == 0
A Member commented on this change:
This might be a dumb comment, but everywhere else we need to hold a lock for accessing s, especially s.snapped. The only place I can find where loadSyncStatus() is called is in line 528, but we don't hold the lock there anymore.

@holiman (Contributor, Author) replied:

Yeah, it's a bit funky. loadSyncStatus touches a lot of internals in an unsafe way, and right after it, the code accesses s.tasks:

	s.loadSyncStatus()
	if len(s.tasks) == 0 && s.healer.scheduler.Pending() == 0 {

So probably that lock-holding should be extended to cover that portion too.

I don't think it's a real issue, however, because I don't think we enter there in a concurrent way.
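
A sketch of that extension, using stand-in types rather than the real Syncer (illustrative only): take the lock across loadSyncStatus and the immediately following check on s.tasks, rather than relying on there being no concurrent entry.

	package main

	import "sync"

	// syncer is a stand-in for the real Syncer; only the fields relevant to
	// the locking question are included.
	type syncer struct {
		lock  sync.RWMutex
		tasks []int // stand-in for the account sync tasks
	}

	// loadSyncStatus touches s.tasks and other internals; in this sketch the
	// caller is expected to hold s.lock, so it stays lock-free internally.
	func (s *syncer) loadSyncStatus() {
		s.tasks = append(s.tasks, 1)
	}

	// syncCycle extends the critical section so loading the status and
	// checking the resulting tasks happen under the same lock.
	func (s *syncer) syncCycle() bool {
		s.lock.Lock()
		s.loadSyncStatus()
		done := len(s.tasks) == 0
		s.lock.Unlock()
		return done
	}

	func main() {
		s := new(syncer)
		_ = s.syncCycle()
	}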

The Member replied:

AFAIK loadSyncStatus is only called when you create the thing, so there's no concurrency at that point.

@holiman marked this pull request as ready for review on January 22, 2021, 09:33
@holiman (Contributor, Author) commented on Jan 22, 2021:

Rebased on top of @rjl493456442's panic fix.

@karalabe added this to the 1.10.0 milestone on Jan 25, 2021
@karalabe (Member) left a review comment:

SGTM, although it would be nice to look into how we could speed it up. Currently it's enormously slow, even with parallel execution.

@karalabe merged commit 797b081 into ethereum:master on Jan 25, 2021
bulgakovk pushed a commit to bulgakovk/go-ethereum that referenced this pull request on Jan 26, 2021. The commit message lists the PR's 22 squashed commits:
* eth/protocols/snap: make timeout configurable

* eth/protocols/snap: snap sync testing

* eth/protocols/snap: test to trigger panic

* eth/protocols/snap: fix race condition on timeouts

* eth/protocols/snap: return error on cancelled sync

* squashme: updates + test causing panic + properly serve accounts in order

* eth/protocols/snap: revert failing storage response

* eth/protocols/snap: revert on bad responses (storage, code)

* eth/protocols/snap: fix account handling stall

* eth/protocols/snap: fix remaining revertal-issues

* eth/protocols/snap: timeouthandler for bytecode requests

* eth/protocols/snap: debugging + fix log message

* eth/protocols/snap: fix misspelliings in docs

* eth/protocols/snap: fix race in bytecode handling

* eth/protocols/snap: undo deduplication of storage roots

* synctests: refactor + minify panic testcase

* eth/protocols/snap: minor polishes

* eth: minor polishes to make logs more useful

* eth/protocols/snap: remove excessive logs from the test runs

* eth/protocols/snap: stress tests with concurrency

* eth/protocols/snap: further fixes to test cancel channel handling

* eth/protocols/snap: extend test timeouts on CI

Co-authored-by: Péter Szilágyi <peterke@gmail.com>
@holiman mentioned this pull request on Feb 3, 2021