Prefetch rpcdb iterator batches #1323
Conversation
// Next attempts to move the iterator to the next element and returns if this
// succeeded
func (it *iterator) Next() bool {
	if it.db.closed.Get() {
This seems a bit weird... wondering if we should release all iterators when we close the DB
The iterators are not usable once the db is closed, so this makes sense to me.
So, the reason this isn't done is that it would require the DatabaseClient to actually track the outstanding iterators... Because the iterator already needs a reference to the DatabaseClient, this would introduce a kind of gross circular reference... I think keeping it as-is is probably the cleanest. We still require the iterator owner to call Release anyway, according to the interface... So I think this is fine.
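To make the circular-reference concern concrete, here is a rough sketch of what releasing iterators from Close would require. This is not the PR's code: the openIterators field, the release method, and the bookkeeping are made up for illustration.

package sketch

import "sync"

// Hypothetical shape only: the PR deliberately avoids this design.
type DatabaseClient struct {
	lock          sync.Mutex
	openIterators map[*iterator]struct{} // the client would have to track every iterator...
}

type iterator struct {
	db *DatabaseClient // ...while every iterator already points back at the client
}

// Close would then need to walk and release all outstanding iterators.
func (db *DatabaseClient) Close() error {
	db.lock.Lock()
	defer db.lock.Unlock()
	for it := range db.openIterators {
		it.release()
	}
	db.openIterators = nil
	return nil
}

// release frees the iterator's resources; in a real version it would also
// need to deregister itself from db.openIterators, which is the circular
// bookkeeping the comment above wants to avoid.
func (it *iterator) release() {}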
LGTM, left one comment on whether or not we should use a buffered channel to prefetch more
	data        []*rpcdbpb.PutRequest
	errs        wrappers.Errs
	fetchedData chan []*rpcdbpb.PutRequest
Should we use a buffered channel here instead to prefetch a bit more in the case that the caller ever consumes data more quickly than the time it takes for a get to occur?
We could. The number of buffers here would impact how aggressively we prefetch... I didn't want to do too much here to avoid reading unnecessary data... but it's trivial to change if we felt it was worth it later
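For reference, the buffered variant would mostly be a change to the channel's capacity. In the sketch below, prefetchDepth is an assumed tuning knob and batch a stand-in type; neither is part of this PR.

package sketch

// batch stands in for []*rpcdbpb.PutRequest so the sketch is self-contained.
type batch [][]byte

// prefetchDepth is an assumed knob: with capacity 0 (roughly what this PR
// does) the fetch routine stages at most one batch ahead of the one being
// consumed, while a larger capacity lets it run several batches ahead.
const prefetchDepth = 4

func newFetchedData() chan batch {
	return make(chan batch, prefetchDepth)
}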
When running the following over gRPC (in the C-chain) on top of a synced Fuji node:

it := db.NewIterator()
iterations := 0
startIteration := time.Now()
lastUpdate := time.Now()
for it.Next() {
	val := it.Value()
	for i := 0; i < 10; i++ {
		val = hashing.ComputeHash256(val)
	}
	_ = val
	iterations++
	now := time.Now()
	if now.Sub(lastUpdate) > 10*time.Second {
		log.Info("iterating", "count", iterations, "duration", common.PrettyDuration(now.Sub(startIteration)))
		lastUpdate = now
	}
}

we see:

Without the prefetching:

With the prefetching:

The reason some moderate hashing is done is to mimic doing some amount of work. Without this work, the speed of iteration is based solely on how fast avalanchego can iterate over the DB (because the plugin just immediately drops the data, without giving prefetching any time to perform any parallel work).
Why this should be merged
While processing a batch, the rpcdb client can asynchronously be fetching the next batch.
How this works
Because iterators are not thread safe, all of the operations are now performed on the fetch routine. On the fetch routine we will always attempt to ensure there is a batch ready to start being read.
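As a rough illustration of that pattern (not the PR's actual code; batch, fetchBatch, and the field names are stand-ins), the fetch goroutine issues the RPCs and hands completed batches to Next over a channel, so fetching batch n+1 overlaps with the caller consuming batch n:

package sketch

type batch [][]byte // placeholder for a batch of key/value pairs

type iterator struct {
	fetchedData chan batch    // filled by the fetch goroutine
	data        batch         // batch currently being consumed
	index       int           // position within data
	onClose     chan struct{} // closed when the iterator is released
}

// fetch runs on its own goroutine; because it owns all RPC calls, the
// remote iterator is only ever driven from a single routine.
func (it *iterator) fetch() {
	defer close(it.fetchedData)
	for {
		b, err := it.fetchBatch() // the gRPC round trip
		if err != nil || len(b) == 0 {
			return
		}
		select {
		case it.fetchedData <- b: // next batch is staged for Next
		case <-it.onClose: // iterator released / db closed
			return
		}
	}
}

// Next serves elements out of the current batch and only blocks on the
// channel once that batch is exhausted, by which time the fetch routine has
// usually already staged the next one.
func (it *iterator) Next() bool {
	if it.index < len(it.data) {
		it.index++
		return true
	}
	b, ok := <-it.fetchedData
	if !ok {
		return false
	}
	it.data = b
	it.index = 1
	return true
}

// fetchBatch is a stand-in for the client's iterator-next RPC.
func (it *iterator) fetchBatch() (batch, error) {
	return nil, nil
}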
How this was tested
CI. This should be tested better for performance and correctness.