Prefetch rpcdb iterator batches #1323

Merged: 7 commits from prefetch-rpcdb-iteration-batches into dev on Apr 18, 2023

Conversation

StephenButtolph (Contributor) commented:

Why this should be merged

While processing a batch, the rpcdb client can asynchronously fetch the next batch.

How this works

Because iterators are not thread-safe, all iterator operations are now performed on the fetch routine. The fetch routine always attempts to have the next batch ready to read while the current one is being processed.
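
As a sketch of the pattern (the names and types here are hypothetical, not the actual rpcdb code, which works in terms of the generated rpcdbpb messages): the fetch goroutine is the sole owner of the remote iterator and hands completed batches to Next over a channel, so the next batch is fetched while the caller processes the current one.

// Sketch only: kv, fetchNextBatch, and the field names are illustrative.
type kv struct{ key, value []byte }

type iterator struct {
	fetchedData chan []kv // unbuffered: at most one batch is prefetched
	data        []kv      // batch currently being consumed by the caller
}

// fetch runs on its own goroutine and is the only code that touches the
// underlying remote iterator, so the iterator needs no locking.
func (it *iterator) fetch(fetchNextBatch func() ([]kv, error)) {
	defer close(it.fetchedData)
	for {
		batch, err := fetchNextBatch()
		if err != nil || len(batch) == 0 {
			return // exhausted or errored; Next observes the closed channel
		}
		// Blocks until Next drains the previous batch, keeping exactly one
		// batch in flight ahead of the consumer.
		it.fetchedData <- batch
	}
}

// Next serves elements from the local batch and only blocks on the channel
// when the batch is exhausted, overlapping RPC latency with consumption.
func (it *iterator) Next() bool {
	if len(it.data) > 1 {
		it.data = it.data[1:]
		return true
	}
	batch, ok := <-it.fetchedData
	if !ok {
		it.data = nil
		return false
	}
	it.data = batch
	return true
}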

How this was tested

CI. This should be tested more rigorously for both performance and correctness.

From the diff under review:

// Next attempts to move the iterator to the next element and returns if this
// succeeded
func (it *iterator) Next() bool {
if it.db.closed.Get() {
StephenButtolph (Author):
This seems a bit weird... wondering if we should release all iterators when we close the DB

Reviewer (Contributor):

The iterators are not usable once the db is closed, so this makes sense to me.

StephenButtolph (Author):

The reason this isn't done is that it would require the DatabaseClient to track its outstanding iterators. Because each iterator already needs a reference to the DatabaseClient, that would introduce a kind of gross circular reference. I think keeping it as-is is probably the cleanest; the interface already requires the iterator owner to call Release, so I think this is fine.
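
For context, a hedged sketch of the design the thread settles on: lifecycle management stays with the iterator's owner, matching the database.Iterator contract (Next, Error, and Release below are the interface's methods; the error-handling comment is an assumption, not taken from the diff).

it := db.NewIterator()
// Required by the interface even if the db is closed mid-iteration.
defer it.Release()
for it.Next() {
	// ... use it.Key() / it.Value()
}
if err := it.Error(); err != nil {
	// handle iteration error (e.g., the database was closed; assumed path)
}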

@StephenButtolph marked this pull request as ready for review on April 13, 2023 17:28
@StephenButtolph self-assigned this on Apr 13, 2023
@StephenButtolph added the enhancement (New feature or request) label on Apr 13, 2023
@StephenButtolph added this to the v1.10.1 milestone on Apr 13, 2023

@joshua-kim (Contributor) left a comment:

LGTM; left one comment on whether we should use a buffered channel to prefetch more.

From the diff under review:

	errs        wrappers.Errs
	data        []*rpcdbpb.PutRequest
	fetchedData chan []*rpcdbpb.PutRequest

@joshua-kim (Apr 18, 2023):

Should we use a buffered channel here instead, to prefetch a bit more in case the caller ever consumes data more quickly than a get can complete?

StephenButtolph (Author):

We could. The buffer size would control how aggressively we prefetch. I didn't want to do too much here, to avoid reading unnecessary data, but it's trivial to change later if we feel it's worth it.
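
For illustration, in a sketch like the one above the prefetch depth would be set by the channel's capacity (the capacity value here is hypothetical):

// Unbuffered: exactly one batch is fetched ahead of the consumer.
it.fetchedData = make(chan []kv)

// Capacity N lets up to N further batches queue up, prefetching more
// aggressively at the risk of reading data the caller never consumes.
it.fetchedData = make(chan []kv, 4)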

@StephenButtolph commented:

When running:

it := db.NewIterator()
defer it.Release() // the interface requires the owner to release the iterator

iterations := 0
startIteration := time.Now() // start of the run, for the cumulative duration
lastUpdate := startIteration
for it.Next() {
	// Hash the value a few times to mimic doing some work per entry.
	val := it.Value()
	for i := 0; i < 10; i++ {
		val = hashing.ComputeHash256(val)
	}
	_ = val
	iterations++

	now := time.Now()
	if now.Sub(lastUpdate) > 10*time.Second {
		log.Info("iterating", "count", iterations, "duration", common.PrettyDuration(now.Sub(startIteration)))
		lastUpdate = now
	}
}

over gRPC (in the C-Chain) on top of a synced Fuji node, we see:

Without the prefetching:

INFO [04-18|18:06:17.487] <C Chain> github.com/ava-labs/coreth/plugin/evm/vm.go:567: iterating count=2,127,336 duration=10.001s
INFO [04-18|18:06:27.488] <C Chain> github.com/ava-labs/coreth/plugin/evm/vm.go:567: iterating count=4,290,049 duration=20.002s
INFO [04-18|18:06:37.488] <C Chain> github.com/ava-labs/coreth/plugin/evm/vm.go:567: iterating count=6,421,562 duration=30.003s
INFO [04-18|18:06:47.489] <C Chain> github.com/ava-labs/coreth/plugin/evm/vm.go:567: iterating count=8,574,370 duration=40.003s
INFO [04-18|18:06:57.489] <C Chain> github.com/ava-labs/coreth/plugin/evm/vm.go:567: iterating count=10,729,053 duration=50.003s

With the prefetching:

INFO [04-18|18:04:22.934] <C Chain> github.com/ava-labs/coreth/plugin/evm/vm.go:567: iterating count=3,718,763 duration=10.000s
INFO [04-18|18:04:32.934] <C Chain> github.com/ava-labs/coreth/plugin/evm/vm.go:567: iterating count=7,461,150 duration=20.000s
INFO [04-18|18:04:42.934] <C Chain> github.com/ava-labs/coreth/plugin/evm/vm.go:567: iterating count=11,142,776 duration=30.000s
INFO [04-18|18:04:52.934] <C Chain> github.com/ava-labs/coreth/plugin/evm/vm.go:567: iterating count=14,858,890 duration=40.000s
INFO [04-18|18:05:02.934] <C Chain> github.com/ava-labs/coreth/plugin/evm/vm.go:567: iterating count=18,565,672 duration=50.000s

The moderate amount of hashing mimics the plugin doing some work per entry. Without that work, the speed of iteration reflects only how fast avalanchego can iterate over the DB, because the plugin would drop each value immediately, giving prefetching no window in which to do any parallel work. From the logs above, throughput improves from roughly 10.7M iterations in 50s (~215k/s) to roughly 18.6M iterations in 50s (~371k/s), about a 1.7x speedup.

@StephenButtolph merged commit f2af48b into dev on Apr 18, 2023
@StephenButtolph deleted the prefetch-rpcdb-iteration-batches branch on April 18, 2023 22:50