Prefetch rpcdb iterator batches #1323
Conversation
// Next attempts to move the iterator to the next element and returns if this
// succeeded
func (it *iterator) Next() bool {
	if it.db.closed.Get() {
This seems a bit weird... wondering if we should release all iterators when we close the DB
The iterators are not usable once the db is closed, so this makes sense to me.
So, the reason this isn't done is that it would require the DatabaseClient to actually track the outstanding iterators... Because the iterator already needs a reference to the DatabaseClient, this would introduce a kind of gross circular reference... I think keeping it as-is is probably the cleanest. We still require the iterator owner to call Release anyway, according to the interface... So I think this is fine.
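To make the circular-reference concern concrete, here is a rough sketch of what releasing iterators from Close would require. This is not the PR's code: the openIterators field, the release method, and the bookkeeping are made up for illustration.

package sketch

import "sync"

// Hypothetical shape only: the PR deliberately avoids this design.
type DatabaseClient struct {
	lock          sync.Mutex
	openIterators map[*iterator]struct{} // the client would have to track every iterator...
}

type iterator struct {
	db *DatabaseClient // ...while every iterator already points back at the client
}

// Close would then need to walk and release all outstanding iterators.
func (db *DatabaseClient) Close() error {
	db.lock.Lock()
	defer db.lock.Unlock()
	for it := range db.openIterators {
		it.release()
	}
	db.openIterators = nil
	return nil
}

// release frees the iterator's resources; in a real version it would also
// need to deregister itself from db.openIterators, which is the circular
// bookkeeping the comment above wants to avoid.
func (it *iterator) release() {}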
LGTM, left one comment on whether or not we should use a buffered channel to prefetch more
	data        []*rpcdbpb.PutRequest
	errs        wrappers.Errs
	fetchedData chan []*rpcdbpb.PutRequest
Should we use a buffered channel here instead to prefetch a bit more in the case that the caller ever consumes data more quickly than the time it takes for a get to occur?
We could. The number of buffers here would impact how aggressively we prefetch... I didn't want to do too much here to avoid reading unnecessary data... but it's trivial to change if we felt it was worth it later
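For reference, the buffered variant would mostly be a change to the channel's capacity. In the sketch below, prefetchDepth is an assumed tuning knob and batch a stand-in type; neither is part of this PR.

package sketch

// batch stands in for []*rpcdbpb.PutRequest so the sketch is self-contained.
type batch [][]byte

// prefetchDepth is an assumed knob: with capacity 0 (roughly what this PR
// does) the fetch routine stages at most one batch ahead of the one being
// consumed, while a larger capacity lets it run several batches ahead.
const prefetchDepth = 4

func newFetchedData() chan batch {
	return make(chan batch, prefetchDepth)
}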
When running the following over gRPC (in the C-chain) on top of a synced Fuji node:

it := db.NewIterator()
iterations := 0
startIteration := time.Now()
lastUpdate := time.Now()
for it.Next() {
	val := it.Value()
	for i := 0; i < 10; i++ {
		val = hashing.ComputeHash256(val)
	}
	_ = val
	iterations++
	now := time.Now()
	if now.Sub(lastUpdate) > 10*time.Second {
		log.Info("iterating", "count", iterations, "duration", common.PrettyDuration(now.Sub(startIteration)))
		lastUpdate = now
	}
}

we see:

Without the prefetching:

With the prefetching:

The reason some moderate hashing is done is to mimic doing some amount of work. Without this work, the speed of iteration is based solely on how fast avalanchego can iterate over the DB (because the plugin just immediately drops the data, without giving prefetching any time to perform any parallel work).
Why this should be merged
While processing a batch, the rpcdb client can asynchronously be fetching the next batch.
How this works
Because iterators are not thread safe, all of the operations are now performed on the fetch routine. On the fetch routine we will always attempt to ensure there is a batch ready to start being read.
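As a rough illustration of that pattern (not the PR's actual code; batch, fetchBatch, and the field names are stand-ins), the fetch goroutine issues the RPCs and hands completed batches to Next over a channel, so fetching batch n+1 overlaps with the caller consuming batch n:

package sketch

type batch [][]byte // placeholder for a batch of key/value pairs

type iterator struct {
	fetchedData chan batch    // filled by the fetch goroutine
	data        batch         // batch currently being consumed
	index       int           // position within data
	onClose     chan struct{} // closed when the iterator is released
}

// fetch runs on its own goroutine; because it owns all RPC calls, the
// remote iterator is only ever driven from a single routine.
func (it *iterator) fetch() {
	defer close(it.fetchedData)
	for {
		b, err := it.fetchBatch() // the gRPC round trip
		if err != nil || len(b) == 0 {
			return
		}
		select {
		case it.fetchedData <- b: // next batch is staged for Next
		case <-it.onClose: // iterator released / db closed
			return
		}
	}
}

// Next serves elements out of the current batch and only blocks on the
// channel once that batch is exhausted, by which time the fetch routine has
// usually already staged the next one.
func (it *iterator) Next() bool {
	if it.index < len(it.data) {
		it.index++
		return true
	}
	b, ok := <-it.fetchedData
	if !ok {
		return false
	}
	it.data = b
	it.index = 1
	return true
}

// fetchBatch is a stand-in for the client's iterator-next RPC.
func (it *iterator) fetchBatch() (batch, error) {
	return nil, nil
}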
How this was tested
CI. This should be tested better for performance and correctness.