-
Notifications
You must be signed in to change notification settings - Fork 20.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
core, eth, trie: write nodebuffer asynchronously to disk #28471
base: master
Are you sure you want to change the base?
Conversation
@rjl493456442 Can I invite you to help evaluate this PR? |
92e4c98
to
ad03bc6
Compare
e9d0659
to
288b3c9
Compare
I once had a similar idea, but I abandoned it due to certain limitations in LevelDB. Specifically, the following write operations will remain blocked even if we flush the buffer in the background. Nevertheless, it's certainly worth considering this approach within the context of Pebble. As you can see, it involves a non-trivial complexity. Do you have any performance data before we dive into details? |
Yes, writes limited by leveldb will still be blocked. Since asynchronous writing does not block the main process, subsequent block chasing performance is still much smoother due to cache hits, which is still recommended. Intel x86, 16 core, 64G, 3000 IOPS, 128M/S EC2 |
288b3c9
to
37ec06a
Compare
@rjl493456442 How do you think this PR is necessary? |
@joeylichang I will try to do some benchmarks first. |
Deployed on our benchmark machine, will post the result once it's available. |
cool~, looking forward to the result.
|
@rjl493456442 Can you also show the disk read/write metrics here? Maybe disk bandwidth is the bottleneck. |
After running a few more days, this pull request is consistently faster than master. Due to fact that accumulating 256 Megabytes state in memory will take roughly 1.5min in full sync, so the theoretical performance speedup is ~1%(average_buffer_flush_time / 1.5min). Although the speedup is not that obvious from the metrics. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't really say much about the overall idea at this point, don't fully understand it yet. Some things looks a bit flaky to me.
@@ -151,16 +151,16 @@ func (db *Database) Commit(root common.Hash, report bool) error { | |||
// Size returns the storage size of diff layer nodes above the persistent disk | |||
// layer, the dirty nodes buffered within the disk layer, and the size of cached | |||
// preimages. | |||
func (db *Database) Size() (common.StorageSize, common.StorageSize, common.StorageSize) { | |||
func (db *Database) Size() (common.StorageSize, common.StorageSize, common.StorageSize, common.StorageSize) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please update the method docs. With four return values, might be time to split it into a bullet-list, e.g. something like
// Size returns the sizes of:
// - the diff layer nodes above the persistent disk layer,
// - the dirty nodes buffered within the disk layer,
// - the immutable nodes in the disk layer,
// - the cached preimages
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added the docs.
@@ -294,14 +337,15 @@ func (dl *diskLayer) setBufferSize(size int) error { | |||
} | |||
|
|||
// size returns the approximate size of cached nodes in the disk layer. | |||
func (dl *diskLayer) size() common.StorageSize { | |||
func (dl *diskLayer) size() (common.StorageSize, common.StorageSize) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update method doc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added the docs.
trie/triedb/pathdb/disklayer.go
Outdated
// empty returns an indicator if trienodebuffer contains any state transition inside. | ||
empty() bool | ||
|
||
// getSize return the trienodebuffer used size. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It returns two numbers, please document what they mean, respectively
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added the docs.
trie/triedb/pathdb/nodebuffer.go
Outdated
// asyncnodebuffer implement trienodebuffer interface, and aysnc the nodecache | ||
// to disk. | ||
type asyncnodebuffer struct { | ||
mux sync.RWMutex |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We typically name mutexes mu
rather than mux
. Not saying it's better, but please let's stick to the convention :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please also document what the mutex covers. In this case, I assume it is used as a concurrency-guard for operations on both current
and background
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, change to mu
and, add docs for 'asyncnodebuffer' and 'flush' method
trie/triedb/pathdb/nodebuffer.go
Outdated
if atomic.LoadUint64(&a.background.immutable) == 1 { | ||
return nil | ||
} | ||
|
||
atomic.StoreUint64(&a.current.immutable, 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not look concurrency-safe: first you check, then you store (blindly). What you probably want to use here is CompareAndSwap
, and why not make the immutable
into an atomic.Bool
rather than using atomic.LoadUint64
?
It would become something like
if !a.background.immutable.CompareAndSwap(false, true) {
return nil
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
atomic.Bool
is better. CompareAndSwap
is not appropriate, the background
switch 'currnet' the immutable
should be false.
trie/triedb/pathdb/nodebuffer.go
Outdated
size uint64 // The size of aggregated writes | ||
limit uint64 // The maximum memory allowance in bytes | ||
nodes map[common.Hash]map[string]*trienode.Node // The dirty node set, mapped by owner and path | ||
immutable uint64 // The flag equal 1, flush nodes to disk background |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use a bool
or even better atomic.Bool
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"The flag equal 1, flush nodes to disk background" -- please expand a bit on what the meaning of this field is.
For example (and I might be incorrect since I've still not fully understood this PR)
"If this is set to true, then some other thread is performing a flush in the background, and thus the ..."
or
"If this is set to true, then this nodebuffer is immutable and any write-operations to it will exit with error"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
trie/triedb/pathdb/nodebuffer.go
Outdated
if atomic.LoadUint64(&b.immutable) == 1 { | ||
return errWriteImmutable | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this is sufficient. What if someone is performing a commit
, so is in the for-loop below, and then some other threads sets the immutable
to true? This seems not thread-safe to me, but maybe I don't fully understand what the concurrency looks like.
@rjl493456442 Any further results? |
Pathdb will write trienodes to the disk in one batch after the
nodebuffer
is full. The operation of writing to the disk will block the execution of blocks, which will cause performance glitches from the monitoring point of view. This PR proposesasyncnodebuffer
to formally solve the performance glitch introduced by this.asyncnodebuffer
is an optimization ofnodebuffer
for asynchronous writing to disk. It consists of twonodebuffer
. When the current nodebuffer is full, it will become immutable and switch to the background to write to disk. The new frontnodebuffer
is used for subsequent writing of chasing blocks. Of course, the immutablenodebuffer
is still readable.