If I use Durability::None, that bloats the disk size, which has already caused me to run out of disk space more than once, so I started doing an Immediate flush here and there:
// Note the keys are written in monotonically increasing order.
for x in 0..item_count {
    db.insert(
        key,
        value,
        // NOTE: Avoid too much disk space building up...
        args.backend == Backend::Redb && x % 100_000 == 0,
    );
}
With the above code, writing 100M 16-byte keys and 64-byte values takes 130 minutes, which is about 78µs per insert. That is slower than the fsync time of my SSD (pm9a3), so there appears to be no advantage to writing with Durability::None. Is there any point in using None at all?
Additionally, for this comparatively small data set (it's just ~8 GB of user data), redb has written 4.4 TB (write amp = 540), with the resulting .redb file being ~28GB.
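For reference, the two figures above can be re-derived from the raw numbers. This is only a sketch: the function names are made up for illustration, and it assumes "TB" means 10^12 bytes and that the payload is exactly 100M × (16 + 64) bytes. Under those assumptions it gives 78µs per insert and a write amplification of about 550, which is in line with the ~540 quoted above given the rounding of the inputs.

```rust
/// Average time per insert in microseconds, given the total load time in minutes.
fn micros_per_insert(total_minutes: f64, item_count: u64) -> f64 {
    total_minutes * 60.0 * 1e6 / item_count as f64
}

/// Logical payload size in bytes (keys plus values, no storage overhead).
fn user_bytes(item_count: u64, key_len: u64, value_len: u64) -> u64 {
    item_count * (key_len + value_len)
}

/// Bytes physically written to disk divided by the logical payload size.
fn write_amplification(bytes_written: f64, user_bytes: u64) -> f64 {
    bytes_written / user_bytes as f64
}

fn main() {
    let items = 100_000_000;
    // 16-byte keys and 64-byte values, as in the benchmark described above.
    let payload = user_bytes(items, 16, 64);
    println!("payload: {} bytes", payload);
    println!("per insert: {:.1} µs", micros_per_insert(130.0, items));
    // Assumption: 4.4 TB is taken as 4.4e12 bytes (decimal terabytes).
    println!("write amp: {:.0}", write_amplification(4.4e12, payload));
}
```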
As a comparison:
sled 0.x takes 5 minutes and uses comparable disk space
fjall takes 3 minutes and uses 8 GB (to be expected, because LSM)
What is the best way to write a lot of KVs without bloating disk space, while keeping inserts somewhat fast?
If you're able to insert them in sorted order, that might improve write amplification. Alternatively, you can increase the cache size if you have enough RAM.
I am trying to benchmark large data sets in https://github.com/marvin-j97/rust-storage-bench, so I want to load a lot of data very quickly, no matter the durability.