
Investigate and document the most efficient write pattern for Level DB #13

Closed
2 tasks done
p0mvn opened this issue Jan 29, 2022 · 7 comments · Fixed by #5
Comments

@p0mvn
Member

p0mvn commented Jan 29, 2022

Background

In #5, we introduced unsaved additions and unsaved removals in mutable_tree.go. Since the fast index introduces additional overhead on writes (which is normal for indexes), we would like to ensure that writes are committed in the most optimal way.

Currently, writes to disk work in the following way (in SaveVersion):

  1. unsaved additions get sorted by key, then added to goleveldb batch
  2. unsaved removals get sorted by key, then added to goleveldb batch
  3. goleveldb batch gets written to disk

When we batch operations, goleveldb simply appends them to the batch. Then, on commit, it applies them sequentially. Therefore, the above write pattern might not be the most efficient. We should investigate and document which write pattern would be the most efficient.
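The current pattern (steps 1–3 above) can be sketched as follows. This is illustrative only — `buildBatch`, `op`, and the input maps are not the actual iavl identifiers, and a plain slice of operations stands in for the goleveldb batch, which likewise just appends entries and replays them in order on commit:

```go
package main

import (
	"fmt"
	"sort"
)

// op stands in for a goleveldb batch entry; goleveldb appends
// operations and replays them sequentially on commit.
type op struct {
	kind string // "put" or "delete"
	key  string
}

// sortedKeys returns the keys of m in ascending order.
func sortedKeys(m map[string][]byte) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

// buildBatch mirrors the SaveVersion pattern described above:
// additions sorted by key first, then removals sorted by key.
func buildBatch(additions map[string][]byte, removals map[string]struct{}) []op {
	var batch []op
	for _, k := range sortedKeys(additions) {
		batch = append(batch, op{kind: "put", key: k})
	}
	removalKeys := make([]string, 0, len(removals))
	for k := range removals {
		removalKeys = append(removalKeys, k)
	}
	sort.Strings(removalKeys)
	for _, k := range removalKeys {
		batch = append(batch, op{kind: "delete", key: k})
	}
	return batch
}

func main() {
	adds := map[string][]byte{"b": nil, "a": nil}
	dels := map[string]struct{}{"d": {}, "c": {}}
	for _, o := range buildBatch(adds, dels) {
		fmt.Println(o.kind, o.key) // put a, put b, delete c, delete d
	}
}
```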

Potential alternatives:

  • apply removals first, then additions
  • sort and merge removals with additions and apply them together
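The second alternative could look roughly like this (again a sketch with illustrative names, not the actual iavl code): both kinds of operations go into one slice that is sorted once by key, so the batch is applied in a single key-ordered pass:

```go
package main

import (
	"fmt"
	"sort"
)

type op struct {
	kind string // "put" or "delete"
	key  string
}

// mergedBatch sketches the "sort and merge" alternative: additions and
// removals are combined into one slice and sorted once by key.
func mergedBatch(additions map[string][]byte, removals map[string]struct{}) []op {
	ops := make([]op, 0, len(additions)+len(removals))
	for k := range additions {
		ops = append(ops, op{kind: "put", key: k})
	}
	for k := range removals {
		ops = append(ops, op{kind: "delete", key: k})
	}
	sort.Slice(ops, func(i, j int) bool { return ops[i].key < ops[j].key })
	return ops
}

func main() {
	ops := mergedBatch(map[string][]byte{"b": nil, "d": nil}, map[string]struct{}{"a": {}, "c": {}})
	for _, o := range ops {
		fmt.Println(o.kind, o.key) // delete a, put b, delete c, put d
	}
}
```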

Acceptance Criteria

  • investigate and document how leveldb works
  • analyze the proposed alternatives
@p0mvn
Member Author

p0mvn commented Jan 29, 2022

LevelDB deletes do not remove data right away. They write tombstones, and the actual removal only happens during compaction.
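A toy sketch of that behaviour (not goleveldb's actual implementation): a delete is just another record appended to the log, and the old value only disappears when compaction rewrites the segment without it:

```go
package main

import "fmt"

// record is an entry in a toy append-only log; a delete is a record
// with tombstone set, appended like any other write.
type record struct {
	key       string
	value     string
	tombstone bool
}

// compact keeps only the newest record per key and drops tombstoned
// keys — the point at which deleted data is actually removed.
func compact(log []record) []record {
	latest := map[string]record{}
	var order []string
	for _, r := range log {
		if _, seen := latest[r.key]; !seen {
			order = append(order, r.key)
		}
		latest[r.key] = r // newer records shadow older ones
	}
	var out []record
	for _, k := range order {
		if r := latest[k]; !r.tombstone {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	log := []record{
		{key: "a", value: "1"},
		{key: "b", value: "2"},
		{key: "a", tombstone: true}, // delete: appended, not applied in place
	}
	fmt.Println(len(log))          // 3: the tombstone coexists with the old value
	fmt.Println(len(compact(log))) // 1: only "b" survives compaction
}
```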


@p0mvn
Member Author

p0mvn commented Jan 29, 2022

LevelDB is based on an LSM-tree built out of SSTables

  • Sorted String Table (SSTable)
    • Sequence of key value pairs sorted by keys
    • Each key appears once within each merged segment file
    • Merging is efficient
      • Similar to merge sort algorithm
      • All the values in one segment are more recent than all the values in the other segment, which resolves key collisions during a merge
    • Don’t need to keep an index of all the keys in memory
      • Can have the offset of keys around the key you are looking for so you can jump closer and scan from there
    • Since read requests need to scan over several key-value pairs in the requested range, it is possible to group records and compress before writing to disk
      • Each entry points at the start of the compressed block
      • Reduces I/O bandwidth use
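The sparse-index idea above can be sketched as follows (illustrative types, not LevelDB's actual index format): binary-search an in-memory index for the closest preceding key, then scan forward from that offset:

```go
package main

import (
	"fmt"
	"sort"
)

// indexEntry maps a key to its block offset in the segment file.
// A sparse index holds only some keys, not all of them.
type indexEntry struct {
	key    string
	offset int64
}

// seekOffset returns the offset to start scanning from for target:
// the indexed entry with the largest key <= target.
func seekOffset(index []indexEntry, target string) int64 {
	// index is sorted by key; find the first entry with key > target.
	i := sort.Search(len(index), func(i int) bool { return index[i].key > target })
	if i == 0 {
		return 0 // target precedes every indexed key: scan from the start
	}
	return index[i-1].offset
}

func main() {
	index := []indexEntry{{"apple", 0}, {"mango", 4096}, {"tomato", 8192}}
	// "orange" is not indexed, but it must live in mango's block if present.
	fmt.Println(seekOffset(index, "orange")) // 4096
}
```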

Constructing and Maintaining SSTables
Maintain a sorted structure in memory (memtable) by using:

  • AVL Tree

  • Red-black tree

  • Write comes in -> add to memtable

    • Memtable gets bigger than threshold -> write it out to disk as an SSTable
    • Already sorted by key
  • Read comes in -> check memtable, then segments

  • Concurrently, run merging and compaction process in the background
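A minimal sketch of that lifecycle, with a plain map standing in for the balanced tree (a real memtable uses an AVL/red-black tree or skiplist, whose in-order walk yields the sorted keys for free; all names here are illustrative):

```go
package main

import (
	"fmt"
	"sort"
)

// memtable is a toy in-memory write buffer that flushes to a
// sorted "SSTable" once it exceeds a size threshold.
type memtable struct {
	data      map[string]string
	threshold int        // flush once this many entries accumulate
	sstables  [][]string // each flushed segment is a sorted key list
}

// put adds a write to the memtable, flushing when it grows too big.
func (m *memtable) put(k, v string) {
	m.data[k] = v
	if len(m.data) >= m.threshold {
		m.flush()
	}
}

// flush writes the memtable out as a sorted segment and starts a new one.
func (m *memtable) flush() {
	keys := make([]string, 0, len(m.data))
	for k := range m.data {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	m.sstables = append(m.sstables, keys)
	m.data = map[string]string{}
}

// get checks the memtable first, then segments from newest to oldest.
func (m *memtable) get(k string) (string, bool) {
	if v, ok := m.data[k]; ok {
		return v, true
	}
	for i := len(m.sstables) - 1; i >= 0; i-- {
		for _, key := range m.sstables[i] {
			if key == k {
				// A real engine would read the value from the segment
				// file; here we only report that the key was flushed.
				return "", true
			}
		}
	}
	return "", false
}

func main() {
	m := &memtable{data: map[string]string{}, threshold: 2}
	m.put("b", "1")
	m.put("a", "2") // crosses the threshold and triggers a flush
	fmt.Println(m.sstables) // [[a b]]
	_, found := m.get("a")
	fmt.Println(found) // true
}
```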

Leveled Compaction

@p0mvn
Member Author

p0mvn commented Jan 29, 2022

The memtable should already be sorted. Therefore, we should check whether it is necessary to sort fast node additions and removals in the IAVL tree before writing them to the db. Testing:

Benchmark with sorting:

BenchmarkMedium/memdb-100000-100-16-40/update-8  	    8692	    165474 ns/op	   27524 B/op	     342 allocs/op
Init Tree took 66.91 MB
BenchmarkMedium/goleveldb-100000-100-16-40/update-8         	    5236	    222130 ns/op	   46814 B/op	     458 allocs/op
PASS

Benchmark without sorting:

BenchmarkMedium/memdb-100000-100-16-40/update-8  	    6158	    180032 ns/op	   29066 B/op	     359 allocs/op
Init Tree took 67.13 MB
BenchmarkMedium/goleveldb-100000-100-16-40/update-8         	    5455	    306050 ns/op	   48218 B/op	     480 allocs/op
PASS

Looks like the sorting in iavl does help.

Potential explanation:

The default maximum size of the memtable is 4 MB, which is 4 * 1024 * 1024 = 4,194,304 bytes

The benchmark was done on 100,000 key-value pairs of 56 bytes each = 5,600,000 bytes

Therefore, LevelDB would have to create one SSTable on disk after the memtable fills up. After the SSTable is written, a new memtable is created for the extra data. Each is internally sorted, but keys are not sorted across the two. Therefore, we can conclude that sorting in iavl is indeed necessary for good performance when the set of key-value pairs outgrows the default memtable size.
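The arithmetic behind that explanation can be checked directly (assuming goleveldb's default 4 MB write buffer):

```go
package main

import "fmt"

func main() {
	// Default memtable (write buffer) budget: 4 MB.
	memtableBytes := 4 * 1024 * 1024 // 4,194,304
	// Benchmark working set: 100,000 key-value pairs of 56 bytes each.
	workingSet := 100_000 * 56 // 5,600,000
	fmt.Println(workingSet > memtableBytes)  // true: the memtable overflows
	fmt.Println(workingSet - memtableBytes)  // 1405696 bytes spill into a second memtable
}
```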

@p0mvn
Member Author

p0mvn commented Jan 29, 2022

Benchmark for writing removals first and only then additions:

BenchmarkMedium/memdb-100000-100-16-40/update-8  	    9754	    165586 ns/op	   27756 B/op	     345 allocs/op
Init Tree took 66.93 MB
BenchmarkMedium/goleveldb-100000-100-16-40/update-8         	    4942	    235430 ns/op	   47157 B/op	     460 allocs/op

This has only made it worse than before.

@p0mvn
Member Author

p0mvn commented Jan 29, 2022

Merging removals and additions together and writing them at once has also made the performance worse:

BenchmarkMedium/memdb-100000-100-16-40/update-8  	    9081	    176331 ns/op	   27623 B/op	     344 allocs/op
Init Tree took 66.85 MB
BenchmarkMedium/goleveldb-100000-100-16-40/update-8         	    4107	    433346 ns/op	   50517 B/op	     472 allocs/op

@p0mvn
Member Author

p0mvn commented Jan 29, 2022

Based on the investigation above, my guesses were incorrect. The current write pattern introduced in #5 is the most efficient of all the proposed solutions.

@p0mvn p0mvn closed this as completed Jan 29, 2022
@p0mvn p0mvn linked a pull request Jan 29, 2022 that will close this issue