
Investigate and document the most efficient write pattern for Level DB #13

Closed
2 tasks done
p0mvn opened this issue Jan 29, 2022 · 7 comments · Fixed by #5
Comments

@p0mvn
Member

p0mvn commented Jan 29, 2022

Background

In #5, we introduced unsaved additions and unsaved removals in mutable_tree.go. Since the fast index introduces additional overhead on writes (which is normal for indexes), we would like to ensure that writes are committed in the most optimal way.

Currently, writes to disk work in the following way (in SaveVersion):

  1. unsaved additions get sorted by key, then added to goleveldb batch
  2. unsaved removals get sorted by key, then added to goleveldb batch
  3. goleveldb batch gets written to disk

When we batch operations, goleveldb simply appends them to the batch. Then, on commit, it applies them sequentially. Therefore, the above write pattern might not be the most efficient. We should investigate and document which write pattern would be the most efficient.
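The current pattern (steps 1–3 above) can be sketched as follows. This is illustrative only — `buildBatch`, `op`, and the input maps are not the actual iavl identifiers, and a plain slice of operations stands in for the goleveldb batch, which likewise just appends entries and replays them in order on commit:

```go
package main

import (
	"fmt"
	"sort"
)

// op stands in for a goleveldb batch entry; goleveldb appends
// operations and replays them sequentially on commit.
type op struct {
	kind string // "put" or "delete"
	key  string
}

// sortedKeys returns the keys of m in ascending order.
func sortedKeys(m map[string][]byte) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

// buildBatch mirrors the SaveVersion pattern described above:
// additions sorted by key first, then removals sorted by key.
func buildBatch(additions map[string][]byte, removals map[string]struct{}) []op {
	var batch []op
	for _, k := range sortedKeys(additions) {
		batch = append(batch, op{kind: "put", key: k})
	}
	removalKeys := make([]string, 0, len(removals))
	for k := range removals {
		removalKeys = append(removalKeys, k)
	}
	sort.Strings(removalKeys)
	for _, k := range removalKeys {
		batch = append(batch, op{kind: "delete", key: k})
	}
	return batch
}

func main() {
	adds := map[string][]byte{"b": nil, "a": nil}
	dels := map[string]struct{}{"d": {}, "c": {}}
	for _, o := range buildBatch(adds, dels) {
		fmt.Println(o.kind, o.key) // put a, put b, delete c, delete d
	}
}
```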

Potential alternatives:

  • apply removals first, then additions
  • sort and merge removals with additions and apply them together
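The second alternative could look roughly like this (again a sketch with illustrative names, not the actual iavl code): both kinds of operations go into one slice that is sorted once by key, so the batch is applied in a single key-ordered pass:

```go
package main

import (
	"fmt"
	"sort"
)

type op struct {
	kind string // "put" or "delete"
	key  string
}

// mergedBatch sketches the "sort and merge" alternative: additions and
// removals are combined into one slice and sorted once by key.
func mergedBatch(additions map[string][]byte, removals map[string]struct{}) []op {
	ops := make([]op, 0, len(additions)+len(removals))
	for k := range additions {
		ops = append(ops, op{kind: "put", key: k})
	}
	for k := range removals {
		ops = append(ops, op{kind: "delete", key: k})
	}
	sort.Slice(ops, func(i, j int) bool { return ops[i].key < ops[j].key })
	return ops
}

func main() {
	ops := mergedBatch(map[string][]byte{"b": nil, "d": nil}, map[string]struct{}{"a": {}, "c": {}})
	for _, o := range ops {
		fmt.Println(o.kind, o.key) // delete a, put b, delete c, put d
	}
}
```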

Acceptance Criteria

  • investigate and document how leveldb works
  • analyze the proposed alternatives
@p0mvn
Member Author

p0mvn commented Jan 29, 2022

LevelDB deletes do not remove data right away. They write tombstones, and the actual removal only happens during compaction.
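A toy sketch of that behaviour (not goleveldb's actual implementation): a delete is just another record appended to the log, and the old value only disappears when compaction rewrites the segment without it:

```go
package main

import "fmt"

// record is an entry in a toy append-only log; a delete is a record
// with tombstone set, appended like any other write.
type record struct {
	key       string
	value     string
	tombstone bool
}

// compact keeps only the newest record per key and drops tombstoned
// keys — the point at which deleted data is actually removed.
func compact(log []record) []record {
	latest := map[string]record{}
	var order []string
	for _, r := range log {
		if _, seen := latest[r.key]; !seen {
			order = append(order, r.key)
		}
		latest[r.key] = r // newer records shadow older ones
	}
	var out []record
	for _, k := range order {
		if r := latest[k]; !r.tombstone {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	log := []record{
		{key: "a", value: "1"},
		{key: "b", value: "2"},
		{key: "a", tombstone: true}, // delete: appended, not applied in place
	}
	fmt.Println(len(log))          // 3: the tombstone coexists with the old value
	fmt.Println(len(compact(log))) // 1: only "b" survives compaction
}
```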


@p0mvn
Member Author

p0mvn commented Jan 29, 2022

LevelDB is based on an LSM-tree built out of SSTables

  • Sorted String Table (SSTable)
    • Sequence of key value pairs sorted by keys
    • Each key appears once within each merged segment file
    • Merging is efficient
      • Similar to merge sort algorithm
      • All the values in one segment are more recent than all the values in the other segment, which resolves key collisions during a merge
    • Don’t need to keep an index of all the keys in memory
      • Can have the offset of keys around the key you are looking for so you can jump closer and scan from there
    • Since read requests need to scan over several key-value pairs in the requested range, it is possible to group records and compress before writing to disk
      • Each entry points at the start of the compressed block
      • Reduces I/O bandwidth use
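The sparse-index idea above can be sketched as follows (illustrative types, not LevelDB's actual index format): binary-search an in-memory index for the closest preceding key, then scan forward from that offset:

```go
package main

import (
	"fmt"
	"sort"
)

// indexEntry maps a key to its block offset in the segment file.
// A sparse index holds only some keys, not all of them.
type indexEntry struct {
	key    string
	offset int64
}

// seekOffset returns the offset to start scanning from for target:
// the indexed entry with the largest key <= target.
func seekOffset(index []indexEntry, target string) int64 {
	// index is sorted by key; find the first entry with key > target.
	i := sort.Search(len(index), func(i int) bool { return index[i].key > target })
	if i == 0 {
		return 0 // target precedes every indexed key: scan from the start
	}
	return index[i-1].offset
}

func main() {
	index := []indexEntry{{"apple", 0}, {"mango", 4096}, {"tomato", 8192}}
	// "orange" is not indexed, but it must live in mango's block if present.
	fmt.Println(seekOffset(index, "orange")) // 4096
}
```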

Constructing and Maintaining SSTables
Maintain a sorted structure in memory (memtable) by using:

  • AVL Tree

  • Red-black tree

  • Write comes in -> add to memtable

    • Memtable gets bigger than threshold -> write it out to disk as an SSTable
    • Already sorted by key
  • Read comes in -> check memtable, then segments

  • Concurrently, run merging and compaction process in the background
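A minimal sketch of that lifecycle, with a plain map standing in for the balanced tree (a real memtable uses an AVL/red-black tree or skiplist, whose in-order walk yields the sorted keys for free; all names here are illustrative):

```go
package main

import (
	"fmt"
	"sort"
)

// memtable is a toy in-memory write buffer that flushes to a
// sorted "SSTable" once it exceeds a size threshold.
type memtable struct {
	data      map[string]string
	threshold int        // flush once this many entries accumulate
	sstables  [][]string // each flushed segment is a sorted key list
}

// put adds a write to the memtable, flushing when it grows too big.
func (m *memtable) put(k, v string) {
	m.data[k] = v
	if len(m.data) >= m.threshold {
		m.flush()
	}
}

// flush writes the memtable out as a sorted segment and starts a new one.
func (m *memtable) flush() {
	keys := make([]string, 0, len(m.data))
	for k := range m.data {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	m.sstables = append(m.sstables, keys)
	m.data = map[string]string{}
}

// get checks the memtable first, then segments from newest to oldest.
func (m *memtable) get(k string) (string, bool) {
	if v, ok := m.data[k]; ok {
		return v, true
	}
	for i := len(m.sstables) - 1; i >= 0; i-- {
		for _, key := range m.sstables[i] {
			if key == k {
				// A real engine would read the value from the segment
				// file; here we only report that the key was flushed.
				return "", true
			}
		}
	}
	return "", false
}

func main() {
	m := &memtable{data: map[string]string{}, threshold: 2}
	m.put("b", "1")
	m.put("a", "2") // crosses the threshold and triggers a flush
	fmt.Println(m.sstables) // [[a b]]
	_, found := m.get("a")
	fmt.Println(found) // true
}
```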

Leveled Compaction

@p0mvn
Member Author

p0mvn commented Jan 29, 2022

The memtable should already be sorted. Therefore, we should check whether it is necessary to sort fast node additions and removals in the IAVL tree before writing them to the db. Testing:

Benchmark with sorting:

BenchmarkMedium/memdb-100000-100-16-40/update-8  	    8692	    165474 ns/op	   27524 B/op	     342 allocs/op
Init Tree took 66.91 MB
BenchmarkMedium/goleveldb-100000-100-16-40/update-8         	    5236	    222130 ns/op	   46814 B/op	     458 allocs/op
PASS

Benchmark without sorting:

BenchmarkMedium/memdb-100000-100-16-40/update-8  	    6158	    180032 ns/op	   29066 B/op	     359 allocs/op
Init Tree took 67.13 MB
BenchmarkMedium/goleveldb-100000-100-16-40/update-8         	    5455	    306050 ns/op	   48218 B/op	     480 allocs/op
PASS

Looks like the sorting in iavl does help.

Potential explanation:

The default maximum size of the memtable is 4 MB, which is 4 * 1024 * 1024 = 4,194,304 bytes

The benchmark was done on 100,000 key-value pairs of 56 bytes each = 5,600,000 bytes

Therefore, LevelDB would have to create one SSTable on disk after the memtable fills up. After the SSTable is written, a new memtable is created for the extra data. Each is internally sorted, but keys are not sorted across the two. Therefore, we can conclude that sorting in iavl is indeed necessary for good performance when the set of key-value pairs outgrows the default memtable size.
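The arithmetic behind that explanation can be checked directly (assuming goleveldb's default 4 MB write buffer):

```go
package main

import "fmt"

func main() {
	// Default memtable (write buffer) budget: 4 MB.
	memtableBytes := 4 * 1024 * 1024 // 4,194,304
	// Benchmark working set: 100,000 key-value pairs of 56 bytes each.
	workingSet := 100_000 * 56 // 5,600,000
	fmt.Println(workingSet > memtableBytes)  // true: the memtable overflows
	fmt.Println(workingSet - memtableBytes)  // 1405696 bytes spill into a second memtable
}
```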

@p0mvn
Member Author

p0mvn commented Jan 29, 2022

Benchmark for writing removals first and only then additions:

BenchmarkMedium/memdb-100000-100-16-40/update-8  	    9754	    165586 ns/op	   27756 B/op	     345 allocs/op
Init Tree took 66.93 MB
BenchmarkMedium/goleveldb-100000-100-16-40/update-8         	    4942	    235430 ns/op	   47157 B/op	     460 allocs/op

This has only made it worse than before.

@p0mvn
Member Author

p0mvn commented Jan 29, 2022

Merging removals and additions together and writing them at once has also made the performance worse:

BenchmarkMedium/memdb-100000-100-16-40/update-8  	    9081	    176331 ns/op	   27623 B/op	     344 allocs/op
Init Tree took 66.85 MB
BenchmarkMedium/goleveldb-100000-100-16-40/update-8         	    4107	    433346 ns/op	   50517 B/op	     472 allocs/op

@p0mvn
Member Author

p0mvn commented Jan 29, 2022

Based on the investigation above, my guesses were incorrect. The current write pattern introduced in #5 is the most efficient of all the proposed solutions.

@p0mvn p0mvn closed this as completed Jan 29, 2022
@p0mvn p0mvn linked a pull request Jan 29, 2022 that will close this issue