Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds RocksDB support #2197

Merged
merged 14 commits into from
Aug 28, 2019
Merged

Adds RocksDB support #2197

merged 14 commits into from
Aug 28, 2019

Conversation

wezrule
Copy link
Contributor

@wezrule wezrule commented Aug 4, 2019

This PR allows the node to use RocksDB instead of LMDB for the ledger by setting the CMake flag -DNANO_ROCKSDB=ON (config/flag options will be added later). There is no upgrade path, so bootstrapping will have to be done from scratch.

RocksDB does not allow getting counts (only estimates), due to the way LSM trees work, multiple puts to the same key will result in duplications before the compaction phase. From: google/leveldb#119 it is recommended to store counts separately which I've done in a separate cached_counts table.

The write transactions used by RocksDB will fail on commit () if there is a conflicting key being modified. It is now a requirement to state which ones are modified during a tx_begin_write () call.
Each column family has a lock, and if there can be concurrent modifications of the same keys then they must be added to the "requires lock" collection, otherwise they can be added to the "no lock required" collection. This now requires some thought, and it is also more visible what transactions are modifying. Confirmation height is one such table which does not require a lock, even though there is concurrent modification of the table, only new ones are added in 1 thread, and modifications to existing ones in another thread. There may be other ones like this, but they have not been investigated yet (a pessimistic approach was taken).

Notes: When backing up using --snapshot CLI option, it is currently set up to do incremental backups, which reduces copying the need to copy the whole database. However if deleting the original files then the backup directory should also be deleted otherwise there can be inconsistencies (Put in documentation?)

RocksDB is an external dependency as there was no clear way to incorporate it in our build.
Windows:
Recommended way:

  • add set (VCPKG_LIBRARY_LINKAGE static) to the top of %VCPKG_DIR%\ports\rocksdb\portfile.cmake
  • vcpkg install rocksdb:x64-windows

Ubuntu:
apt-get install librocksdb-dev
apt-get install zlib1g-dev (may be required if system does not have zlib installed already)
Or

git clone https://github.com/facebook/rocksdb.git
cd rocksdb
make static_lib
make install

Note: If running into:
/usr/bin/ld: ../node/libnode.a(rocksdb.cpp.o):(.data.rel.ro._ZTIN7rocksdb11StackableDBE[_ZTIN7rocksdb11StackableDBE]+0x10): undefined reference to 'typeinfo for rocksdb::DB'
Then it may be necessary to build with USE_RTTI=1:
facebook/rocksdb#4329

Mac:
brew install rocksdb

For more instructions:
https://github.com/facebook/rocksdb/blob/master/INSTALL.md

The following CMake options can be used to specify where the RocksDB and zlib libraries are (another dependency):

ROCKSDB_INCLUDE_DIRS
ROCKSDB_LIBRARIES
ZLIB_LIBRARY
ZLIB_INCLUDE_DIR

I've noted some things that may/should be done on top of this:

  • Run TSAN/ASAN/Valgrind and if appropriate use suppressions (similarly to LMDB). TSAN/ASAN has been run on Clang/Ubuntu and there is 1 recurring TSAN error, however the same is reported here ThreadSanitizer reports data race in LevelDB during sync ethereum/aleth#4740 so I don't think there is necessarily anything to worry about. However I haven't been able to suppress it, but leaving that for another task.
  • Try WriteBatch instead of using write transactions (we do explicit locking ourselves)
  • Config options for user customizability. Memtable size, number of memtables, compaction threads, bloom filter memory usage etc.. Care must be taken to ensure backwards/forwards compatibility though.
  • RPC calls currently lock all tables, they should only lock necessary ones
  • Use read-modify-write merge operator available in RocksDB to update the new cached_counts table more efficiently without requiring separate get/put operations.
  • Debug mode on MSVC is very slow when creating column families, can anything be done about this?
  • Support in ci for running both LMDB/RocksDB tests (currently using only RocksDB)
  • Benchmarking/Profiling comparing to LMDB
  • Test snapshots using different RocksDB versions on another system
  • Try and reduce locking of column families if unnecessary.
  • Calling Delete has to do a get currently to make sure it exists, this may not be necessary in some places.
  • get always makes a copy
  • Some gets are one offs so they can be turned off for block caching.
  • Add db time tracker like we have for LMDB?
  • There is a way to merge results from 2 tables in RocksDB, rather than doing a lot of the stuff manually in rocksdb_iterator.hpp
  • Better suited options per column family, such as memtable size, compaction style etc.
  • Combine some of the smaller column families together and use prefix extraction to write/obtain the values.
  • Check if removing all peers in peers column family can be relegated to just MSVC/Debug again since all the new optimizations. Previously write commits also failed in ongoing_peer_store
  • Support automatic backups that were added in Provide optional automatic ledger/wallet backups before an upgrade #2198

12 threads 3.4GHZ CPU, 16GB RAM, SSD. Time taken (only taking into account bootstrapping):

--- LMDB RocksDB
Full sync (live) 3h 15m 4h 15m

1vCPU, 2GB RAM, SSD. (Note: There was 6tps going on during later half of the LMDB sync)

--- LMDB RocksDB
3.5M checked blocks 7h 15m 5h 30m

Comparison

LMDB RocksDB
Tested with the node for many years Unreleased
1 file (data.ldb) 100+ files
15GB live ledger 11GB (after flushing WAL)
Not many options to configure Very configurable
Many optimization made already by us Many optimizations can be made in the future to improve
Part of the build process Required external dep (incl recent version 5.13?+ which may not be available in your system repo)
More I/O Less I/O (writes are flushed in bulk)

Some test notes, try:
Run tests with LMDB & RocksDB tests
Test snapshot/vacuum with RocksDB compaction
RPC calls with RocksDB
CLI write commands, e.g peer_clear should give an appropriate error message if the node is already running.

@wezrule wezrule added enhancement documentation This item indicates the need for or supplies updated or expanded documentation incomplete This item is incomplete and should not be merged if it is a pull request database Relates to lmdb or rocksdb labels Aug 4, 2019
@wezrule wezrule added this to the V20.0 milestone Aug 4, 2019
@wezrule wezrule self-assigned this Aug 4, 2019
@wezrule wezrule removed the incomplete This item is incomplete and should not be merged if it is a pull request label Aug 9, 2019
Copy link
Contributor

@cryptocode cryptocode left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, tested with rocksdb 6.1.2 (brew)

Copy link
Contributor

@argakiig argakiig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, tested with vcpkg pulled rocksdb6.1.2 static with msvc-14.1x64

@SergiySW
Copy link
Contributor

LGTM, tested with rocksdb 6.2.2 static git build, Debian 10

@wezrule wezrule requested a review from SergiySW August 27, 2019 16:57
@wezrule wezrule merged commit bcb7f39 into nanocurrency:master Aug 28, 2019
@wezrule wezrule deleted the add_rocksdb_support branch August 28, 2019 07:08
@zhyatt zhyatt added the major This item indicates the need for or supplies a major or notable change label Oct 24, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
database Relates to lmdb or rocksdb documentation This item indicates the need for or supplies updated or expanded documentation enhancement major This item indicates the need for or supplies a major or notable change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants