sync from facebook master #4

hliu18 · 2019-11-08T22:08:16Z

No description provided.

…#5679) Summary: This PR adds support in block cache trace analyzer to read from human readable trace file. This is needed when a user does not have access to the binary trace file. Pull Request resolved: #5679 Test Plan: USE_CLANG=1 make check -j32 Differential Revision: D16697239 Pulled By: HaoyuHuang fbshipit-source-id: f2e29d7995816c389b41458f234ec8e184a924db

Summary: Most existing RocksDB unit tests run on `Env::Default()`. It will be useful to port the unit tests to non-default environments, e.g. `HdfsEnv`, etc. This pull request is one step towards this goal. If RocksDB unit tests are built with a static library exposing a function `RegisterCustomObjects()`, then it is possible to implement custom object registrar logic in the library. RocksDB unit test can call `RegisterCustomObjects()` at the beginning. By default, `ROCKSDB_UNITTESTS_WITH_CUSTOM_OBJECTS_FROM_STATIC_LIBS` is not defined, thus this PR has no impact on existing RocksDB because `RegisterCustomObjects()` is a noop. Test plan (on devserver): ``` $make clean && COMPILE_WITH_ASAN=1 make -j32 all $make check ``` All unit tests must pass. Pull Request resolved: #5676 Differential Revision: D16679157 Pulled By: riversand963 fbshipit-source-id: aca571af3fd0525277cdc674248d0fe06e060f9d

Summary: SmallestUnCommittedSeq reads two data structures, prepared_txns_ and delayed_prepared_. These two are updated in CheckPreparedAgainstMax when max_evicted_seq_ advances some prepared entires. To avoid the cost of acquiring a mutex, the read from them in SmallestUnCommittedSeq is not atomic. This creates a potential race condition. The fix is to read the two data structures in the reverse order of their update. CheckPreparedAgainstMax copies the prepared entry to delayed_prepared_ before removing it from prepared_txns_ and SmallestUnCommittedSeq looks into prepared_txns_ before reading delayed_prepared_. Pull Request resolved: #5683 Differential Revision: D16744699 Pulled By: maysamyabandeh fbshipit-source-id: b1bdb134018beb0b9de58827f512662bea35cad0

Summary: PR #5676 added some test coverage for `TEST_ENV_URI`, which unfortunately isn't supported in lite mode, causing some test failures for rocksdb lite. For example, ``` db/db_test_util.cc: In constructor ‘rocksdb::DBTestBase::DBTestBase(std::__cxx11::string)’: db/db_test_util.cc:57:16: error: ‘ObjectRegistry’ has not been declared Status s = ObjectRegistry::NewInstance()->NewSharedObject(test_env_uri, ^ ``` This PR fixes these errors by excluding the new code from test functions for lite mode. Pull Request resolved: #5686 Differential Revision: D16749000 Pulled By: miasantreble fbshipit-source-id: e8b3088c31a78b3dffc5fe7814261909d2c3e369

Summary: The changes transaction_test to set `txn_db_options.default_write_batch_flush_threshold = 1` in order to give better test coverage for WriteUnprepared. As part of the change, some tests had to be updated. Pull Request resolved: #5658 Differential Revision: D16740468 Pulled By: lth fbshipit-source-id: 3821eec20baf13917c8c1fab444332f75a509de9

Summary: With changes made in #5664 we meant to pass snap_released parameter of ::IsInSnapshot from the read callbacks. Although the variable was defined, passing it to the callback in WritePreparedTxnReadCallback was missing, which is fixed in this PR. Pull Request resolved: #5691 Differential Revision: D16767310 Pulled By: maysamyabandeh fbshipit-source-id: 3bf53f5964a2756a66ceef7c8f6b3ac75f102f48

Summary: When updating compiler version for MyRocks I'm seeing this error with rocksdb: ``` ome/yzha/mysql/mysql-fork2/rocksdb/table/get_context.h:91:3: error: explicitly defaulted default constructor is implicitly deleted [-Werror,-Wdefaulted-function-deleted] GetContext() = default; ^ /home/yzha/mysql/mysql-fork2/rocksdb/table/get_context.h:166:18: note: default constructor of 'GetContext' is implicitly deleted because field 'tracing_get_id_' of const-qualified type 'const uint64_t' (aka 'const unsigned long') would not be initialized const uint64_t tracing_get_id_; ^ ``` The error itself is rather self explanatory and makes sense. Given that no one seems to be using the default ctor (they shouldn't, anyway), I'm deleting it. Pull Request resolved: #5685 Differential Revision: D16747712 Pulled By: yizhang82 fbshipit-source-id: 95c0acb958a1ed41154c0047d2e6fce7644de53f

…apshot (#5697) Summary: Currently, if a write is done without a snapshot, then `largest_validated_seq_` is set to `kMaxSequenceNumber`. This is too aggressive, because an iterator with a snapshot created after this write should be valid. Set `largest_validated_seq_` to `GetLastPublishedSequence` instead. The variable means that no keys in the current tracked key set has changed by other transactions since `largest_validated_seq_`. Also, do some extra cleanup in Clear() for safety. Pull Request resolved: #5697 Differential Revision: D16788613 Pulled By: lth fbshipit-source-id: f2aa40b8b12e0c0cf9e38c940fecc8f1cc0d2385

Summary: Fix the following clang analyze failures: ``` In file included from utilities/transactions/transaction_test.cc:8: ./utilities/transactions/transaction_test.h:174:14: warning: Attempt to delete released memory delete root_db; ^ ``` The destructor of StackableDB already deletes the root db and there is no need to delete the db separately. Pull Request resolved: #5700 Test Plan: USE_CLANG=1 TEST_TMPDIR=/dev/shm/rocksdb OPT=-g make -j24 analyze Differential Revision: D16800579 Pulled By: maysamyabandeh fbshipit-source-id: 64c2d70f23e07e6a15242add97c744902ea33be5

Summary: In MyRocks, there are cases where we write while iterating through keys. This currently breaks WBWIIterator, because if a write batch flushes during iteration, the delta iterator would point to invalid memory. For now, fix by disallowing flush if there are active iterators. In the future, we will loop through all the iterators on a transaction, and refresh the iterators when a write batch is flushed. Pull Request resolved: #5699 Differential Revision: D16794157 Pulled By: lth fbshipit-source-id: 5d5bf70688bd68fe58e8a766475ae88fd1be3190

Summary: Some accesses to blob_files_ and open_ttl_files_ in BlobDBImpl, as well as to expiration_range_ in BlobFile were not properly synchronized. The patch fixes this and also makes sure the invariant that obsolete_files_ is a subset of blob_files_ holds even when an attempt to delete an obsolete blob file fails. Pull Request resolved: #5698 Test Plan: COMPILE_WITH_TSAN=1 make blob_db_test gtest-parallel --repeat=1000 ./blob_db_test --gtest_filter="*ShutdownWait*" The test fails with TSAN errors ~20 times out of 1000 without the patch but completes successfully 1000 out of 1000 times with the fix. Differential Revision: D16793235 Pulled By: ltamasi fbshipit-source-id: 8034b987598d4fdc9f15098d4589cc49cde484e9

Summary: Fix a bug in write unprepared savepoints. When flushing the write batch according to savepoint boundaries, we were forgetting to flush the last write batch after the last savepoint, meaning that some data was not written to DB. Also, add a small optimization where we avoid flushing empty batches. Pull Request resolved: #5703 Differential Revision: D16811996 Pulled By: lth fbshipit-source-id: 600c7e0e520ad7a8fad32d77e11d932453e68e3f

Summary: TSAN was not able to correctly instrument atomic bts and btr instructions, so when TSAN is enabled implement those with std::atomic::fetch_or and std::atomic::fetch_and. Also disable tests that fail on TSAN with false negatives (we know these are false negatives because this other verifiably correct program fails with the same TSAN error <link>) ``` make clean TEST_TMPDIR=/dev/shm/rocksdb OPT=-g COMPILE_WITH_TSAN=1 make J=1 -j56 folly_synchronization_distributed_mutex_test ``` This is the code that fails with the same false-negative with TSAN ``` namespace { class ExceptionWithConstructionTrack : public std::exception { public: explicit ExceptionWithConstructionTrack(int id) : id_{folly::to<std::string>(id)}, constructionTrack_{id} {} const char* what() const noexcept override { return id_.c_str(); } private: std::string id_; TestConstruction constructionTrack_; }; template <typename Storage, typename Atomic> void transferCurrentException(Storage& storage, Atomic& produced) { assert(std::current_exception()); new (&storage) std::exception_ptr(std::current_exception()); produced->store(true, std::memory_order_release); } void concurrentExceptionPropagationStress( int numThreads, std::chrono::milliseconds milliseconds) { auto&& stop = std::atomic<bool>{false}; auto&& exceptions = std::vector<std::aligned_storage<48, 8>::type>{}; auto&& produced = std::vector<std::unique_ptr<std::atomic<bool>>>{}; auto&& consumed = std::vector<std::unique_ptr<std::atomic<bool>>>{}; auto&& consumers = std::vector<std::thread>{}; for (auto i = 0; i < numThreads; ++i) { produced.emplace_back(new std::atomic<bool>{false}); consumed.emplace_back(new std::atomic<bool>{false}); exceptions.push_back({}); } auto producer = std::thread{[&]() { auto counter = std::vector<int>(numThreads, 0); for (auto i = 0; true; i = ((i + 1) % numThreads)) { try { throw ExceptionWithConstructionTrack{counter.at(i)++}; } catch (...) { transferCurrentException(exceptions.at(i), produced.at(i)); } while (!consumed.at(i)->load(std::memory_order_acquire)) { if (stop.load(std::memory_order_acquire)) { return; } } consumed.at(i)->store(false, std::memory_order_release); } }}; for (auto i = 0; i < numThreads; ++i) { consumers.emplace_back([&, i]() { auto counter = 0; while (true) { while (!produced.at(i)->load(std::memory_order_acquire)) { if (stop.load(std::memory_order_acquire)) { return; } } produced.at(i)->store(false, std::memory_order_release); try { auto storage = &exceptions.at(i); auto exc = folly::launder( reinterpret_cast<std::exception_ptr*>(storage)); auto copy = std::move(*exc); exc->std::exception_ptr::~exception_ptr(); std::rethrow_exception(std::move(copy)); } catch (std::exception& exc) { auto value = std::stoi(exc.what()); EXPECT_EQ(value, counter++); } consumed.at(i)->store(true, std::memory_order_release); } }); } std::this_thread::sleep_for(milliseconds); stop.store(true); producer.join(); for (auto& thread : consumers) { thread.join(); } } } // namespace ``` Pull Request resolved: #5684 Differential Revision: D16746077 Pulled By: miasantreble fbshipit-source-id: 8af88dcf9161c05daec1a76290f577918638f79d

…_and_filter_blocks is false (#5705) Summary: PR #5298 (and subsequent related patches) unintentionally changed the semantics of cache_index_and_filter_blocks: historically, this option only affected the main index/filter block; with the changes, it affects index/filter partitions as well. This can cause performance issues when cache_index_and_filter_blocks is false since in this case, partitions are neither cached nor preloaded (i.e. they are loaded on demand upon each access). The patch reverts to the earlier behavior, that is, partitions are cached similarly to data blocks regardless of the value of the above option. Pull Request resolved: #5705 Test Plan: make check ./db_bench -benchmarks=fillrandom --statistics --stats_interval_seconds=1 --duration=30 --num=500000000 --bloom_bits=20 --partition_index_and_filters=true --cache_index_and_filter_blocks=false ./db_bench -benchmarks=readrandom --use_existing_db --statistics --stats_interval_seconds=1 --duration=10 --num=500000000 --bloom_bits=20 --partition_index_and_filters=true --cache_index_and_filter_blocks=false --cache_size=8000000000 Relevant statistics from the readrandom benchmark with the old code: rocksdb.block.cache.index.miss COUNT : 0 rocksdb.block.cache.index.hit COUNT : 0 rocksdb.block.cache.index.add COUNT : 0 rocksdb.block.cache.index.bytes.insert COUNT : 0 rocksdb.block.cache.index.bytes.evict COUNT : 0 rocksdb.block.cache.filter.miss COUNT : 0 rocksdb.block.cache.filter.hit COUNT : 0 rocksdb.block.cache.filter.add COUNT : 0 rocksdb.block.cache.filter.bytes.insert COUNT : 0 rocksdb.block.cache.filter.bytes.evict COUNT : 0 With the new code: rocksdb.block.cache.index.miss COUNT : 2500 rocksdb.block.cache.index.hit COUNT : 42696 rocksdb.block.cache.index.add COUNT : 2500 rocksdb.block.cache.index.bytes.insert COUNT : 4050048 rocksdb.block.cache.index.bytes.evict COUNT : 0 rocksdb.block.cache.filter.miss COUNT : 2500 rocksdb.block.cache.filter.hit COUNT : 4550493 rocksdb.block.cache.filter.add COUNT : 2500 rocksdb.block.cache.filter.bytes.insert COUNT : 10331040 rocksdb.block.cache.filter.bytes.evict COUNT : 0 Differential Revision: D16817382 Pulled By: ltamasi fbshipit-source-id: 28a516b0da1f041a03313e0b70b28cf5cf205d00

Summary: Previously, the end key of a range deletion tombstone was considered exclusive for the purposes of deletion, but considered inclusive when checking if two SSTables overlap. For example, an SSTable with a range deletion tombstone [a, b) would be considered overlapping with an SSTable with a range deletion tombstone [b, c). This commit fixes this check. Pull Request resolved: #5649 Differential Revision: D16808765 Pulled By: anand1976 fbshipit-source-id: 5c7ad1c027e4f778d35070e5dae1b8e6037e0d68

Summary: Introducing write_unprepared feature. Pull Request resolved: #5711 Differential Revision: D16838307 Pulled By: maysamyabandeh fbshipit-source-id: d9a4daf63dd0f855bea49c14ce84e6299f1401c7

Summary: Add a command in ldb so that users can print out tombstones in SST files. In order to test the code, change the interface of LDBCommandRunner::RunCommand() so that it doesn't return from the program, but return the status code. Pull Request resolved: #5615 Test Plan: Add a new unit test Differential Revision: D16550326 fbshipit-source-id: 88ddfe6984bdcbb3a528abdd115089df09eba52e

Summary: Pull Request resolved: #5710 Test Plan: HISTORY.md-only change, no testing required. Differential Revision: D16836869 Pulled By: ltamasi fbshipit-source-id: 978148f1d14b0c46839a94d7ada8a5e8ecf73965

Summary: there is no need to return void*, as std::thread::thread(Func&& f, Args&&... args ) only requires `Func` to be callable. Signed-off-by: Kefu Chai <tchaikov@gmail.com> Pull Request resolved: #5709 Differential Revision: D16832894 fbshipit-source-id: a1e1b876fa8d55589ef5feb5b27f3a435068b747

Summary: add "linux/falloc.h" in env/io_posix.cc to fix compile error: ‘FALLOC_FL_KEEP_SIZE’ undeclared Signed-off-by: sheng qiu <herbert1984106@gmail.com> Pull Request resolved: #5708 Differential Revision: D16832922 fbshipit-source-id: 30e787c4a1b5a9724a8acfd68962ff5ec5f27d3e

Summary: VersionSet::ApproximateSize doesn't need to create two separate index iterators and do binary search for each in BlockBasedTable. So BlockBasedTable::ApproximateSize was added that creates the iterator once and uses it to calculate the data size between start and end keys. Pull Request resolved: #5693 Differential Revision: D16774056 Pulled By: elipoz fbshipit-source-id: 53ce262e1a057788243bf30cd9b8aa6581df1a18

Summary: This was previously broken, as the performance context-related macro signatures in file monitoring/perf_context_imp.h deviated for the case when NPERF_CONTEXT was defined and when it was not. Update the macros for the `-DNPERF_CONTEXT` case, so it compiles. Pull Request resolved: #5704 Differential Revision: D16867746 fbshipit-source-id: 05539724cb1f7955ecc42828365836a677759ad9

Summary: Update HISTORY.md by removing a feature from "Unreleased" to 6.4.0 after cherry-picking related commits to 6.4.fb branch. Pull Request resolved: #5714 Differential Revision: D16865334 Pulled By: riversand963 fbshipit-source-id: f17ede905a1dfbbcdf98806ca398c618cf54748a

Summary: In valgrind_test, TransactionTest.GetWithoutSnapshot ran 2 hours and still didn't finish. Black list from valgrind_test to prevent timeout. Pull Request resolved: #5715 Test Plan: run "make valgrind_test" and see whether the test is still generated. Differential Revision: D16866009 fbshipit-source-id: 92c78049b0bc1c2b9a0dfc1b7c8a9206b36f02f0

Summary: fix the regression introduced by cc9fa7f Signed-off-by: Kefu Chai <tchaikov@gmail.com> Pull Request resolved: #5687 Differential Revision: D16870212 fbshipit-source-id: 78b5519e1d2b03262d102ca530491254ddffdc38

Summary: Pull Request resolved: #5674 Differential Revision: D16870338 fbshipit-source-id: c8dac644b1479fa734b491f3a8d50151772290f7

…#5712) Summary: Previous PR #3601 added support for making prefix_extractor dynamically mutable. However, there was a missing check for hash index when creating new BlockBasedTableIterator. While the check may be redundant because no other types of IndexReader makes uses of the flag, it is less error-prone to add the missing check so that future index reader implementation will not worry about violating the contract. Pull Request resolved: #5712 Differential Revision: D16842052 Pulled By: miasantreble fbshipit-source-id: aef11c0ff7a690ed248f5b8fe23481cac486b381

Summary: Right now VerifyChecksum() doesn't do read-ahead. In some use cases, users won't be able to achieve good performance. With this change, by default, RocksDB will do a default readahead, and users will be able to overwrite the readahead size by passing in a ReadOptions. Pull Request resolved: #5713 Test Plan: Add a new unit test. Differential Revision: D16860874 fbshipit-source-id: 0cff0fe79ac855d3d068e6ccd770770854a68413

Summary: Atomic white box test's kill odd is the same as normal test. However, in the scenario that only WritableFileWriter::Append() is blacklisted, WritableFileWriter::Flush() dominates the killing odds. Normally, most of WritableFileWriter::Flush() are called in WAL writes, where every write triggers a WAL flush. In atomic test, WAL is disabled, so the kill happens less frequently than we antipated. In some rare cases, the kill didn't end up with happening (for reasons I still don't fully understand) and cause the stress test timeout. If WAL is disabled, make the odds 5x likely to trigger. Pull Request resolved: #5717 Test Plan: Run whitebox_crash_test_with_atomic_flush and whitebox_crash_test and observe the kill odds printed out. Differential Revision: D16897237 fbshipit-source-id: cbf5d96f6fc0e980523d0f1f94bf4e72cdb82d1c

Summary: We see this TSAN warning: WARNING: ThreadSanitizer: data race (pid=282806) Write of size 8 at 0x7b6c00000e38 by thread T16 (mutexes: write M1023578822185846136): #0 operator delete(void*) <null> (libtsan.so.0+0x0000000795f8) #1 rocksdb::DBImpl::BackgroundFlush(bool*, rocksdb::JobContext*, rocksdb::LogBuffer*, rocksdb::FlushReason*, rocksdb::Env::Priority) db/db_impl/db_impl_compaction_flush.cc:2202 (db_flush_test+0x00000060b462) #2 rocksdb::DBImpl::BackgroundCallFlush(rocksdb::Env::Priority) db/db_impl/db_impl_compaction_flush.cc:2226 (db_flush_test+0x00000060cbd8) #3 rocksdb::DBImpl::BGWorkFlush(void*) db/db_impl/db_impl_compaction_flush.cc:2073 (db_flush_test+0x00000060d5ac) ...... Previous atomic write of size 4 at 0x7b6c00000e38 by main thread: #0 __tsan_atomic32_fetch_sub <null> (libtsan.so.0+0x00000006d721) #1 std::__atomic_base<int>::fetch_sub(int, std::memory_order) /mnt/gvfs/third-party2/libgcc/c67031f0f739ac61575a061518d6ef5038f99f90/7.x/platform007/5620abc/include/c++/7.3.0/bits/atomic_base.h:524 (db_flush_test+0x0000005f9e38) #2 rocksdb::ColumnFamilyData::Unref() db/column_family.h:286 (db_flush_test+0x0000005f9e38) #3 rocksdb::DBImpl::FlushMemTable(rocksdb::ColumnFamilyData*, rocksdb::FlushOptions const&, rocksdb::FlushReason, bool) db/db_impl/db_impl_compaction_flush.cc:1624 (db_flush_test+0x0000005f9e38) #4 rocksdb::DBImpl::TEST_FlushMemTable(rocksdb::ColumnFamilyData*, rocksdb::FlushOptions const&) db/db_impl/db_impl_debug.cc:127 (db_flush_test+0x00000061ace9) #5 rocksdb::DBFlushTest_CFDropRaceWithWaitForFlushMemTables_Test::TestBody() db/db_flush_test.cc:320 (db_flush_test+0x0000004b44e5) #6 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) third-party/gtest-1.7.0/fused-src/gtest/gtest-all.cc:3824 (db_flush_test+0x000000be2988) ...... It's still very clear the cause of the warning is because that TSAN treats results from relaxed atomic::fetch_sub() as non-atomic with the operation itself. We can make it more explicit by bumping up the order to CS. Pull Request resolved: #5723 Test Plan: Run all existing test. Differential Revision: D16908250 fbshipit-source-id: bf17d39ed19058372bdf97f6440a743f88153021

… results (#5983) Summary: Right now, in db_stress's iterator tests, we always use the same CF to validate iterator results. This commit changes it so that a randomized CF is used in Cf consistency test, where every CF should have exactly the same data. This would help catch more bugs. Pull Request resolved: #5983 Test Plan: Run "make crash_test_with_atomic_flush". Differential Revision: D18217643 fbshipit-source-id: 3ac998852a0378bb59790b20c5f236f6a5d681fe

Summary: In pipeline writing mode, memtable switching needs to wait for memtable writing to finish to make sure that when memtables are made immutable, inserts are not going to them. This is currently done in DBImpl::SwitchMemtable(). This is done after flush_scheduler_.TakeNextColumnFamily() is called to fetch the list of column families to switch. The function flush_scheduler_.TakeNextColumnFamily() itself, however, is not thread-safe when being called together with flush_scheduler_.ScheduleFlush(). This change provides a fix, which moves the waiting logic before flush_scheduler_.TakeNextColumnFamily(). WaitForPendingWrites() is a natural place where the logic can happen. Pull Request resolved: #5716 Test Plan: Run all tests with ASAN and TSAN. Differential Revision: D18217658 fbshipit-source-id: b9c5e765c9989645bf10afda7c5c726c3f82f6c3

Summary: More release branches are created. We should include them in continuous format compatibility checks. Pull Request resolved: #5985 Test Plan: Let's see whether it is passes. Differential Revision: D18226532 fbshipit-source-id: 75d8cad5b03ccea4ce16f00cea1f8b7893b0c0c8

Summary: Recently, pipelined write is enabled even if atomic flush is enabled, which causing sanitizing failure in db_stress. Revert this change. Pull Request resolved: #5986 Test Plan: Run "make crash_test_with_atomic_flush" and see it to run for some while so that the old sanitizing error (which showed up quickly) doesn't show up. Differential Revision: D18228278 fbshipit-source-id: 27fdf2f8e3e77068c9725a838b9bef4ab25a2553

Summary: Compaction iterator has many assert statements that are active only during test runs. Some rare bugs would show up only at runtime could violate the assert condition but go unnoticed since assert statements are not compiled in release mode. Turning the assert statements to runtime check sone pors and cons: Pros: - A bug that would result into incorrect data would be detected early before the incorrect data is written to the disk. Cons: - Runtime overhead: which should be negligible since compaction cpu is the minority in the overall cpu usage - The assert statements might already being violated at runtime, and turning them to runtime failure might result into reliability issues. The patch takes a conservative step in this direction by logging the assert violations at runtime. If we see any violation reported in logs, we investigate. Otherwise, we can go ahead turning them to runtime error. Pull Request resolved: #5935 Differential Revision: D18229697 Pulled By: maysamyabandeh fbshipit-source-id: f1890eca80ccd7cca29737f1825badb9aa8038a8

Summary: Right now, by default FIFO compaction has no TTL. We believe that a default TTL of 30 days will be better. With this patch, the default will be changed to 30 days. Default of Options.periodic_compaction_seconds will mean the same as options.ttl. If Options.ttl and Options.periodic_compaction_seconds left default, a default 30 days TTL will be used. If both options are set, the stricter value of the two will be used. Pull Request resolved: #5987 Test Plan: Add an option sanitize test to cover the case. Differential Revision: D18237935 fbshipit-source-id: a6dcea1f36c3849e13c0a69e413d73ad8eab58c9

Summary: Previously, periodic compaction is not supported in universal compaction. Add the support using following approach: if any file is marked as qualified for periodid compaction, trigger a full compaction. If a full compaction is prevented by files being compacted, try to compact the higher levels than files currently being compacted. If in this way we can only compact the last sorted run and none of the file to be compacted qualifies for periodic compaction, skip the compact. This is to prevent the same single level compaction from being executed again and again. Pull Request resolved: #5970 Test Plan: Add several test cases. Differential Revision: D18147097 fbshipit-source-id: 8ecc308154d9aca96fb192c51fbceba3947550c1

Summary: We have updated earlier release branches going back to 5.5 so they are built using gcc7 by default. Disabling ancient versions before that until we figure out a plan for them. Pull Request resolved: #5990 Test Plan: Ran the script locally. Differential Revision: D18252386 Pulled By: ltamasi fbshipit-source-id: a7bbb30dc52ff2eaaf31a29ecc79f7cf4e2834dc

Summary: This change sets up for alternate implementations underlying BloomFilterPolicy: * Refactor BloomFilterPolicy and expose in internal .h file so that it's easy to iterate over / select implementations for testing, regardless of what the best public interface will look like. Most notably updated db_bloom_filter_test to use this. * Hide FullFilterBitsBuilder from unit tests (alternate derived classes planned); expose the part important for testing (CalculateSpace), as abstract class BuiltinFilterBitsBuilder. (Also cleaned up internally exposed interface to CalculateSpace.) * Rename BloomTest -> BlockBasedBloomTest for clarity (despite ongoing confusion between block-based table and block-based filter) * Assert that block-based filter construction interface is only used on BloomFilterPolicy appropriately constructed. (A couple of tests updated to add ", true".) Pull Request resolved: #5967 Test Plan: make check Differential Revision: D18138704 Pulled By: pdillinger fbshipit-source-id: 55ef9273423b0696309e251f50b8c1b5e9ec7597

Summary: For upcoming new SST filter implementations, we will use a new 64-bit hash function (XXH3 preview, slightly modified). This change updates hash.{h,cc} for that change, adds unit tests, and out-of-lines the implementations to keep hash.h as clean/small as possible. In developing the unit tests, I discovered that the XXH3 preview always returns zero for the empty string. Zero is problematic for some algorithms (including an upcoming SST filter implementation) if it occurs more often than at the "natural" rate, so it should not be returned from trivial values using trivial seeds. I modified our fork of XXH3 to return a modest hash of the seed for the empty string. With hash function details out-of-lines in hash.h, it makes sense to enable XXH_INLINE_ALL, so that direct calls to XXH64/XXH32/XXH3p are inlined. To fix array-bounds warnings on some inline calls, I injected some casts to uintptr_t in xxhash.cc. (Issue reported to Yann.) Revised: Reverted using XXH_INLINE_ALL for now. Some Facebook checks are unhappy about #include on xxhash.cc file. I would fix that by rename to xxhash_cc.h, but to best preserve history I want to do that in a separate commit (PR) from the uintptr casts. Also updated filter_bench for this change, improving the performance predictability of dry run hashing and adding support for 64-bit hash (for upcoming new SST filter implementations, minor dead code in the tool for now). Pull Request resolved: #5984 Differential Revision: D18246567 Pulled By: pdillinger fbshipit-source-id: 6162fbf6381d63c8cc611dd7ec70e1ddc883fbb8

Summary: A recent commit make periodic compaction option valid in FIFO, which means TTL. But we fail to disable it in crash test, causing assert failure. Fix it by having it disabled. Pull Request resolved: #5993 Test Plan: Restart "make crash_test" many times and make sure --periodic_compaction_seconds=0 is always the case when --compaction_style=2 Differential Revision: D18263223 fbshipit-source-id: c91a802017d83ae89ac43827d1b0012861933814

Summary: FlushJobInfo and CompactionJobInfo are aggregates; we should use the aggregate initialization syntax to ensure members (specifically those of built-in types) are value-initialized. Pull Request resolved: #5997 Test Plan: make check Differential Revision: D18273398 Pulled By: ltamasi fbshipit-source-id: 35b1a63ad9ca01605d288329858af72fffd7f392

)" (#5999) Summary: This reverts commit 351e254. All branches have been fixed to buildable on FB environments, so we can revert it. Pull Request resolved: #5999 Differential Revision: D18281947 fbshipit-source-id: 6deaaf1b5df2349eee5d6ed9b91208cd7e23ec8e

Summary: We recently added periodic compaction to universal compaction. An old assertion that we can't onlyl compact the last sorted run triggered. However, with periodic compaction, it is possible that we only compact the last sorted run, so the assertion now became stricter than needed. Relaxing this assertion. Pull Request resolved: #6000 Test Plan: This should be a low risk change. Will observe whether stress test will pass after it. Differential Revision: D18285396 fbshipit-source-id: 9a6863debdf104c40a7f6c46ab62d84cdf5d8592

Summary: MaxCatchupWithNewSnapshot tests that the snapshot sequence number will be larger than the max sequence number when the snapshot was taken. However since the test does not have access to the max sequence number when the snapshot was taken, it uses max sequence number after that, which could have advanced the snapshot by then, thus making the test flaky. The fix is to compare with max sequence number before the snapshot was taken, which is a lower bound for the value when the snapshot was taken. Pull Request resolved: #5850 Test Plan: ~/gtest-parallel/gtest-parallel --repeat=12800 ./write_prepared_transaction_test --gtest_filter="*MaxCatchupWithNewSnapshot*" Differential Revision: D17608926 Pulled By: maysamyabandeh fbshipit-source-id: b122ae5a27f982b290bd60da852e28d3c5eb0136

) Summary: For MDEV-19670: MyRocks: key lookups into deleted data are very slow BaseDeltaIterator remembers iterate_upper_bound and will not let delta_iterator_ walk above the iterate_upper_bound if base_iterator_ is not valid anymore. == Rationale == The most straightforward way would be to make the delta_iterator (which is a rocksdb::WBWIIterator) to support iterator bounds. But checking for bounds has an extra CPU overhead. So we put the check into BaseDeltaIterator, and only make it when base_iterator_ is not valid. (note: We could take it even further, and move the check a few lines down, and only check iterator bounds ourselves if base_iterator_ is not valid AND delta_iterator_ hit a tombstone). Pull Request resolved: #5403 Differential Revision: D15863092 Pulled By: maysamyabandeh fbshipit-source-id: 8da458e7b9af95ff49356666f69664b4a6ccf49b

Summary: According to https://github.com/facebook/rocksdb/wiki/Rocksdb-BlockBasedTable-Format, the block read by BlockBasedTable::ReadMetaBlock is actually the meta index block. Therefore, it is better to rename the function to ReadMetaIndexBlock. This PR also applies some format change to existing code. Pull Request resolved: #6009 Test Plan: make check Differential Revision: D18333238 Pulled By: riversand963 fbshipit-source-id: 2c4340a29b3edba53d19c132cbfd04caf6242aed

Summary: Pull Request resolved: #6008 Differential Revision: D18343273 Pulled By: pdillinger fbshipit-source-id: f7d1c78d711bbfb0deea9ec88212c19ab2ec91b8

Summary: DBImpl extends the public GetSnapshot() with GetSnapshotForWriteConflictBoundary() method that takes snapshots specially for write-write conflict checking. Compaction treats such snapshots differently to avoid GCing a value written after that, so that the write conflict remains visible even after the compaction. The patch extends stress tests with such snapshots. Pull Request resolved: #5897 Differential Revision: D17937476 Pulled By: maysamyabandeh fbshipit-source-id: bd8b0c578827990302194f63ae0181e15752951d

Summary: In the previous PR #4788, user can use db_bench mix_graph option to generate the workload that is from the social graph. The key is generated based on the key access hotness. In this PR, user can further model the key-range hotness and fit those to two-term-exponential distribution. First, user cuts the whole key space into small key ranges (e.g., key-ranges are the same size and the key-range number is the number of SST files). Then, user calculates the average access count per key of each key-range as the key-range hotness. Next, user fits the key-range hotness to two-term-exponential distribution (f(x) = f(x) = a*exp(b*x) + c*exp(d*x)) and generate the value of a, b, c, and d. They are the parameters in db_bench: prefix_dist_a, prefix_dist_b, prefix_dist_c, and prefix_dist_d. Finally, user can run db_bench by specify the parameters. For example: `./db_bench --benchmarks="mixgraph" -use_direct_io_for_flush_and_compaction=true -use_direct_reads=true -cache_size=268435456 -key_dist_a=0.002312 -key_dist_b=0.3467 -keyrange_dist_a=14.18 -keyrange_dist_b=-2.917 -keyrange_dist_c=0.0164 -keyrange_dist_d=-0.08082 -keyrange_num=30 -value_k=0.2615 -value_sigma=25.45 -iter_k=2.517 -iter_sigma=14.236 -mix_get_ratio=0.85 -mix_put_ratio=0.14 -mix_seek_ratio=0.01 -sine_mix_rate_interval_milliseconds=5000 -sine_a=350 -sine_b=0.0105 -sine_d=50000 --perf_level=2 -reads=1000000 -num=5000000 -key_size=48` Pull Request resolved: #5953 Test Plan: run db_bench with different parameters and checked the results. Differential Revision: D18053527 Pulled By: zhichao-cao fbshipit-source-id: 171f8b3142bd76462f1967c58345ad7e4f84bab7

Summary: Right now, in db_stress's CF consistency test's TestGet case, if failure happens, we do normal string printing, rather than hex printing, so that some text is not printed out, which makes debugging harder. Fix it by printing hex instead. Pull Request resolved: #5989 Test Plan: Build db_stress and see t passes. Differential Revision: D18363552 fbshipit-source-id: 09d1b8f6fbff37441cbe7e63a1aef27551226cec

Summary: TEST_GROUP=1 has sometimes been timing out but generally taking 45-50 minutes vs. 20-25 for groups 2-4. Beyond the compilation time, tests in group 1 consist of about 19 minutes of db_test, and 7 minutes of everything else. This change moves most of that "everything else" to group 2. Pull Request resolved: #6010 Test Plan: Travis for this PR, oncall watch Travis Differential Revision: D18373536 Pulled By: pdillinger fbshipit-source-id: 0b3af004c71e4fd6bc01a94dac34cc3079fc9ce1

…ter is used. (#5994) Summary: Recently, periodic compaction got turned on by default for leveled compaction is compaction filter is used. Since periodic compaction is now supported in universal compaction too, we do the same default for universal now. Pull Request resolved: #5994 Test Plan: Add a new unit test. Differential Revision: D18363744 fbshipit-source-id: 5093288ce990ee3cab0e44ffd92d8489fbcd6a48

…6002) Summary: It's useful to add test coverage for universal compaction's periodic compaction. Add two tests. Pull Request resolved: #6002 Test Plan: Run the two tests Differential Revision: D18363544 fbshipit-source-id: bbd04b54057315f64f959709006412db1f76d170

…5869) Summary: In stress test, all iterator verification is turned off is lower bound is enabled. This might be stricter than needed. This PR relaxes the condition and include the case where lower bound is lower than both of seek key and upper bound. It seems to work mostly fine when I run crash test locally. Pull Request resolved: #5869 Test Plan: Run crash_test Differential Revision: D18363578 fbshipit-source-id: 23d57e11ea507949b8100f4190ddfbe8db052d5a

Summary: This PR fixes #5975. In ```BlockBasedTable::RetrieveMultipleBlocks()```, we were calling ```MaybeReadBlocksAndLoadToCache()```, which is a no-op if neither uncompressed nor compressed block cache are configured. Pull Request resolved: #5991 Test Plan: 1. Add unit tests that fail with the old code and pass with the new 2. make check and asan_check Cc spetrunia Differential Revision: D18272744 Pulled By: anand1976 fbshipit-source-id: e62fa6090d1a6adf84fcd51dfd6859b03c6aebfe

Summary: From bzip2's official [download page](http://www.bzip.org/downloads.html), we could download it from sourceforge. This source would be more credible than previous web archive. Pull Request resolved: #5995 Differential Revision: D18377662 fbshipit-source-id: e8353f83d5d6ea6067f78208b7bfb7f0d5b49c05

…etaData (#6011) Summary: The patch exposes the file numbers of the SSTs as well as the oldest blob files they contain a reference to through the GetColumnFamilyMetaData/ GetLiveFilesMetaData interface. Pull Request resolved: #6011 Test Plan: Fixed and extended the existing unit tests. (The earlier ColumnFamilyMetaDataTest wasn't really testing anything because the generated memtables were never flushed, so the metadata structure was essentially empty.) Differential Revision: D18361697 Pulled By: ltamasi fbshipit-source-id: d5ed1d94ac70858b84393c48711441ddfe1251e9

Summary: The test would fire two flushes to let them run in parallel. Previously it wait for the first job to be scheduled before firing the second. It is possible the job is not started before the second job being scheduled, making the two job combine into one. Change to wait for the first job being started. Fixes #6017 Pull Request resolved: #6018 Test Plan: ``` while ./db_flush_test --gtest_filter=*FireOnFlushCompletedAfterCommittedResult*; do :; done ``` and let it run for a while. Signed-off-by: Yi Wu <yiwu@pingcap.com> Differential Revision: D18405576 Pulled By: riversand963 fbshipit-source-id: 6ebb6262e033d5dc2ef81cb3eb410b314f2de4c9

HaoyuHuang and others added 30 commits August 9, 2019 13:13

Blog post for write_unprepared (#5711)

6ec2bf3

Summary: Introducing write_unprepared feature. Pull Request resolved: #5711 Differential Revision: D16838307 Pulled By: maysamyabandeh fbshipit-source-id: d9a4daf63dd0f855bea49c14ce84e6299f1401c7

Update HISTORY.md for 6.3.2/6.4.0 and add a not-yet-released change

3a3dc29

Summary: Pull Request resolved: #5710 Test Plan: HISTORY.md-only change, no testing required. Differential Revision: D16836869 Pulled By: ltamasi fbshipit-source-id: 978148f1d14b0c46839a94d7ada8a5e8ecf73965

cmake: s/SNAPPY_LIBRARIES/snappy_LIBRARIES/ (#5687)

35fe685

Summary: fix the regression introduced by cc9fa7f Signed-off-by: Kefu Chai <tchaikov@gmail.com> Pull Request resolved: #5687 Differential Revision: D16870212 fbshipit-source-id: 78b5519e1d2b03262d102ca530491254ddffdc38

Fixes for building RocksJava releases on arm64v8

f2bf0b2

Summary: Pull Request resolved: #5674 Differential Revision: D16870338 fbshipit-source-id: c8dac644b1479fa734b491f3a8d50151772290f7

siying and others added 29 commits October 29, 2019 18:16

Add clarifying/instructive header to TARGETS and defs.bzl

834feaf

Summary: Pull Request resolved: #6008 Differential Revision: D18343273 Pulled By: pdillinger fbshipit-source-id: f7d1c78d711bbfb0deea9ec88212c19ab2ec91b8

hliu18 merged commit bfdde1e into hliu18:master Nov 8, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sync from facebook master #4

sync from facebook master #4

hliu18 commented Nov 8, 2019

sync from facebook master #4

sync from facebook master #4

Conversation

hliu18 commented Nov 8, 2019