Shanshan Wang edited this page Dec 25, 2018 · 4 revisions

Pegasus Scanner

Scanner

Interface

void pegasus_client_impl::async_get_scanner(const std::string &hash_key,
                                            const std::string &start_sortkey,
                                            const std::string &stop_sortkey,
                                            const scan_options &options,
                                            async_get_scanner_callback_t &&callback);

int pegasus_client_impl::get_scanner(const std::string &hash_key,
                                     const std::string &start_sort_key,
                                     const std::string &stop_sort_key,
                                     const scan_options &options,
                                     pegasus_scanner *&scanner);

Note: the async interface is implemented on top of the sync interface.
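The layering can be pictured with the minimal sketch below. It uses hypothetical stand-ins (`pegasus_scanner_stub`, `get_scanner_sync`, `async_get_scanner_sketch`) rather than the real client types. It runs the synchronous call and hands its result to the callback on the caller's thread. Whether the real client dispatches the callback elsewhere is not covered here.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical stand-ins for the real Pegasus client types.
struct pegasus_scanner_stub {
    std::string hash_key;
};
using get_scanner_callback =
    std::function<void(int /*error*/, pegasus_scanner_stub * /*scanner*/)>;

// Hypothetical synchronous call: fills `scanner` and returns an error code (0 = OK).
int get_scanner_sync(const std::string &hash_key, pegasus_scanner_stub *&scanner)
{
    scanner = new pegasus_scanner_stub{hash_key};
    return 0;
}

// "Async" call layered on the sync one: run the sync call, then pass its
// result to the callback. Here the callback runs on the caller's thread.
void async_get_scanner_sketch(const std::string &hash_key, get_scanner_callback &&callback)
{
    pegasus_scanner_stub *scanner = nullptr;
    int ret = get_scanner_sync(hash_key, scanner);
    callback(ret, scanner);
}
```

As with the raw-pointer interface above, the callback receives ownership of the scanner and must delete it.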

Unordered full scanners

Interface

void pegasus_client_impl::async_get_unordered_scanners(
    int max_split_count,
    const scan_options &options,
    async_get_unordered_scanners_callback_t &&callback);
    
int pegasus_client_impl::get_unordered_scanners(int max_split_count,
                                                const scan_options &options,
                                                std::vector<pegasus_scanner *> &scanners);

Note: the sync interface is implemented on top of the async interface.
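A sync-on-async wrapper typically issues the async request and then blocks until the callback fires. The sketch below shows that pattern with `std::promise`/`std::future`. `background_worker`, `scanner_stub`, `async_get_unordered_scanners_stub`, and `get_unordered_scanners_sync` are hypothetical stand-ins, not the Pegasus client API, and the real client may wait with a different primitive.

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <memory>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-ins; the real client uses its own RPC threads and types.
struct scanner_stub {
    int split_id;
};
using scanners_callback =
    std::function<void(int /*error*/, std::vector<scanner_stub> /*scanners*/)>;

// Runs each posted task on its own thread and joins them all on destruction.
class background_worker {
public:
    void post(std::function<void()> task) { threads_.emplace_back(std::move(task)); }
    ~background_worker()
    {
        for (auto &t : threads_)
            t.join();
    }

private:
    std::vector<std::thread> threads_;
};

// Hypothetical async call: returns at once, invokes the callback from a worker thread.
void async_get_unordered_scanners_stub(background_worker &w, int max_split_count,
                                       scanners_callback &&callback)
{
    w.post([max_split_count, cb = std::move(callback)]() {
        std::vector<scanner_stub> scanners;
        for (int i = 0; i < max_split_count; ++i)
            scanners.push_back(scanner_stub{i});
        cb(0, std::move(scanners));
    });
}

// Sync call built on the async one: issue the request, then block until the callback fires.
int get_unordered_scanners_sync(background_worker &w, int max_split_count,
                                std::vector<scanner_stub> &scanners)
{
    // shared_ptr keeps the promise's state alive while the worker finishes set_value.
    auto done = std::make_shared<std::promise<int>>();
    std::future<int> result = done->get_future();
    async_get_unordered_scanners_stub(
        w, max_split_count, [done, &scanners](int err, std::vector<scanner_stub> ss) {
            scanners = std::move(ss); // written before set_value, so visible after get()
            done->set_value(err);
        });
    return result.get();
}
```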

Unordered range scanners

Interface

void pegasus_client_impl::async_get_unordered_range_scanners(
    int max_split_count,
    const std::string &start_hashkey,
    const std::string &stop_hashkey,
    const std::string &start_sortkey,
    const std::string &stop_sortkey,
    const scan_options &options,
    async_get_unordered_scanners_callback_t &&callback);
    
int pegasus_client_impl::get_unordered_range_scanners(int max_split_count,
                                                      const std::string &start_hashkey,
                                                      const std::string &stop_hashkey,
                                                      const std::string &start_sortkey,
                                                      const std::string &stop_sortkey,
                                                      const scan_options &options,
                                                      std::vector<pegasus_scanner *> &scanners);

Note: the sync interface is implemented on top of the async interface.

Sorted scanner

Interface

int pegasus_client_impl::get_sorted_scanner(const std::string &start_hashkey,
                                            const std::string &stop_hashkey,
                                            const std::string &start_sortkey,
                                            const std::string &stop_sortkey,
                                            const scan_options &options,
                                            pegasus_sorted_scanner *&scanner);

Note:

  • the async interface is not implemented yet
  • based on get_unordered_scanners if all input keys are empty, otherwise based on get_unordered_range_scanners
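The sketch below illustrates both notes under stated assumptions. The choice of underlying API follows the rule above. The overall ordering is assumed to come from a k-way merge of the per-replica sorted streams through a min-heap. Replica streams are modeled as hypothetical `std::vector<std::string>` values rather than real `pegasus_scanner` objects, and `pick_underlying_api` and `merge_sorted` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model: each unordered scanner yields one replica's keys,
// already sorted within that replica.
using replica_stream = std::vector<std::string>;

// Dispatch rule from the note: full unordered scan only when every input key is empty.
std::string pick_underlying_api(const std::string &start_hk, const std::string &stop_hk,
                                const std::string &start_sk, const std::string &stop_sk)
{
    bool all_empty = start_hk.empty() && stop_hk.empty() && start_sk.empty() && stop_sk.empty();
    return all_empty ? "get_unordered_scanners" : "get_unordered_range_scanners";
}

// Assumed merge step: a min-heap holds the current head of every stream; the
// smallest head is emitted and replaced by the next key from the same stream.
std::vector<std::string> merge_sorted(const std::vector<replica_stream> &streams)
{
    using head = std::pair<std::string, std::pair<std::size_t, std::size_t>>; // key, (stream, index)
    std::priority_queue<head, std::vector<head>, std::greater<head>> heap;
    for (std::size_t s = 0; s < streams.size(); ++s)
        if (!streams[s].empty())
            heap.push({streams[s][0], {s, 0}});

    std::vector<std::string> out;
    while (!heap.empty()) {
        auto [key, pos] = heap.top();
        heap.pop();
        out.push_back(key);
        auto [s, i] = pos;
        if (i + 1 < streams[s].size())
            heap.push({streams[s][i + 1], {s, i + 1}});
    }
    return out;
}
```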

Server

void pegasus_server_impl::on_get_scanner(const ::dsn::apps::get_scanner_request &request,
                                         ::dsn::rpc_replier<::dsn::apps::scan_response> &reply);
    
void pegasus_server_impl::on_scan(const ::dsn::apps::scan_request &request,
                                  ::dsn::rpc_replier<::dsn::apps::scan_response> &reply);

Hash/screening range and reverse scan

struct get_scanner_request
{
    1:dsn.blob  start_key;
    2:dsn.blob  stop_key;
    3:bool      start_inclusive;
    4:bool      stop_inclusive;
    5:i32       batch_size;
    6:bool      no_value; // do not return values, only sort keys
    7:filter_type  hash_key_filter_type;
    8:dsn.blob     hash_key_filter_pattern;
    9:filter_type  sort_key_filter_type;
    10:dsn.blob    sort_key_filter_pattern;
    11:bool     hash_sort_range; // whether to scan multiple sub-ranges (index screening); new, determined by the client from the input hashkey and sortkey ranges
    12:bool     reverse; // whether to scan in reverse direction; new, taken from the client's input options
}
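As a rough illustration of what the client puts into `start_key`, `stop_key`, `hash_sort_range`, and `reverse`, the sketch below assumes the usual Pegasus key layout: a 2-byte big-endian hashkey length, then the hashkey, then the sortkey. That layout is also why the summary describes ordering as hashkey_len-hashkey-sortkey. `scanner_request_sketch`, `encode_key`, `build_request`, and the rule used for `hash_sort_range` are hypothetical. Empty hashkeys (from first / to last) are not handled.

```cpp
#include <cassert>
#include <string>

// Hypothetical mirror of the get_scanner_request fields this sketch fills.
struct scanner_request_sketch {
    std::string start_key;
    std::string stop_key;
    bool hash_sort_range = false;
    bool reverse = false;
};

// Assumed key layout: [hashkey length, 2 bytes big-endian][hashkey][sortkey].
// Hashkeys longer than 65535 bytes are not representable and are not checked here.
std::string encode_key(const std::string &hash_key, const std::string &sort_key)
{
    std::string key;
    key.push_back(static_cast<char>((hash_key.size() >> 8) & 0xFF));
    key.push_back(static_cast<char>(hash_key.size() & 0xFF));
    return key + hash_key + sort_key;
}

// Client-side request building for non-empty hashkeys. The hash_sort_range rule
// (more than one hashkey AND at least one sortkey bound) is an assumption.
scanner_request_sketch build_request(const std::string &start_hk, const std::string &stop_hk,
                                     const std::string &start_sk, const std::string &stop_sk,
                                     bool reverse)
{
    scanner_request_sketch req;
    req.start_key = encode_key(start_hk, start_sk);
    req.stop_key = encode_key(stop_hk, stop_sk);
    req.hash_sort_range = start_hk != stop_hk && (!start_sk.empty() || !stop_sk.empty());
    req.reverse = reverse; // passed through from the caller's options
    return req;
}
```

Under this layout a shorter hashkey always sorts before a longer one, so `b` precedes `aa`.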
Main idea of hash/screening range scan
  • code flow
    1. generate the first scan range (start_hashkey-start_sortkey, start_hashkey-stop_sortkey)
    2. seek to start_hashkey-start_sortkey
    3. iterate until reaching start_hashkey-stop_sortkey
    4. generate the next scan range (next_hashkey-start_sortkey, next_hashkey-stop_sortkey)
    5. seek to next_hashkey-start_sortkey
    6. iterate until reaching next_hashkey-stop_sortkey
    7. repeat steps 4-6 until the seek reaches stop_hashkey-start_sortkey
    8. iterate until reaching stop_hashkey-stop_sortkey
  • handle empty and invalid input ranges
  • when screening, reset first_exclusive (derived from start_inclusive) for each new sub-range
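The code flow above can be simulated over an ordered in-memory map. This is a sketch, not the server code: `std::map` stands in for a RocksDB iterator (`lower_bound` plays the role of `Seek`), keys use the assumed 2-byte-length layout, both sortkey bounds are taken as non-empty and inclusive, and `screening_scan`, `encode`, `decode`, and `prefix_successor` are hypothetical names. Moving to the next hashkey seeks past every key that shares the current hashkey prefix, in the spirit of `pegasus_generate_next_blob`.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Ordered stand-in for RocksDB: std::string compares bytes as unsigned char.
using store = std::map<std::string, std::string>;
using kv_key = std::pair<std::string, std::string>; // (hashkey, sortkey)

// Assumed key layout: 2-byte big-endian hashkey length, hashkey, sortkey.
std::string encode(const std::string &hk, const std::string &sk)
{
    std::string k{static_cast<char>(hk.size() >> 8), static_cast<char>(hk.size() & 0xFF)};
    return k + hk + sk;
}

kv_key decode(const std::string &k)
{
    std::size_t len = (static_cast<unsigned char>(k[0]) << 8) | static_cast<unsigned char>(k[1]);
    return {k.substr(2, len), k.substr(2 + len)};
}

// Smallest key greater than every key starting with `prefix` ("" if none exists).
std::string prefix_successor(std::string prefix)
{
    while (!prefix.empty() && static_cast<unsigned char>(prefix.back()) == 0xFF)
        prefix.pop_back();
    if (!prefix.empty())
        prefix.back() = static_cast<char>(static_cast<unsigned char>(prefix.back()) + 1);
    return prefix;
}

// Steps 1-8: one sub-range [hk-start_sk, hk-stop_sk] per hashkey, both ends inclusive.
std::vector<kv_key> screening_scan(const store &db, const std::string &start_hk,
                                   const std::string &stop_hk, const std::string &start_sk,
                                   const std::string &stop_sk)
{
    std::vector<kv_key> out;
    const std::string last_prefix = encode(stop_hk, "");
    std::string hk = start_hk;
    auto it = db.lower_bound(encode(hk, start_sk)); // seek to sub-range start
    while (true) {
        const std::string range_stop = encode(hk, stop_sk);
        for (; it != db.end() && it->first <= range_stop; ++it) // iterate sub-range
            out.push_back(decode(it->first));
        if (hk == stop_hk)
            break; // last sub-range done
        std::string next = prefix_successor(encode(hk, "")); // skip rest of this hashkey
        auto nit = next.empty() ? db.end() : db.lower_bound(next);
        if (nit == db.end())
            break;
        hk = decode(nit->first).first; // next existing hashkey
        if (encode(hk, "") > last_prefix)
            break; // past the stop hashkey
        it = db.lower_bound(encode(hk, start_sk));
    }
    return out;
}
```

With hashkeys `a`, `b`, `c`, and `bb` in the store, a scan over hashkeys `a`..`bb` also visits `c`, because shorter hashkeys sort first under the length-prefixed layout.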
Main idea of reverse scan
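  • swap start and stop (for both hashkey and sortkey), together with their inclusive flags
    • the swapped keys drive the empty-range check, the seek, and the stop check
  • use it->SeekForPrev and it->Prev instead of it->Seek and it->Next
  • when screening with no ending key, delay determining the first sub-range until the first entry is located
  • handle empty entries for hash/screening range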
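A sketch of the reverse screening scan over an ordered `std::map` standing in for RocksDB, assuming the 2-byte-length key layout and non-empty, inclusive sortkey bounds. All helper names are hypothetical. `seek_for_prev` and `step_prev` imitate `SeekForPrev` and `Prev`, and the scan starts from the stop hashkey and stop sortkey. The delayed first sub-range for an empty ending key is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Ordered stand-in for RocksDB (std::string compares bytes as unsigned char).
using store = std::map<std::string, std::string>;
using kv_key = std::pair<std::string, std::string>; // (hashkey, sortkey)

// Assumed key layout: 2-byte big-endian hashkey length, hashkey, sortkey.
std::string encode(const std::string &hk, const std::string &sk)
{
    std::string k{static_cast<char>(hk.size() >> 8), static_cast<char>(hk.size() & 0xFF)};
    return k + hk + sk;
}

kv_key decode(const std::string &k)
{
    std::size_t len = (static_cast<unsigned char>(k[0]) << 8) | static_cast<unsigned char>(k[1]);
    return {k.substr(2, len), k.substr(2 + len)};
}

// Like Iterator::SeekForPrev: last entry with key <= target; end() means !Valid().
store::const_iterator seek_for_prev(const store &db, const std::string &target)
{
    auto it = db.upper_bound(target);
    return it == db.begin() ? db.end() : std::prev(it);
}

// Like Iterator::Prev: stepping before the first entry yields end() (!Valid()).
store::const_iterator step_prev(const store &db, store::const_iterator it)
{
    return it == db.begin() ? db.end() : std::prev(it);
}

// Reverse screening: start from the stop end and walk each sub-range backwards.
std::vector<kv_key> reverse_screening_scan(const store &db, const std::string &start_hk,
                                           const std::string &stop_hk,
                                           const std::string &start_sk,
                                           const std::string &stop_sk)
{
    std::vector<kv_key> out;
    const std::string first_prefix = encode(start_hk, "");
    std::string hk = stop_hk;
    auto it = seek_for_prev(db, encode(hk, stop_sk));
    while (true) {
        const std::string range_lo = encode(hk, start_sk);
        for (; it != db.end() && it->first >= range_lo; it = step_prev(db, it))
            out.push_back(decode(it->first));
        if (hk == start_hk)
            break; // lowest sub-range done
        auto p = db.lower_bound(encode(hk, "")); // first entry of the current hashkey
        if (p == db.begin())
            break; // no smaller hashkey exists
        hk = decode(std::prev(p)->first).first; // previous existing hashkey
        if (encode(hk, "") < first_prefix)
            break; // moved before the start hashkey
        it = seek_for_prev(db, encode(hk, stop_sk));
    }
    return out;
}
```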
bool pegasus_server_impl::generate_first_scan_range(const rocksdb::Slice start,
                                                    const rocksdb::Slice stop,
                                                    ::dsn::blob &range_start_key,
                                                    ::dsn::blob &range_stop_key,
                                                    const bool reverse = false);

void pegasus_server_impl::generate_next_scan_range(const rocksdb::Slice key,
                                                   ::dsn::blob &range_start_key,
                                                   ::dsn::blob &range_stop_key,
                                                   bool reverse = false);

// determine the actual first range; called only by reverse scan when the stop key is empty
void pegasus_server_impl::generate_reverse_first_scan_range(const rocksdb::Slice key,
                                                            const rocksdb::Slice stop,
                                                            ::dsn::blob &range_start_key,
                                                            ::dsn::blob &range_stop_key);
// generate the adjacent next rocksdb key according to hash key and sort key.
// T may be std::string or ::dsn::blob.
// data is copied into 'next'.
template <typename T>
void pegasus_generate_next_blob(::dsn::blob &next, const T &hash_key, const T &sort_key);

// generate the adjacent next rocksdb key according to hash key.
// T may be std::string or ::dsn::blob.
// data is copied into 'next'.
template <typename T>
void pegasus_generate_next_blob(::dsn::blob &next, const T &hash_key, const bool reverse = false);

// generate the adjacent next rocksdb key according to hash key and sort key.
// T may be std::string or ::dsn::blob.
// data is copied into 'next'.
template <typename T>
void pegasus_generate_next_blob_by_hashkey(::dsn::blob &next,
                                           const T &hash_key,
                                           const T &sort_key,
                                           const bool reverse = false);
Invalid input range
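  • [-start/stop_sortkey] (empty hashkey with a non-empty start/stop sortkey) is rejected with an error, for example:
    E2018-12-17 17:22:56.587 (1545038576587724382 325f) mimic.io-thrd.12895: invalid key range: hash key cannot be empty when sort key is not empty for range scan
    ERROR: get sorted scanner failed, error=hash key can't be empty
    • two reasons: without an input hashkey, it is inconvenient for the client to generate the start/stop blobs passed to the server; and the optimizer generally does not choose index screening for such queries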
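A client-side check matching this rule could look like the sketch below. The function name `check_range_keys` and the per-bound pairing (start hashkey with start sortkey, stop hashkey with stop sortkey) are assumptions. The returned message reuses the text from the log above.

```cpp
#include <cassert>
#include <string>

// Hypothetical client-side validation. Assumed pairing: each sortkey bound needs
// the hashkey on the same side; an empty result means the range is accepted.
std::string check_range_keys(const std::string &start_hk, const std::string &stop_hk,
                             const std::string &start_sk, const std::string &stop_sk)
{
    const bool start_invalid = start_hk.empty() && !start_sk.empty();
    const bool stop_invalid = stop_hk.empty() && !stop_sk.empty();
    if (start_invalid || stop_invalid)
        return "hash key can't be empty"; // message text taken from the client log
    return "";
}
```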
Inclusive behavior
  • For unordered range scanners and the sorted scanner, both start_inclusive and stop_inclusive are reset to true regardless of the input options. The current behavior is to seek to [start_hashkey-start_sortkey] as the scan start position and to stop at [stop_hashkey-stop_sortkey].
    • reason: the executor handles the boundaries
  • To honor inclusive = false, we would need to seek to [start_hashkey+1-start_sortkey+1] for the start, stop at [stop_hashkey-1-stop_sortkey-1], and then reset the inclusive flags to true. Currently only inclusive = true is supported for both the start and stop points.
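For comparison, a common way to honor exclusive bounds without constructing adjacent keys is to seek the start key and then skip or stop on exact boundary matches. The sketch below shows this over a `std::map` standing in for a RocksDB iterator. `scan_range` is a hypothetical name, and this is not how the current scanners behave.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

using store = std::map<std::string, std::string>;

// Hypothetical helper: forward scan of [start, stop] honoring both inclusive flags
// by skipping/stopping on exact boundary matches instead of adjusting the keys.
std::vector<std::string> scan_range(const store &db, const std::string &start,
                                    const std::string &stop, bool start_inclusive,
                                    bool stop_inclusive)
{
    std::vector<std::string> out;
    auto it = db.lower_bound(start); // seek: first key >= start
    if (!start_inclusive && it != db.end() && it->first == start)
        ++it; // exclusive start: skip the exact match
    for (; it != db.end(); ++it) {
        int c = it->first.compare(stop);
        if (c > 0 || (c == 0 && !stop_inclusive))
            break; // past stop, or exclusive stop reached
        out.push_back(it->first);
    }
    return out;
}
```

Skipping an exact match of `start` after `lower_bound(start)` gives the same result as seeking to `start + '\0'`, the smallest byte string greater than `start`.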

Summary

Each scanner type is described by range (empty start = from first, empty stop = to last), sort, reverse, and inclusive (false > true on same key).

  • get_scanner
    • range: sortkey range; no result, without a warning, if the input range is empty (checked by client)
    • sort: by hashkey-sortkey
    • reverse: within the input hashkey-sortkey range
    • inclusive: start/stop_inclusive is supported; defaults are true/false
  • get_unordered_scanners
    • range: unsupported
    • sort: by hashkey_len-hashkey-sortkey within a replica
    • reverse: within a replica
    • inclusive: start/stop_inclusive is ignored
  • get_unordered_range_scanners
    • range: sortkey range, hashkey range, and hashkey-sortkey range (screening); no result, with a warning, if the input range/sub-range is empty (checked by client); error if the input range/sub-range is invalid (checked by client); same behavior as get_unordered_scanners if all input keys are empty
    • sort: by hashkey_len-hashkey-sortkey within a replica
    • reverse: within a replica
    • inclusive: start/stop_inclusive is reset to true
  • get_sorted_scanner
    • range: sortkey range, hashkey range, and hashkey-sortkey range (screening); no result, with a warning, if the input range/sub-range is empty (checked by client); error if the input range/sub-range is invalid (checked by client)
    • sort: by hashkey_len-hashkey-sortkey
    • reverse: overall
    • inclusive: start/stop_inclusive is reset to true