For many simple queries where the client wants to look up multiple keys in one call to YugaByte, we should allow those sub-queries (key lookups) to happen in parallel.
Consider an example like:
CREATE TABLE T (key text PRIMARY KEY, value text);
INSERT INTO T (key, value) VALUES ('k1', 'v1');
INSERT INTO T (key, value) VALUES ('k2', 'v2');
INSERT INTO T (key, value) VALUES ('k3', 'v3');
INSERT INTO T (key, value) VALUES ('k4', 'v3');
INSERT INTO T (key, value) VALUES ('k5', 'v3');
INSERT INTO T (key, value) VALUES ('k6', 'v3');
SELECT *
FROM T
WHERE key IN ('k1', 'k5');
Currently, the lookup for each key (k1 and k5) is done sequentially. We should instead allow these lookups, which may span different tablets and different servers, to be done in parallel. [Note: in some cases, such as when the IN list has many keys and there is a much smaller LIMIT clause, we may not want to fire off all the lookups in parallel.]
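To make the request concrete, here is a minimal sketch of fanning out one lookup per key instead of looping over them sequentially. It is not YugaByte code: `TABLE`, `lookup`, and `select_in_parallel` are hypothetical names, and the in-memory dict stands in for reads that would really be RPCs to the owning tablets.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-in for table T from the example above.
TABLE = {"k1": "v1", "k2": "v2", "k3": "v3", "k4": "v3", "k5": "v3", "k6": "v3"}


def lookup(key):
    """Hypothetical single-key read; a real one would be an RPC to a tablet."""
    value = TABLE.get(key)
    return None if value is None else (key, value)


def select_in_parallel(keys, max_workers=8):
    """Issue one lookup per key concurrently; return found rows in IN-list order."""
    if not keys:
        return []
    with ThreadPoolExecutor(max_workers=min(max_workers, len(keys))) as pool:
        # Executor.map yields results in the order of the input keys.
        rows = list(pool.map(lookup, keys))
    # Keys with no matching row contribute nothing to the result.
    return [row for row in rows if row is not None]
```

Capping `max_workers` is one simple way to respect the note above: with a long IN list and a small LIMIT, firing every lookup at once may do work that is thrown away.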
Isn't it better to do this on the client (the multi-server part)?
Pretty sure many libraries have helper methods (the Python one does) to group keys by hash.
Querying in parallel across tables still makes sense, though.
Or keep both versions, with the client-side one being the most efficient.
Hi @ddorian - yes, grouping at the client layer is optimal, since it avoids an extra hop for all keys in the common case. However, both are nice to have: if a client sends a request with keys on multiple servers, the server should still do the right thing.
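For reference, client-side grouping could look roughly like the sketch below. The placement function is an assumption made for illustration (a CRC32 hash modulo the server count); a real driver would consult the cluster's actual partition map, and `owner_of` and `group_keys_by_server` are hypothetical names, not part of any driver API.

```python
import zlib
from collections import defaultdict


def owner_of(key, num_servers):
    """Hypothetical placement: hash the key to a server index.

    Assumption for illustration only; real drivers use the cluster's
    partition metadata rather than a plain modulo.
    """
    return zlib.crc32(key.encode("utf-8")) % num_servers


def group_keys_by_server(keys, num_servers):
    """Bucket keys by owning server so each server gets one batched request."""
    groups = defaultdict(list)
    for key in keys:
        groups[owner_of(key, num_servers)].append(key)
    return dict(groups)
```

Each bucket can then be sent directly to its server, which is what avoids the extra hop in the common case.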
…onable
Summary:
For multi-partition selects, if all primary key columns are fixed with equality or IN conditions
we can estimate the maximum number of returned rows based on the number of options for each IN
condition.
Then, if the max number of returned rows is smaller than both the page size and the limit, we optimize
to run all internal queries (to each partition) in parallel, since we know we won't be doing extra
work or need to truncate the result afterwards.
Test Plan: TestSelect.java, existing IN tests
Reviewers: pritam.damania, robert
Reviewed By: pritam.damania, robert
Subscribers: kannan, yql
Differential Revision: https://phabricator.dev.yugabyte.com/D5137
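The row-count check described in the commit summary above can be sketched as follows. This is an illustration of the stated rule, not the code from the revision: the function names are hypothetical, each primary key column is represented only by its number of options (1 for an equality condition, the list length for an IN condition), and "smaller than" is read as a strict comparison.

```python
from math import prod


def max_rows(options_per_pk_column):
    """Upper bound on returned rows: the product of option counts per column."""
    return prod(options_per_pk_column)


def should_run_in_parallel(options_per_pk_column, page_size, limit=None):
    """Return True when the bound is below both the page size and the limit.

    A missing LIMIT (None) is treated as imposing no constraint.
    """
    bound = max_rows(options_per_pk_column)
    if limit is not None and bound >= limit:
        return False
    return bound < page_size
```

For the query in the issue, `WHERE key IN ('k1', 'k5')` gives a bound of 2, so with a typical page size and no small LIMIT both lookups would be eligible to run in parallel.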
Allow multiple keys to be looked up in parallel.