-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[YSQL] Index scan may return incomplete results in case of dynamic tablet split #12648
Labels
Comments
d-uspenskiy
added
kind/bug
This issue is a bug
area/ysql
Yugabyte SQL (YSQL)
status/awaiting-triage
Issue awaiting triage
labels
May 25, 2022
d-uspenskiy
added a commit
that referenced
this issue
May 26, 2022
…mic table split Summary: In context of the change D16751 / 4e11189 redundant sorting of ybctids (before calling of the `PgDocOp::PopulateDmlByYbctidOps` method) was removed. But this change highlighted the problem that in case ybctids is not sorted read result might be incomplete in case of dynamic table split. The issue #12648 is created to solve the problem and create unit tests. Current change restores code with ybctids sorting as a workaround to unblock TPC-C benchmark. Test Plan: Jenkins Unit test will be added withing the context of fix for #12648 Reviewers: lnguyen Reviewed By: lnguyen Subscribers: smishra, yql Differential Revision: https://phabricator.dev.yugabyte.com/D17142
lnguyen-yugabyte
pushed a commit
that referenced
this issue
May 28, 2022
…in case of dynamic table split Summary: Original commit: D17142 / 17f7b80 In context of the change D16751 / 4e11189 redundant sorting of ybctids (before calling of the `PgDocOp::PopulateDmlByYbctidOps` method) was removed. But this change highlighted the problem that in case ybctids is not sorted read result might be incomplete in case of dynamic table split. The issue #12648 is created to solve the problem and create unit tests. Current change restores code with ybctids sorting as a workaround to unblock TPC-C benchmark. Test Plan: Jenkins Unit test will be added withing the context of fix for #12648 Reviewers: dmitry, hbhanawat, rthallam Reviewed By: rthallam Subscribers: yql, smishra Differential Revision: https://phabricator.dev.yugabyte.com/D17251
d-uspenskiy
added a commit
that referenced
this issue
Jun 28, 2022
Summary: Nowadays YSQL prepares read request and call the `PgClientService::Perform` RPC method of a local t-server. Then `PgClientService` determines target tablet and its leader for each request and make RPC calls to remote t-server. Between request preparation on the YSQL side and sending RPC by the `PgClientService` service table the splitting may occur. This may leads to incomplete result in case YSQL side uses wrong value for `ybctid_column_value` field which is used to determine target tablet. Example: - Suppose table `T` has only one tablet - YSQL prepares request to read batch of ybctids (`ybctid_1, ybctid_2, ybctid_3, ybctid_4`) from the table `T` in case of index scan of in case of foreign key constraint check - Index scan returns ybctids in non-sorted manner so order could be `[ybctid_3, ybctid_2, ybctid_1, ybctid_4]` - In the `PgDocReadOp::DoPopulateDmlByYbctidOps` method YSQL assigns the value of first item in the list to the `ybctid_column_value` field of read request (`ybctid_3` in our case) - After the `PgClientService::Perform` method is called by the YSQL layer table `T` get split onto two tablets: the first one contains ybctids from `(-INF, ybctid_3)` and the second one contains values `[ybctid_3, INF)` - `PgClientService::Perform` method resolves the target tablet for the request by using the `ybctid_column_value` field which is equal to `ybctid_3`. - `PgClientService` sends request to the second tablet of the `T` table. As a result only values for `ybctid_3`, `ybctid_4` will be returned. Solution is to use lowest ybctid value instead of first one. **Note:** In context of the diff additional change is made in the `InitRangePartitionKey` function. Before the change this function always used the `ybctid_column_value` value as the partition key even in case `paging_state` is specified. This was leading to the wrong behavior in case of dynamic table splitting. The `paging_state` field must be preferred over the `ybctid_column_value` (same as in the `InitHashPartitionKey` function) Test Plan: Jenkins New unit tests are introduced ``` ./yb_build.sh --gtest_filter PgMiniTest.AutoScanNextPartitionsIndexScan ./yb_build.sh --gtest_filter PgMiniTest.AutoScanNextPartitionsFKConstraint ``` Reviewers: timur, rskannan, neil, arybochkin, alex, jason, dsrinivasan, pjain Reviewed By: pjain Subscribers: rthallam, arybochkin, yql Differential Revision: https://phabricator.dev.yugabyte.com/D17341
d-uspenskiy
added a commit
that referenced
this issue
Jul 8, 2022
Summary: Small optimization for avoid memory allocations in case of index scan/FK checks. This change simplifies the solution which was introduced previously in the https://phabricator.dev.yugabyte.com/D17341 diff. Simplifications is achieved by removing redundant vector `lowest_ybctid`. Also diff reduced number of copying ybctid into arena by using `ref_binary_value` instead of `dup_binary_value` when possible. Test Plan: Jenkins Reviewers: neil, mihnea, alex, rskannan Reviewed By: rskannan Subscribers: tnayak, yql Differential Revision: https://phabricator.dev.yugabyte.com/D18047
d-uspenskiy
added a commit
that referenced
this issue
Jul 11, 2022
… splitting Summary: Nowadays YSQL prepares read request and call the `PgClientService::Perform` RPC method of a local t-server. Then `PgClientService` determines target tablet and its leader for each request and make RPC calls to remote t-server. Between request preparation on the YSQL side and sending RPC by the `PgClientService` service table the splitting may occur. This may leads to incomplete result in case YSQL side uses wrong value for `ybctid_column_value` field which is used to determine target tablet. Example: - Suppose table `T` has only one tablet - YSQL prepares request to read batch of ybctids (`ybctid_1, ybctid_2, ybctid_3, ybctid_4`) from the table `T` in case of index scan of in case of foreign key constraint check - Index scan returns ybctids in non-sorted manner so order could be `[ybctid_3, ybctid_2, ybctid_1, ybctid_4]` - In the `PgDocReadOp::DoPopulateDmlByYbctidOps` method YSQL assigns the value of first item in the list to the `ybctid_column_value` field of read request (`ybctid_3` in our case) - After the `PgClientService::Perform` method is called by the YSQL layer table `T` get split onto two tablets: the first one contains ybctids from `(-INF, ybctid_3)` and the second one contains values `[ybctid_3, INF)` - `PgClientService::Perform` method resolves the target tablet for the request by using the `ybctid_column_value` field which is equal to `ybctid_3`. - `PgClientService` sends request to the second tablet of the `T` table. As a result only values for `ybctid_3`, `ybctid_4` will be returned. Solution is to use lowest ybctid value instead of first one. **Note:** In context of the diff additional change is made in the `InitRangePartitionKey` function. Before the change this function always used the `ybctid_column_value` value as the partition key even in case `paging_state` is specified. This was leading to the wrong behavior in case of dynamic table splitting. The `paging_state` field must be preferred over the `ybctid_column_value` (same as in the `InitHashPartitionKey` function) Original commits: 4c08fae / D17341 0979a46 / D18047 Test Plan: Jenkins: rebase: 2.14 New unit tests are introduced ``` ./yb_build.sh --gtest_filter PgMiniTest.AutoScanNextPartitionsIndexScan ./yb_build.sh --gtest_filter PgMiniTest.AutoScanNextPartitionsFKConstraint ``` Reviewers: timur, rskannan, neil, arybochkin, alex, jason, dsrinivasan, pjain Reviewed By: pjain Subscribers: yql, arybochkin, rthallam Differential Revision: https://phabricator.dev.yugabyte.com/D17994
d-uspenskiy
added a commit
that referenced
this issue
Aug 26, 2022
… splitting Summary: Nowadays YSQL prepares read request and call the `PgClientService::Perform` RPC method of a local t-server. Then `PgClientService` determines target tablet and its leader for each request and make RPC calls to remote t-server. Between request preparation on the YSQL side and sending RPC by the `PgClientService` service table the splitting may occur. This may leads to incomplete result in case YSQL side uses wrong value for `ybctid_column_value` field which is used to determine target tablet. Example: - Suppose table `T` has only one tablet - YSQL prepares request to read batch of ybctids (`ybctid_1, ybctid_2, ybctid_3, ybctid_4`) from the table `T` in case of index scan of in case of foreign key constraint check - Index scan returns ybctids in non-sorted manner so order could be `[ybctid_3, ybctid_2, ybctid_1, ybctid_4]` - In the `PgDocReadOp::DoPopulateDmlByYbctidOps` method YSQL assigns the value of first item in the list to the `ybctid_column_value` field of read request (`ybctid_3` in our case) - After the `PgClientService::Perform` method is called by the YSQL layer table `T` get split onto two tablets: the first one contains ybctids from `(-INF, ybctid_3)` and the second one contains values `[ybctid_3, INF)` - `PgClientService::Perform` method resolves the target tablet for the request by using the `ybctid_column_value` field which is equal to `ybctid_3`. - `PgClientService` sends request to the second tablet of the `T` table. As a result only values for `ybctid_3`, `ybctid_4` will be returned. Solution is to use lowest ybctid value instead of first one. **Note:** In context of the diff additional change is made in the `InitRangePartitionKey` function. Before the change this function always used the `ybctid_column_value` value as the partition key even in case `paging_state` is specified. This was leading to the wrong behavior in case of dynamic table splitting. The `paging_state` field must be preferred over the `ybctid_column_value` (same as in the `InitHashPartitionKey` function) Original commits: 4c08fae / D17341 0979a46 / D18047 Test Plan: Jenkins: rebase: 2.12 **Note:** Unit tests from origanal commits are not merged as additional changes requires to make them work. Diff is checked by other existing unittest. Reviewers: timur, neil, arybochkin, alex, jason, dsrinivasan, pjain, rskannan Reviewed By: rskannan Subscribers: rthallam, arybochkin, yql Differential Revision: https://phabricator.dev.yugabyte.com/D19071
ramsrivatsa
added a commit
that referenced
this issue
Aug 31, 2022
…o tablet splitting Summary: YSQL support for tablet splitting has been added from the following commit --> [[ cb8d9c5 | cb8d9c5 ]]. **Preamble: ** YSQL support for tablet splitting includes the lower bound and upper bound hash code as a part of the request protobuf. Hence, after tablet splits, docDB will send a paging state till the upper bound partition key is reached which avoids missing tablets due to tablet split. This feature is specifically useful in situations where requests are prepared in batches right before the tablets split. Under such circumstances, if we use a partition key of one of the candidates in the batch (specifically where the hash value of the candidate in the batch will be mapped to the 2nd tablet when the original tablet is split into two), we can end up scanning only one part of the tablet (the 2nd tablet) that is split while missing the other part of the tablet. In order to solve that scenario, we set the partition key to the lower_bound (lower bound will guarantee that we start from the 1st tablet whenever a tablet is split) and we scan till the docDB iterator reaches the upper bound. After YSQL support for tablet splitting was implemented, the following shortcomings had risen. 1. YSQL support for tablet splitting was not supported for Range Split Tables Although lower and upper bounds were respected in the original diff, for range split tables it utilized one of the lists of ybctids to set the partition key. In that situation, if that ybctid belonged to the 2nd tablet of the post-split tablet, docDB start scanning from the 2nd tablet. This will end up missing the first tablet. 2. [[ c9fcd93 | YSQL: Explicit row-level locking should only lock what is returned to the client ]] broke existing support for tablet splitting. This commit breaks setting lower and upper bounds and eventually, YSQL support for tablet splitting for hash partitioned tables. The following issue [[ #12648 | #12648 ]] fixes these shortcomings at the pggate layer. Although, [[ #12648 | #12648 ]] fixes those issues, but violates the standardization implementation for handling tablet splitting. Whenever we batch multiple keys into requests belonging to a single tablet, the standardization followed is to just set the lower and upper bounds for partitions in protobuf request. This allows for simpler (and cleaner) design and implementation. This diff implements the standardization by setting partition key, hash code, and max hash code for batched requests using lower and upper bounds at the client layer. Test Plan: Jenkins Original fix for the issue [[ #12648 | #12648 ]] introduces unit tests, which are applicable for the current fix and passes with the new code change. ``` ./yb_build.sh --gtest_filter PgMiniTest.AutoScanNextPartitionsIndexScan ./yb_build.sh --gtest_filter PgMiniTest.AutoScanNextPartitionsFKConstraint ``` Reviewers: neil, timur, arybochkin, amartsinchyk, dmitry Reviewed By: dmitry Subscribers: pjain, zyu, yql Differential Revision: https://phabricator.dev.yugabyte.com/D19194
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
Jira Link: DB-540
Description
Index scan and FK check internally calls the
PgDocOp::PopulateDmlByYbctidOps
method. This method prepares read request with batch of ybctids. The value of the first ybctid is used for routing to particular tablet and it is written into theybctid_column_value
field of read request.https://github.com/yugabyte/yugabyte-db/blob/master/src/yb/yql/pggate/pg_doc_op.cc#L597
Using of the first ybctid from unsorted list of ybctids might be a problem due to dynamic splitting.
Suppose particular table has single tablet and the following ybctids was returned by the index:
ybctid_4, ybctid_3, ybctid_1, ybctid_5, ybctid_6
(hash index doesn't guarantee rows order).YSQL prepares read request and set the value of
ybctid_column_value
field toybctid_4
(the first ybctid in the list).At this point DocDB initiate table split and table got split onto 2 tablets (-INF, ybctid_3), [ybctid_4, +INF).
Prepared read request will go to tablet 2 as a result the row for
ybctid_1
andybctid_3
will not be read as they are placed into tablet 1.Possible solution:
The value of minimal ybctid from the list must be set to
ybctid_column_value
and not the first one.The text was updated successfully, but these errors were encountered: