Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[YSQL] Support distinct pushdown in YSQL #17872

Closed
1 task done
vbalvida opened this issue Jun 21, 2023 · 2 comments
Closed
1 task done

[YSQL] Support distinct pushdown in YSQL #17872

vbalvida opened this issue Jun 21, 2023 · 2 comments
Assignees
Labels
area/ysql Yugabyte SQL (YSQL) kind/new-feature This is a request for a completely new feature priority/medium Medium priority issue

Comments

@vbalvida
Copy link
Contributor

vbalvida commented Jun 21, 2023

Jira Link: DB-6955

Description

Context

YB LSM indexes support skipping over duplicate values of the leading column(s). This results in an efficient execution of distinct queries since the scan needs only fetch distinct tuples from the index. The list of leading columns over which the index skips is called the prefix and it is uniquely identified by its length, hereby referred to as the prefix length. This access pattern is colloquially called loose index scan or skip scan.

While the storage layer supports such an access pattern using the Hybrid Scan module, YSQL support is still lacking. Currently, we compute the prefix and make the decision to pushdown distinct during execution. Such an approach is less than ideal and here we make a case for moving this task to the planner. See issues #14559 and #16257 for more information on how the pushdown is currently implemented.

Objective

Making the decision to pushdown distinct in the planner as opposed to PgGate lets us avoid several correctness issues. In summary, distinct should not be pushed down if the distinct operation does not commute/distribute with other operations in the query. This commutativity is better checked in the planner.

Aggregate and Windowing Functions

Aggregation and Windowing operations require that we do not skip over duplicate rows since aggregation happens over even duplicate tuples in the group. See #17648 for more information.

As an example, consider a query such as

SELECT DISTINCT COUNT(r1) FROM t WHERE r1 = 1;

where r1 is the leading range column of t. Fatal to skip over duplicate rows of r1 here.

Volatile targets or predicates

Similar reasoning applies to volatile targets and predicates. Volatile functions have side effects and hence need to be executed for each tuple. This property makes them inherently incompatible with distinct pushdown.

Example

SELECT DISTINCT r1 * RANDOM() FROM t WHERE r1 = 1;

Pushing down distinct here is a mistake since RANDOM() can take different values for different duplicate values of r1.


Other than such correctness issues, making the planner aware of distinct pushdown helps it generate and select better plans overall.

Performance

Performance is a primary reason why distinct index scans are so important. There are several shortcomings with our current approach that we wish to overcome.

Inaccurate plan selection

Imagine that we have a table t setup using the following commands

CREATE TABLE t (r1 INT, r2 INT, PRIMARY KEY(r1 ASC, r2 ASC));
INSERT INTO t (SELECT 1, i FROM GENERATE_SERIES(1, 10000) AS i);

Now, getting the distinct elements of the first column executes the following plan

EXPLAIN ANALYZE SELECT DISTINCT r1 FROM t;
                                                QUERY PLAN                                                 
-----------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=102.50..102.51 rows=1 width=4) (actual time=80.961..80.962 rows=1 loops=1)
   Group Key: r1
   ->  Seq Scan on t  (cost=0.00..100.00 rows=1000 width=4) (actual time=9.449..76.847 rows=10000 loops=1)
 Planning Time: 5.901 ms
 Execution Time: 81.103 ms
 Peak Memory Usage: 14 kB
(6 rows)

Forcing an index scan yields

EXPLAIN ANALYZE SELECT DISTINCT r1 FROM t WHERE r1 <= 10000;
                                                    QUERY PLAN                                                     
-------------------------------------------------------------------------------------------------------------------
 Unique  (cost=0.00..15.50 rows=1 width=4) (actual time=1.711..1.717 rows=1 loops=1)
   ->  Index Scan using t_pkey on t  (cost=0.00..15.25 rows=100 width=4) (actual time=1.709..1.715 rows=1 loops=1)
         Index Cond: (r1 <= 10000)
 Planning Time: 0.128 ms
 Execution Time: 1.808 ms
 Peak Memory Usage: 8 kB
(6 rows)

As can be seen from the differences in the reported execution times, a wildly inaccurate plan is being picked. The primary reason is that the planner is unaware that fewer rows are being fetched in the distinct index scan. More importantly, the user is now clueless as to why the index scan is so much quicker. Classifying this particular scan as a distinct index scan helps is better debugging of performance issues. Moreover, the current approach also does some redundant computation, discussed in the next section.

Redundant Unique/HashAggregate Node

In the above plan, observe that the planner creates a Unique node on top of the index scan. In hash-partitioned tables, such a unique node is not necessary since the distinct index scan already returns distinct values of the prefix. In fact, vanilla postgres provides two methods to distinctify scan paths. Pg can either sort+unique the underlying path or use hash aggregation. Add a third mechanism to add distinct paths. If the distinct index scan already returns a distinctified path, use the path directly. This reduces redundant computation with sorting/aggregation. There are many other opportunities that not mentioned in this ticket.

Warning: Please confirm that this issue does not contain any sensitive information

  • I confirm this issue does not contain any sensitive information.
@vbalvida vbalvida added area/ysql Yugabyte SQL (YSQL) status/awaiting-triage Issue awaiting triage labels Jun 21, 2023
@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue labels Jun 21, 2023
@vbalvida vbalvida self-assigned this Jun 21, 2023
@vbalvida vbalvida added kind/new-feature This is a request for a completely new feature kind/enhancement This is an enhancement of an existing feature and removed status/awaiting-triage Issue awaiting triage kind/bug This issue is a bug kind/new-feature This is a request for a completely new feature labels Jun 21, 2023
@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug kind/enhancement This is an enhancement of an existing feature priority/low Low priority and removed kind/enhancement This is an enhancement of an existing feature kind/bug This issue is a bug priority/medium Medium priority issue labels Jun 21, 2023
@jasonyb
Copy link
Contributor

jasonyb commented Jun 21, 2023

Can you fix the TBD/TODO?

What is "prefix"? "prefix computation", "prefix index scan"

@vbalvida vbalvida changed the title [YSQL] Add a new index path to scan distinct prefixes [YSQL] Add a new index path to scan distinct primary key prefix Jun 22, 2023
@yugabyte-ci yugabyte-ci added priority/medium Medium priority issue and removed priority/low Low priority labels Jun 23, 2023
@vbalvida vbalvida changed the title [YSQL] Add a new index path to scan distinct primary key prefix [YSQL] Add support to scan distinct prefixes of indexes Jun 27, 2023
@vbalvida vbalvida added kind/new-feature This is a request for a completely new feature and removed kind/enhancement This is an enhancement of an existing feature labels Aug 5, 2023
@vbalvida vbalvida changed the title [YSQL] Add support to scan distinct prefixes of indexes [YSQL] Support distinct pushdown in YSQL Aug 8, 2023
vbalvida added a commit that referenced this issue Aug 9, 2023
Summary:
#### Objective

While the LSM index itself supports  prefix based distinct skip scans, the planning layer lacks awareness of such support. We make an Index Scan variant called “Distinct Index Scan'' to express such scans. The primary objective of this change is to integrate Distinct Index Scans into YSQL.

#### Design

Implementing distinct pushdown in YSQL requires three broad changes. First, the planner must generate distinct index scan path nodes. Then, these path nodes must be integrated properly into the overall query plan. Last, a few minor changes are required in the execution layer.

##### Generating Distinct Index Paths

Distinct Index Paths are generated along with other index paths. This is generally useful only  when generating paths for a DISTINCT query, bar a few exceptions such as semi joins and UNION queries (unsupported at the moment). To avoid creating another scan path node, the index scan path data structure is reused for distinct index scans as well. However, there are a few fundamental differences. First, distinct index scans are parameterized by the prefix length on the index. Hence, the planner can distinguish a distinct index scan from a normal index scan by simply checking if the prefix length is non-zero. Additionally, recall that a distinct index scan fetches fewer rows from the storage and as a consequence, has a different path cost. Finally, the distinct operation cannot always be pushed down to the base relations because of which distinct index scans cannot always be generated alongside the regular index paths. These differences are discussed in more detail in the following sections.

###### Computing the prefix length

To compute prefix length, first, we determine the set of keys that need to be unique. This change operates only on index-only scan scenarios.  Moreover, this change does not apply in the presence of  unsupported expressions such as aggregates and volatile functions within the requested  column references. With our distinct key set we have just determined, use the shortest prefix that encompasses all these key columns. We look for the shortest possible prefix since using a longer one leads to a poorly performant scan due to fetching more rows than necessary. We can use distinct index scans for queries that request any subset of index columns, not just prefixes, albeit requiring further processing on top.

###### Range partitioned tables

There is a small caveat with the current implementation of distinct pushdown. Range partitioned tables can return duplicate tuples since the DISTINCTification only happens within a tablet. To combat this, we generate a unique node on top of the distinct index scan, as a workaround. Hash partitioned tables do not exhibit this quirk.

###### Path Cost

Accurate costing of distinct index path is not the primary focus of the change. Regardless, we adjust the cost so that it makes a bit more sense. We simply scale the cost down by a rough estimation of the number of duplicate values of the prefix.

##### Distinct Index Scan Integration

In this section, we discuss how distinct queries were previously supported in YSQL, next we understand how pathkeys help in generating these query plans, then we reason why pathkeys do not mesh well with distinct pushdown, and finally we see how distinct queries are supported now with skip scan integration.

###### Distinct query plan generation before distinct pushdown

There are two primary mechanisms to create paths for distinct queries. One is sort based and the other hash based. The hash based method uses a hash aggregation method to remove duplicate values, not too relevant to distinct pushdown. The sort based method, on the other hand, is similar to a skip scan. Here, the input is first sorted and the duplicate values are now easily removed since they are adjacent to each other. This is useful when we expect the input to be already sorted like in the output of index scans or merge sort joins. The planner identifies the correct ordering of such scans/joins by looking at their pathkeys. Pathkeys provide useful functionality such as optimizing away constants and equivalent columns. However, they have their limitations when it comes to pushing down distinct.

###### Using skip scans with distinct queries

With skip scans, there is now a third way to generate a plan for distinct queries. A distinct query can use skip scans to generate a candidate distinct path once they are proven to be sufficiently distinct. While pathkeys provide a good mechanism for ordering proofs, they are inadequate for distinct pushdown. For example, not all distinct queries request leading columns of an index. In such cases, using a skip scan is still useful since they probably fetch fewer rows. We can then use the above techniques to deduplicate the output of the skip scan. Moreover, a distinct query need not request leading columns of an index in the same order as the index. To elaborate, unlike ordering, distinct does not care about the order in which the target keys are specified. This distinction is also visible when it comes to joins. Here, the distinct operation distributes over the join operation but the sort operation does not. For all these reasons, we use a different mechanism to propagate distinctness information to parent pathnodes.

##### Changes during execution time

The changes here are minimal. The primary objective here is to propagate the prefix length computed at the planning time to the underlying storage layer. Moreover, DocDB should use a HybridScan whenever the query layer requests for a prefix based skip scan.

#### Future Work

We will extend support for more complex queries that may involve join trees or those using DISTINCT ON queries.

Jira: DB-6955

Test Plan: ./yb_build.sh --java-test TestPgRegressDistinctPushdown

Reviewers: smishra, mihnea, tnayak

Reviewed By: tnayak

Subscribers: jason, yql, ybase

Differential Revision: https://phorge.dev.yugabyte.com/D26566
@vbalvida
Copy link
Contributor Author

The basic framework has landed. Any extensions or future work will be done as separate tasks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/ysql Yugabyte SQL (YSQL) kind/new-feature This is a request for a completely new feature priority/medium Medium priority issue
Projects
None yet
Development

No branches or pull requests

3 participants