Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[YSQL] Support OR clauses as index predicates in LSM based IndexScan #18004

Open
1 task done
vbalvida opened this issue Jun 29, 2023 · 0 comments
Open
1 task done

[YSQL] Support OR clauses as index predicates in LSM based IndexScan #18004

vbalvida opened this issue Jun 29, 2023 · 0 comments
Labels
area/ysql Yugabyte SQL (YSQL) kind/enhancement This is an enhancement of an existing feature priority/low Low priority

Comments

@vbalvida
Copy link
Contributor

vbalvida commented Jun 29, 2023

Jira Link: DB-7065

Description

Overview

IndexScan supports both discrete filters such as key IN (a, b, c), range filters such as key >= a, and a conjunction of such filters. However, LSM indexes currently do not support OR clauses.

Even though, the predicate key = a OR key = b OR key = c is equivalent to key IN (a, b, c), the former is not recognized as an index predicate, see #4634. More generally, many expressions can be rewritten as a disjunction of disjoint ranges each defined purely by conjunctive clauses. For example, the expression (key1 = a OR key1 = b) AND key2 > c can be rewritten as (key1 = a AND key2 > c) OR (key1 = b AND key2 > c). Rows matching key1 = a AND key2 > c are disjoint with rows matching key1 = b AND key2 > c when a != b and equivalent otherwise.

Current Support

Index Clause support in Postgres

Postgres currently has poor support for index clauses that span multiple columns.

/* Data structure for collecting qual clauses that match an index */
typedef struct
{
	bool		nonempty;		/* True if lists are not all empty */
	/* Lists of RestrictInfos, one per index column */
	List	   *indexclauses[INDEX_MAX_KEYS];
} IndexClauseSet;

In the current architecture, each clause must correspond to a unique column. Moreover, selectivity computation may become more involved when index supports multicolumn clauses. See the comment below taken from the postgres code

/*
 * match_clause_to_index
 *	  Test whether a qual clause can be used with an index.
 *
 *                                                ...
 *
 * Note: it's possible that a badly-defined index could have multiple matching
 * columns.  We always select the first match if so; this avoids scenarios
 * wherein we get an inflated idea of the index's selectivity by using the
 * same clause multiple times with different index columns.
 */

Another caveat is when considering scalar array ops on a non-leading column

if (indexcol > 0)
{
	if (skip_lower_saop)
	{
		/* Caller doesn't want to lose index ordering */
		*skip_lower_saop = true;
		continue;
	}
	found_lower_saop_clause = true;
}

Will have to investigate why we skip these clauses in the index with non-leading columns. It might give some insight into similar scenarios with OR clauses as well. The following excerpt from the comment on RestrictInfo is a potential clue

/*
 ...
 * When the referenced clause is an OR clause, we generate a modified copy
 * in which additional RestrictInfo nodes are inserted below the top-level
 * OR/AND structure.  This is a convenience for OR indexscan processing:
 * indexquals taken from either the top level or an OR subclause will have
 * associated RestrictInfo nodes.
 ...
 */

Why care for multi-column clauses?

Because the storage layer supports them, see #14068.

Consider a simple query such as

EXPLAIN
  SELECT * FROM t
  WHERE (r1,r2) in ((1,1),(2,2));

Observe that the plan

                              QUERY PLAN                               
-----------------------------------------------------------------------
 Seq Scan on t  (cost=0.00..110.00 rows=1000 width=16)
   Remote Filter: (((r1 = 1) AND (r2 = 1)) OR ((r1 = 2) AND (r2 = 2)))
(2 rows)

is not indexed despite support from the storage layer. However, when we execute the same query on the latest version of Postgres, we get the following plan (also see #4634)

                                QUERY PLAN
---------------------------------------------------------------------------
 Bitmap Heap Scan on t  (cost=8.33..12.35 rows=1 width=16)
   Recheck Cond: (((r1 = 1) AND (r2 = 1)) OR ((r1 = 2) AND (r2 = 2)))
   ->  BitmapOr  (cost=8.33..8.33 rows=1 width=0)
         ->  Bitmap Index Scan on t_pkey  (cost=0.00..4.16 rows=1 width=0)
               Index Cond: ((r1 = 1) AND (r2 = 1))
         ->  Bitmap Index Scan on t_pkey  (cost=0.00..4.16 rows=1 width=0)
               Index Cond: ((r1 = 2) AND (r2 = 2))
(7 rows)

A bit surprising considering that DocDB supports these queries even better using HybridScan (colloquially known as skip scan).

More importantly, supporting disjunctions that act on more than column require support for multi-column clauses. Above query is one such example. Consider another simple example query

EXPLAIN
  SELECT * FROM t
  WHERE r1 < 2 OR r2 > 2;

This issue is especially nasty when working with DISTINCT queries (same query but DISTINCT)

EXPLAIN
  SELECT DISTINCT * FROM t
  WHERE r1 < 2 OR r2 > 2;

The disjoint ranges r1 < 2, and r1 >= 2 AND r2 > 2 each may have very few unique entries and an explicit Sequential Scan can be several orders of magnitude slower. In this case, even though postgres can generate a bitmap scan for the query, using a skip scan is so much better when there are only a few distinct queries. Because, well, we can skip over a lot of duplicate entries and thus avoid scanning them entirely.

All this discussion on multi-column support raises a natural question on how postgres supports them in BitmapScan. Investigating this might lead us to a solution to this problem.

Why not rewrite the query to be a UNION ALL of disjoint ranges?

All the predicates discussed so far can be written as a disjoint union of ranges that can be handled well by the LSM index. However, this is not always the case. Consider a query such as

EXPLAIN
  SELECT * FROM t
  WHERE r1 = 2 AND (f(r2) = 0 OR f(r3) = 4);

Only the first clause is indexable while the second clause is not. Converting this clause to conjunctive normal form is counter-productive since the clause (r1 = 2 AND f(r2) = 0) OR (r1 = 2 AND f(r3) = 4). It is not clear how one would extract disjoint ranges satisfying the predicate.

However, when the top level operation is an AND, postgres can pick any supported clause to be served by the index and then further filter the returned rows. In the above example, say the index can serve rows matching r1 = 2, then the planner can tack on another filter f(r2) = 0 OR f(r3) = 4 to further filter the rows. This post index scan filtering is not possible on the arms of an OR predicate instead.

Query Layer LSM Index

DocDB's LSM-based index supports more operators than the B-Tree Index. Specifying these differences might require changes in yb_lsm.c. Need to adjust the cost model appropriately as well since index scans can be cheaper.

Scan Order

Consider a query such as

EXPLAIN
  SELECT * FROM t
  ORDER BY r1, r2 DESC;

produces a plan such as

                         QUERY PLAN
------------------------------------------------------------
 Sort  (cost=149.83..152.33 rows=1000 width=8)
   Sort Key: r1, r2 DESC
   ->  Seq Scan on t  (cost=0.00..100.00 rows=1000 width=8)

which is inefficient since we can do a backward scan on r2 for each distinct element of r1. This is true if if the table resides in one colocation. If the table is partitioned a slightly more involved algorithm is necessary to combine the elements from different tablet servers.

Parallel Query

Will have to investigate how relevant #17984 is, especially since we might be able to retrieve disjoint ranges in parallel.

Efficiency Caveat

What's the number of disjoint ranges produced using this method?

In the worst case, all the OR conditions are in the last level of the expression tree. This can lead to an exponential growth in the number of clauses in DNF. Memory usage only has a polynomial advantage and hence is also exponential in the worst case. The predicate sizes are small enough that it should not matter in practice.

Example: Consider a predicate such as $p0 \wedge (p1 \vee p2 \vee ...) = (p0 \wedge p1) \vee ...$ When p0 itself constitutes an OR, then the number of disjunction simply doubles by having a single more AND clause.

ScanSpec

TODO: Better Explanation

TBD. Unclear what is supported. Requires testing. The current logic that computes the intersection/union of ranges is quite suspicious.

Queries to test

SELECT * FROM t
  WHERE r1 = 2 OR r1 = 4;

The range is determined as [2, 4]. A bit more non-trivial query we may want to consider

SELECT r1, r2 FROM t
  WHERE (r1 > 5 AND r2 < 7) OR (r1 < 13 AND r2 > 3);

The above query must not return (1, 1).

The suspicion comes from the fact that the union and intersection is too simplistic and simply takes adjusts lower and upper bounds of each individual key column separately. TBD: Does the current implementation always return a superset of the required rows?

HybridScanChoices

HybridScanChoices supports predicates using an option list data structure. In combination with the notion of column groups, the module is able to support arbitrary DNF predicates as long as the atoms are supported index predicates (TBD: any exceptions?).

Key column in multiple groups

Example predicate

  • ((r1, r2) IN ((1,1), (1,2)) AND ((r2,r3) IN (2,2), (2,3))

HybridScanChoices shouldn't get confused what column group a particular column belongs to.

Example predicates

  • r1 = 17 OR r2 = 29
  • !(r1 != 17 AND r2 != 29)
  • (r1 >= 5 AND r2 < 7) OR (r1 < 13 AND r2 > 3)
  • (r1 >= 5 OR r2 < 7) AND (r1 < 13 OR r2 > 3)

Expected Gains

SELECT DISTINCT

After #17607, supporting more queries in ScanChoices implies that the distinct pushdown feature is more broadly applicable.

Related: #17741

Motivating Query

Consider the following query

EXPLAIN ANALYZE SELECT DISTINCT r1, r2 FROM tr123 WHERE r1 < 2 OR r2 > 2;
                                                   QUERY PLAN                                                    
-----------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=110.00..112.00 rows=200 width=8) (actual time=1033.983..1033.985 rows=2 loops=1)
   Group Key: r1, r2
   ->  Seq Scan on tr123  (cost=0.00..105.00 rows=1000 width=8) (actual time=17.432..999.262 rows=66667 loops=1)
         Remote Filter: ((r1 < 2) OR (r2 > 2))
 Planning Time: 22.120 ms
 Execution Time: 1035.500 ms
 Peak Memory Usage: 78 kB
(7 rows)

This can be rewritten as

EXPLAIN ANALYZE
  SELECT DISTINCT r1, r2 FROM tr123 WHERE r1 < 2
    UNION
  SELECT DISTINCT r1, r2 FROM tr123 WHERE r2 > 2;
                                                                  QUERY PLAN                                                                   
-----------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=34.78..36.42 rows=164 width=8) (actual time=3.907..3.909 rows=2 loops=1)
   Group Key: tr123.r1, tr123.r2
   ->  Append  (cost=0.00..33.96 rows=164 width=8) (actual time=2.694..3.900 rows=2 loops=1)
         ->  Unique  (cost=0.00..15.75 rows=82 width=8) (actual time=2.694..2.705 rows=2 loops=1)
               ->  Index Scan using tr123_pkey on tr123  (cost=0.00..15.25 rows=100 width=8) (actual time=2.692..2.700 rows=2 loops=1)
                     Index Cond: (r1 < 2)
         ->  Unique  (cost=0.00..15.75 rows=82 width=8) (actual time=1.192..1.193 rows=0 loops=1)
               ->  Index Scan using tr123_pkey on tr123 tr123_1  (cost=0.00..15.25 rows=100 width=8) (actual time=1.192..1.192 rows=0 loops=1)
                     Index Cond: (r2 > 2)
 Planning Time: 2.666 ms
 Execution Time: 4.066 ms
 Peak Memory Usage: 72 kB
(12 rows)

There is a significant drop in execution time from 1035ms to 4ms for equivalent queries. Would be nice to not have to rewrite queries such as these. Moreover, the planner can optimize away the top level de-duplication step once it realizes that the ranges are disjoint r1 < 2, r1 >= 2 AND r2 > 2.

Selective Queries

Currently, even point queries such as

EXPLAIN (COSTS OFF) SELECT * FROM t WHERE (r1, r2) IN ((3,5), (5,3));

execute a sequential scan which is expensive

                              QUERY PLAN                               
-----------------------------------------------------------------------
 Seq Scan on t
   Remote Filter: (((r1 = 3) AND (r2 = 5)) OR ((r1 = 5) AND (r2 = 3)))

The current workaround involves modifying the query to execute

EXPLAIN (COSTS OFF) SELECT * FROM t WHERE (r1 IN (3,5)) AND (r2 IN (3,5)) AND ((r1, r2) IN ((3,5), (5,3)));

which gives a much better plan

                                     QUERY PLAN                                      
-------------------------------------------------------------------------------------
 Index Scan using t_pkey on t  (cost=0.00..4.13 rows=1 width=8)
   Index Cond: ((r1 = ANY ('{3,5}'::integer[])) AND (r2 = ANY ('{3,5}'::integer[])))
   Remote Filter: (((r1 = 3) AND (r2 = 5)) OR ((r1 = 5) AND (r2 = 3)))

We can also execute

SELECT * FROM t WHERE r1 = 3 AND r2 = 5
  UNION ALL
SELECT * FROM t WHERE r1 = 5 AND r2 = 3;

since there can be no duplicates across the two constituent SELECT queries.

Warning: Please confirm that this issue does not contain any sensitive information

  • I confirm this issue does not contain any sensitive information.
@vbalvida vbalvida added kind/enhancement This is an enhancement of an existing feature area/ysql Yugabyte SQL (YSQL) priority/low Low priority status/awaiting-triage Issue awaiting triage labels Jun 29, 2023
@vbalvida vbalvida changed the title [YSQL] Support OR clauses as index predicates in HybridScanChoices [YSQL] Support OR clauses as index predicates in LSM based IndexScan Jun 30, 2023
@yugabyte-ci yugabyte-ci removed the status/awaiting-triage Issue awaiting triage label Jun 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/ysql Yugabyte SQL (YSQL) kind/enhancement This is an enhancement of an existing feature priority/low Low priority
Projects
None yet
Development

No branches or pull requests

2 participants