Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[YSQL] for non-prepared statements, optimize pg_statistic system table lookups #5051

Closed
kmuthukk opened this issue Jul 11, 2020 · 1 comment
Assignees
Labels
area/ysql Yugabyte SQL (YSQL) kind/enhancement This is an enhancement of an existing feature

Comments

@kmuthukk
Copy link
Collaborator

kmuthukk commented Jul 11, 2020

When not using a prepared statement that's repeatedly used for re-execution, if the query planner doesn't have an obvious plan based on indexes or primary key for a given statement, it ends up going to the pg_statistic table.

For example, if you have two tables t1 and t2 both with two columns k (primary key) and v, then:

this join:

SELECT * from t1, t2 WHERE t1.k = 'k-123'

doesn't seem to need the help of pg_statistic table, whereas this one does:

SELECT * from t1, t2 WHERE t1.k = t2.k

And for such queries, if a "prepared" statement is not being used, then every execute ends up doing extra lookups on the pg_statistic table.

For the above example, we see two calls to pg_statistic each time the statement is prepared (one for t1 and one for t2). Given that this Postgres system table lives in yb-master, this incurs additional RPCs calls from yb-tserver to yb-master leaders and also increases the likelihood of yb-master becoming a bottleneck.

Test program which illustrates the issue:
https://gist.github.com/kmuthukk/611813ab4685179038654486d2ae1c0a

The workaround would be to use a prepared statement; but since we do find a lot of use of direct "execute" without a separate prepare step in the real-world, we should optimize/cache these lookups on each server rather than go to yb-master each time.

For python specifically, the psycopg2 driver doesn't have a cursor.prepare() API. But to work around you can use Postgres' explicity PREPARE statement.

For example, changing:

  for idx in range(num_rows):
     cur.execute("""SELECT * from t1, t2 WHERE t1.k = t2.k AND %s = %s LIMIT 1""", ("foo", "foo"))

to:

  cur.execute("PREPARE myplan as SELECT * from t1, t2 WHERE t1.k = t2.k AND $1 = $2 LIMIT 1")
  for idx in range(num_rows):
    cur.execute("""EXECUTE myplan(%s, %s)""", ("foo", "foo"))

addresses the issue.

@kmuthukk kmuthukk added area/ysql Yugabyte SQL (YSQL) kind/enhancement This is an enhancement of an existing feature labels Jul 11, 2020
m-iancu added a commit that referenced this issue Jul 24, 2020
…e lookups #5051

Summary:
Allow caching of negative (not found) results for pg_statistic and
pg_statistic_ext tables since we do not change those in YSQL yet.

Additionally, update the debug utilities for YSQL by removing the general
variable 'yb_debug_mode' and adding:

yb_debug_report_error_stacktrace:
   - Add stacktrace information to every YSQL error.
   - Effectively replaces (yb_debug_mode).
yb_debug_log_catcache_events:
   - Log cache misses and cache refresh events.
yb_debug_log_internal_restarts:
   - Log automatic statement (or transaction) restarts
   - e.g. read-restarts and schema mismatch restarts.
yb_debug_log_docdb_requests:
   - Log protobufs of all outgoing DocDB requests.

Test Plan:
In ysqlsh, run the following queries and check that no
'Catalog cache miss' entries are found in the tserver's
postgresql log.

```
-- setup tables
create table t1(k int primary key, v int);
create table t2(k int primary key, v int);
-- Execute the query once to load up the caches
select * from t1, t2 where t1.k = t2.k;
-- enable cache events log and re-run the query.
set yb_debug_log_catcache_events = true;
select * from t1, t2 where t1.k = t2.k;
select * from t1, t2 where t1.k = t2.k;
select * from t1, t2 where t1.k = t2.k;
-- reset (disable) the cache events log.
reset yb_debug_log_catcache_events;
```

Reviewers: neha, neil, alex, dmitry

Reviewed By: dmitry

Subscribers: kannan, yql

Differential Revision: https://phabricator.dev.yugabyte.com/D8936
m-iancu added a commit that referenced this issue Jul 28, 2020
…pg_statistic system table lookups

Summary:
Allow caching of negative (not found) results for pg_statistic and
pg_statistic_ext tables since we do not change those in YSQL yet.

Additionally, update the debug utilities for YSQL by removing the general
variable 'yb_debug_mode' and adding:

yb_debug_report_error_stacktrace:
   - Add stacktrace information to every YSQL error.
   - Effectively replaces (yb_debug_mode).
yb_debug_log_catcache_events:
   - Log cache misses and cache refresh events.
yb_debug_log_internal_restarts:
   - Log automatic statement (or transaction) restarts
   - e.g. read-restarts and schema mismatch restarts.
yb_debug_log_docdb_requests:
   - Log protobufs of all outgoing DocDB requests.

Test Plan:
Jenkins: rebase: 2.2

In ysqlsh, run the following queries and check that no
'Catalog cache miss' entries are found in the tserver's
postgresql log.

```
-- setup tables
create table t1(k int primary key, v int);
create table t2(k int primary key, v int);
-- Execute the query once to load up the caches
select * from t1, t2 where t1.k = t2.k;
-- enable cache events log and re-run the query.
set yb_debug_log_catcache_events = true;
select * from t1, t2 where t1.k = t2.k;
select * from t1, t2 where t1.k = t2.k;
select * from t1, t2 where t1.k = t2.k;
-- reset (disable) the cache events log.
reset yb_debug_log_catcache_events;
```

Reviewers: neha, neil, dmitry, alex

Reviewed By: alex

Subscribers: yql, kannan

Differential Revision: https://phabricator.dev.yugabyte.com/D9034
@m-iancu
Copy link
Contributor

m-iancu commented Aug 7, 2020

Fixed by c41c3c9.

@m-iancu m-iancu closed this as completed Aug 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/ysql Yugabyte SQL (YSQL) kind/enhancement This is an enhancement of an existing feature
Projects
None yet
Development

No branches or pull requests

3 participants