stats: adriatic crashed with v19.1.0-beta.20190318 #35947

Closed
ajwerner opened this issue Mar 19, 2019 · 8 comments
Labels
A-sql-table-stats Table statistics (and their automatic refresh). C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. S-2-temp-unavailability Temp crashes or other availability problems. Can be worked around or resolved by restarting.

Comments

@ajwerner
Contributor

Problem Description

This issue is confusing and not well understood. Last night at 3:26 AM, adriatic nodes started to crash with OOMs. The cluster had been upgraded to the latest beta the day before (2019/03/18 12:40 EST). I did not actively begin investigating until this morning.

The first observation was that the heap profiles consisted mostly of scan requests (see the attached heap profile below). We then observed that stopping TPC-C and restarting all the nodes led to a stable cluster. As soon as TPC-C was restarted, nodes began OOM-ing again.

@nvanbenschoten then entered the picture to help with his expert sleuthing skills. With that we were able to track the deadly scans down to a single query:

SELECT
    count(*)
FROM
    (
        SELECT
            DISTINCT s_i_id
        FROM
            order_line
            JOIN stock
            ON s_i_id = ol_i_id AND s_w_id = ol_w_id
        WHERE
            (
                (ol_w_id = 2 AND ol_d_id = 9)
                AND ol_o_id BETWEEN (655736 - 20) AND (655736 - 1)
            )
            AND s_quantity < 13
    );

At this point we looked at the query plan and found something intriguing. The plan scans stock, which is large, rather than order_line, which is small:

root@:26257/tpcc> EXPLAIN SELECT count(*) FROM (SELECT DISTINCT s_i_id FROM order_line JOIN stock ON (s_i_id = ol_i_id) AND (s_w_id = ol_w_id) WHERE (((ol_w_id = 2) AND (ol_d_id = 9)) AND (ol_o_id BETWEEN (655736 - 20) AND (655736 - 1))) AND (s_quantity < 13));
               tree               |    field    |                                   description
+---------------------------------+-------------+---------------------------------------------------------------------------------+
  group                           |             |
   │                              | aggregate 0 | count_rows()
   │                              | scalar      |
   └── render                     |             |
        └── distinct              |             |
             │                    | distinct on | s_i_id
             │                    | order key   | s_i_id
             └── render           |             |
                  └── lookup-join |             |
                       │          | table       | order_line@primary
                       │          | type        | inner
                       │          | pred        | ((((@1 = @7) AND (@6 = 2)) AND (@5 = 9)) AND (@4 >= 655716)) AND (@4 <= 655735)
                       └── scan   |             |
                                  | table       | stock@primary
                                  | spans       | /2-/3
                                  | filter      | s_quantity < 13
(16 rows)

On a regular tpcc cluster the plan looks like:

root@:26257/tpcc> EXPLAIN SELECT count(*) FROM (SELECT DISTINCT s_i_id FROM order_line JOIN stock ON (s_i_id = ol_i_id) AND (s_w_id = ol_w_id) WHERE (((ol_w_id = 2) AND (ol_d_id = 9)) AND (ol_o_id BETWEEN (655736 - 20) AND (655736 - 1))) AND (s_quantity < 13));
               tree               |    field    |       description
+---------------------------------+-------------+-------------------------+
  group                           |             |
   │                              | aggregate 0 | count_rows()
   │                              | scalar      |
   └── render                     |             |
        └── distinct              |             |
             │                    | distinct on | s_i_id
             └── render           |             |
                  └── lookup-join |             |
                       │          | table       | stock@primary
                       │          | type        | inner
                       │          | pred        | (@6 = 2) AND (@7 < 13)
                       └── scan   |             |
                                  | table       | order_line@primary
                                  | spans       | /2/9/655735-/2/9/655715
(14 rows)

At this point we focused our attention on the statistics.

root@:26257/tpcc> show statistics for table stock;
  statistics_name | column_names |             created              | row_count | distinct_count | null_count |    histogram_id
+-----------------+--------------+----------------------------------+-----------+----------------+------------+--------------------+
  __auto__        | {s_i_id}     | 2019-02-27 18:32:26.796549+00:00 |   1000000 |             10 |          0 | 429988075551784963
  __auto__        | {s_w_id}     | 2019-02-27 18:32:26.796549+00:00 |   1000000 |         101273 |          0 | 429988075901714435
  __auto__        | {s_i_id}     | 2019-02-27 18:35:23.576974+00:00 |   1000000 |             10 |          0 | 429988654768357378
  __auto__        | {s_w_id}     | 2019-02-27 18:35:23.576974+00:00 |   1000000 |         101273 |          0 | 429988655061827586
  __auto__        | {s_i_id}     | 2019-02-27 18:36:00.291099+00:00 |   1000000 |             10 |          0 | 429988775204585476
  __auto__        | {s_w_id}     | 2019-02-27 18:36:00.291099+00:00 |   1000000 |         101273 |          0 | 429988775512965124
  __auto__        | {s_i_id}     | 2019-02-27 18:36:50.221065+00:00 |   1000000 |             10 |          0 | 429988939432394757
  __auto__        | {s_w_id}     | 2019-02-27 18:36:50.221065+00:00 |   1000000 |         101273 |          0 | 429988939911856133
  __auto__        | {s_i_id}     | 2019-02-27 18:37:59.971707+00:00 |   1000000 |             10 |          0 | 429989167297658886
  __auto__        | {s_w_id}     | 2019-02-27 18:37:59.971707+00:00 |   1000000 |         101273 |          0 | 429989167839838214
(10 rows)

Two things to notice here: (1) the stats are old, and (2) more alarmingly, the distinct counts for the two columns are swapped! As verified below, s_i_id actually has ~100,000 distinct values and s_w_id has 10:

root@:26257/tpcc> select count(distinct s_i_id) from stock;
  count
+--------+
  100000
(1 row)
root@:26257/tpcc> select count(distinct s_w_id) from stock;
  count
+-------+
     10
(1 row)

At this point we ran a manual statistics creation, which produced reasonable values:

root@:26257/tpcc> create statistics test_stats from stock;
CREATE STATISTICS
root@:26257/tpcc> show statistics for table stock;
  statistics_name |  column_names  |             created              | row_count | distinct_count | null_count |    histogram_id
+-----------------+----------------+----------------------------------+-----------+----------------+------------+--------------------+
  __auto__        | {s_i_id}       | 2019-02-27 18:35:23.576974+00:00 |   1000000 |             10 |          0 | 429988654768357378
  __auto__        | {s_w_id}       | 2019-02-27 18:35:23.576974+00:00 |   1000000 |         101273 |          0 | 429988655061827586
  __auto__        | {s_i_id}       | 2019-02-27 18:36:00.291099+00:00 |   1000000 |             10 |          0 | 429988775204585476
  __auto__        | {s_w_id}       | 2019-02-27 18:36:00.291099+00:00 |   1000000 |         101273 |          0 | 429988775512965124
  __auto__        | {s_i_id}       | 2019-02-27 18:36:50.221065+00:00 |   1000000 |             10 |          0 | 429988939432394757
  __auto__        | {s_w_id}       | 2019-02-27 18:36:50.221065+00:00 |   1000000 |         101273 |          0 | 429988939911856133
  __auto__        | {s_i_id}       | 2019-02-27 18:37:59.971707+00:00 |   1000000 |             10 |          0 | 429989167297658886
  __auto__        | {s_w_id}       | 2019-02-27 18:37:59.971707+00:00 |   1000000 |         101273 |          0 | 429989167839838214
  test_stats      | {s_dist_02}    | 2019-03-19 14:58:34.961378+00:00 |   1000000 |         986857 |          0 | 435608338988695556
  test_stats      | {s_i_id}       | 2019-03-19 14:58:34.961378+00:00 |   1000000 |         101273 |          0 | 435608338536595460
  test_stats      | {s_quantity}   | 2019-03-19 14:58:34.961378+00:00 |   1000000 |             91 |          0 | 435608338610814980
  test_stats      | {s_dist_01}    | 2019-03-19 14:58:34.961378+00:00 |   1000000 |         989838 |          0 | 435608338761089028
  test_stats      | {s_w_id}       | 2019-03-19 14:58:34.961378+00:00 |   1000000 |             10 |          0 | 435608338433802244
  test_stats      | {s_dist_03}    | 2019-03-19 14:58:34.961378+00:00 |   1000000 |        1008181 |          0 | 435608339111444484
  test_stats      | {s_dist_04}    | 2019-03-19 14:58:34.961378+00:00 |   1000000 |        1012198 |          0 | 435608339219447812
  test_stats      | {s_dist_05}    | 2019-03-19 14:58:34.961378+00:00 |   1000000 |         986878 |          0 | 435608339326763012
  test_stats      | {s_dist_06}    | 2019-03-19 14:58:34.961378+00:00 |   1000000 |         996501 |          0 | 435608339450363908
  test_stats      | {s_dist_07}    | 2019-03-19 14:58:34.961378+00:00 |   1000000 |        1002457 |          0 | 435608339560628228
  test_stats      | {s_dist_08}    | 2019-03-19 14:58:34.961378+00:00 |   1000000 |        1007497 |          0 | 435608339668369412
  test_stats      | {s_dist_09}    | 2019-03-19 14:58:34.961378+00:00 |   1000000 |         995973 |          0 | 435608339792822276
  test_stats      | {s_dist_10}    | 2019-03-19 14:58:34.961378+00:00 |   1000000 |        1002525 |          0 | 435608339904823300
  test_stats      | {s_ytd}        | 2019-03-19 14:58:34.961378+00:00 |   1000000 |          29896 |          0 | 435608339998113796
  test_stats      | {s_order_cnt}  | 2019-03-19 14:58:34.961378+00:00 |   1000000 |           7810 |          0 | 435608340123353092
  test_stats      | {s_remote_cnt} | 2019-03-19 14:58:34.961378+00:00 |   1000000 |            189 |          0 | 435608340194820100
  test_stats      | {s_data}       | 2019-03-19 14:58:34.961378+00:00 |   1000000 |        1012256 |          0 | 435608340302790660
(25 rows)

The Questions

  1. Why were those old stats backwards?
  2. Why did they only start getting used last night?
    • Justin's theory: the outdated stats (whatever their origin) were all roughly consistent, so they balanced each other out. After the upgrade, since automatic collection refreshes stats one table at a time, fresh stats were built for ONE of the tables involved in the join but not the other.
    • This is corroborated by the evidence: automatic stats collection appears to have finished for the order_line table right when the crashes began. These new stats, unlike the previous ones, cover all of the columns.
  3. Why do we OOM from scans?

[Heap profile screenshot: cockroach v19.1.0-beta.20190318, inuse_space]

@nvanbenschoten
Member

We let ourselves scan until we OOM because index joins don't limit the number of rows they're allowed to retrieve at a given time:

table.disableBatchLimit()
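
For intuition, here is a minimal sketch (hypothetical API, not CockroachDB's actual fetcher) of what a batch limit buys: with a positive limit, only one bounded batch of rows is resident at a time; with the limit disabled, a single response can contain every row the scan matches, which is how a badly planned join against a huge table can exhaust memory.

```go
package main

import "fmt"

// Row stands in for one fetched KV row.
type Row struct{ Key, Value string }

// fetch drains a scan in batches of at most `limit` rows, handing each batch
// to process before requesting the next one. A limit <= 0 models
// disableBatchLimit(): the first response may then contain every row in the
// span, so memory usage is proportional to the whole scan.
func fetch(scanBatch func(limit int) ([]Row, bool), limit int, process func([]Row)) {
	for {
		rows, more := scanBatch(limit)
		process(rows)
		if !more {
			return
		}
	}
}

func main() {
	// A toy "table" of 10 rows served through a batched scan.
	data := make([]Row, 10)
	next := 0
	scanBatch := func(limit int) ([]Row, bool) {
		n := len(data) - next
		if limit > 0 && n > limit {
			n = limit // bounded: at most `limit` rows in flight per batch
		}
		batch := data[next : next+n]
		next += n
		return batch, next < len(data)
	}
	fetch(scanBatch, 3, func(rows []Row) { fmt.Println("got batch of", len(rows)) })
}
```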

@awoods187 awoods187 added the S-2-temp-unavailability Temp crashes or other availability problems. Can be worked around or resolved by restarting. label Mar 19, 2019
@awoods187 awoods187 added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-sql-table-stats Table statistics (and their automatic refresh). labels Mar 19, 2019
@RaduBerinde
Member

Filed #35950 for using batch limit with lookup joins.

@justinj
Contributor

justinj commented Mar 19, 2019

If stats are sufficiently out of date, I wonder if we should wait until we have somewhat recent stats on all tables before we flip over to using them, instead of adopting them as they become ready.
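
A minimal sketch of the gating this suggests, with all names hypothetical (this is not CockroachDB's actual API): the optimizer would stick with heuristic costs until every table in the query has statistics fresher than some staleness bound.

```go
package main

import (
	"fmt"
	"time"
)

// tableStats records when a table's newest statistics were collected.
// Hypothetical type, not CockroachDB's actual representation.
type tableStats struct {
	name    string
	created time.Time // zero value means no stats collected yet
}

// useStats reports whether the optimizer should flip over to stats-based
// costing for a query: only once every table involved has statistics fresher
// than maxStaleness, rather than adopting each table's stats as soon as
// they're ready.
func useStats(tables []tableStats, maxStaleness time.Duration, now time.Time) bool {
	for _, t := range tables {
		if t.created.IsZero() || now.Sub(t.created) > maxStaleness {
			return false // at least one table is missing or stale; stay heuristic
		}
	}
	return true
}

func main() {
	now := time.Now()
	tables := []tableStats{
		{name: "order_line", created: now.Add(-time.Hour)},      // fresh
		{name: "stock", created: now.Add(-20 * 24 * time.Hour)}, // weeks old
	}
	fmt.Println(useStats(tables, 24*time.Hour, now)) // false: stock is stale
}
```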

@rytaft
Collaborator

rytaft commented Mar 19, 2019

The columns were backwards due to a bug that was fixed in #35192, the same PR that caused all columns to have stats.

Is it still crashing now that all stats have been refreshed?

@ajwerner
Contributor Author

The remaining mystery is why we're still seeing the bad plan after we've created stats for both tables (the stock stats were created manually).

> explain (opt, verbose) select distinct ol_w_id from order_line;
                             text
+------------------------------------------------------------+
  distinct-on                                                                                                                 
   ├── columns: ol_w_id:3                                                                                                     
   ├── grouping columns: ol_w_id:3                                                                                            
   ├── stats: [rows=10, distinct(3)=10, null(3)=0]                                                                            
   ├── cost: 705125277                                                                                                        
   ├── key: (3)                                                                                                               
   └── scan order_line@order_line_fk                                                                                          
        ├── columns: ol_w_id:3                                                                                                
        ├── stats: [rows=646903924, distinct(3)=10, null(3)=0]                                                                
        ├── cost: 692187199                                                                                                   
        ├── prune: (3)                                                                                                        
        └── interesting orderings: (+3)                                                                                       
(12 rows)
> explain (opt, verbose) select distinct s_w_id from stock;                                                   
                             text
+------------------------------------------------------------+
  distinct-on                                                 
   ├── columns: s_w_id:2                                      
   ├── grouping columns: s_w_id:2                             
   ├── stats: [rows=10, distinct(2)=10, null(2)=0]            
   ├── cost: 1050000.12                                       
   ├── key: (2)                                               
   └── scan stock@stock_s_i_id_idx                            
        ├── columns: s_w_id:2                                 
        ├── stats: [rows=1000000, distinct(2)=10, null(2)=0]  
        ├── cost: 1030000.01                                  
        ├── prune: (2)                                        
        └── interesting orderings: (+2)   

@rytaft rytaft self-assigned this Mar 20, 2019
@rytaft
Collaborator

rytaft commented Mar 20, 2019

The logic for selectivity estimation when there is a range predicate such as ol_o_id BETWEEN (655736 - 20) AND (655736 - 1) is not as precise as it could be. I'm working on fixing the selectivity estimation, which should fix this issue. We didn't see this problem earlier because it only happens when there is a large number of rows in the order_line table relative to the number of rows in the stock table (a consequence of running TPC-C for a long time).

danhhz added a commit to danhhz/cockroach that referenced this issue Mar 21, 2019
Workaround for cockroachdb#35947. The optimizer currently plans a bad query for
TPCC when it has stats, so disable stats for now.

Release note: None
@tbg
Member

tbg commented Mar 21, 2019

@nvanbenschoten then entered the picture to help with his expert sleuthing skills. With that we were

PS would be really nice to spell out a bit what the sleuthing was so others can learn.

@ajwerner
Contributor Author

Sure, the sleuthing wasn't all that sophisticated. The primary technique was to turn on tracing and then look at long-running queries. It turned out that even this was unnecessary, as @andreimatei walked over and told us to look at the statements page, which clearly presented us with the slow query.

craig bot pushed a commit that referenced this issue Mar 21, 2019
36034: roachtest: temporarily disable auto stats collection for cdc tests r=tbg a=danhhz

Workaround for #35947. The optimizer currently plans a bad query for
TPCC when it has stats, so disable stats for now.

Touches #35327 where local tests saw this happen. Perhaps it's also been the cause of the last two nightly run failures.

Release note: None

Co-authored-by: Daniel Harrison <daniel.harrison@gmail.com>
rytaft added a commit to rytaft/cockroach that referenced this issue Mar 21, 2019
opt: add a range operator to fix selectivity estimation of range predicates

This commit adds a new scalar operator called Range. Range contains a single
input, which is an And expression that constrains a single variable to a
range. For example, the And expression might be x >= 6 AND x <= 10.

This commit also adds a new normalization rule called ConsolidateSelectFilters,
which consolidates filters that constrain a single variable, and puts them
into a Range operation. For example, filters x >= 6 and x <= 10 would be
consolidated into a single Range operation.

The benefit of consolidating these filters is it allows a single constraint
to be generated for the variable instead of multiple. In the example above,
we can generate the single constraint [/6 - /10] instead of the two
constraints [/6 - ] and [ - /10]. The single constraint allows us to better
estimate the selectivity of the predicate when calculating statistics for
the Select expression.

For example, suppose that x initially has 1,000,000 distinct values. Once
we apply the predicate x >= 6 AND x <= 10, it has at most 5 distinct values.
Assuming a uniform data distribution, the selectivity of this predicate is
5/1,000,000, or 0.000005. Prior to this commit, we were significantly
overestimating the selectivity as 1/9, or 0.111111.

Fixes cockroachdb#35947

Release note (performance improvement): Improved the selectivity estimation
of range predicates during query optimization.
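
To make the arithmetic above concrete, here is a minimal sketch (my own reconstruction, not the optimizer's code; the old model's fixed 1/3 guess per open-ended inequality is inferred from the stated 1/9 result):

```go
package main

import "fmt"

func main() {
	const totalDistinct = 1_000_000.0

	// Old model: the filters x >= 6 and x <= 10 are costed independently,
	// each with a default selectivity guess (1/3 here, since 1/3 * 1/3
	// matches the 1/9 overestimate described in the commit message).
	oldEstimate := (1.0 / 3.0) * (1.0 / 3.0)

	// New model: the consolidated constraint [/6 - /10] admits at most 5
	// distinct values, so under a uniform-distribution assumption the
	// selectivity is 5 / 1,000,000.
	newEstimate := 5.0 / totalDistinct

	fmt.Printf("old: %.6f  new: %.6f\n", oldEstimate, newEstimate)
	// old: 0.111111  new: 0.000005
}
```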
craig bot pushed a commit that referenced this issue Mar 24, 2019
36040: opt: add a range operator to fix selectivity estimation of range predicates r=rytaft a=rytaft

This commit adds a new scalar operator called `Range`. `Range` contains a single
input, which is an `And` expression that constrains a single variable to a
range. For example, the `And` expression might be `x >= 6 AND x <= 10`.

This commit also adds a new normalization rule called `ConsolidateSelectFilters`,
which consolidates filters that constrain a single variable, and puts them
into a `Range` operation. For example, filters `x >= 6` and `x <= 10` would be
consolidated into a single `Range` operation.

The benefit of consolidating these filters is it allows a single constraint
to be generated for the variable instead of multiple. In the example above,
we can generate the single constraint `[/6 - /10]` instead of the two
constraints `[/6 - ]` and `[ - /10]`. The single constraint allows us to better
estimate the selectivity of the predicate when calculating statistics for
a `Select` expression.

For example, suppose that `x` initially has 1000000 distinct values. Once
we apply the predicate `x >= 6 AND x <= 10`, it has at most 5 distinct values.
Assuming a uniform data distribution, the selectivity of this predicate is
`5/1000000`, or `0.000005`. Prior to this commit, we were significantly
overestimating the selectivity as `1/9`, or `0.111111`.

Fixes #35947

Release note (performance improvement): Improved the selectivity estimation
of range predicates during query optimization.

Co-authored-by: Rebecca Taft <becca@cockroachlabs.com>
@craig craig bot closed this as completed in #36040 Mar 24, 2019