stats: adriatic crashed with v19.1.0-beta.20190318 #35947

Closed
ajwerner opened this issue Mar 19, 2019 · 8 comments
Labels
A-sql-table-stats Table statistics (and their automatic refresh). C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. S-2-temp-unavailability Temp crashes or other availability problems. Can be worked around or resolved by restarting.

Comments

@ajwerner
Contributor

Problem Description

This issue is confusing and not well understood. Last night at 3:26 AM, adriatic nodes started to crash with OOMs. The cluster had been upgraded to the latest beta the day before (2019/03/18 12:40 EST). I did not actively begin investigating until this morning.

The first observation was that the heap profiles consisted mostly of scan requests (see the attached heap profile below). We then observed that stopping TPC-C and restarting all the nodes led to a stable cluster. As soon as TPC-C was restarted, nodes began OOM-ing again.

@nvanbenschoten then entered the picture to help with his expert sleuthing skills. With that we were able to track the deadly scans down to a single query:

SELECT
    count(*)
FROM
    (
        SELECT
            DISTINCT s_i_id
        FROM
            order_line
            JOIN stock
            ON s_i_id = ol_i_id AND s_w_id = ol_w_id
        WHERE
            (
                (ol_w_id = 2 AND ol_d_id = 9)
                AND ol_o_id BETWEEN (655736 - 20) AND (655736 - 1)
            )
            AND s_quantity < 13
    );

At this point we looked at the query plan and found something intriguing. The plan scans stock, which is large, rather than order_line, which is small:

root@:26257/tpcc> EXPLAIN SELECT count(*) FROM (SELECT DISTINCT s_i_id FROM order_line JOIN stock ON (s_i_id = ol_i_id) AND (s_w_id = ol_w_id) WHERE (((ol_w_id = 2) AND (ol_d_id = 9)) AND (ol_o_id BETWEEN (655736 - 20) AND (655736 - 1))) AND (s_quantity < 13));
               tree               |    field    |                                   description
+---------------------------------+-------------+---------------------------------------------------------------------------------+
  group                           |             |
   │                              | aggregate 0 | count_rows()
   │                              | scalar      |
   └── render                     |             |
        └── distinct              |             |
             │                    | distinct on | s_i_id
             │                    | order key   | s_i_id
             └── render           |             |
                  └── lookup-join |             |
                       │          | table       | order_line@primary
                       │          | type        | inner
                       │          | pred        | ((((@1 = @7) AND (@6 = 2)) AND (@5 = 9)) AND (@4 >= 655716)) AND (@4 <= 655735)
                       └── scan   |             |
                                  | table       | stock@primary
                                  | spans       | /2-/3
                                  | filter      | s_quantity < 13
(16 rows)

On a regular tpcc cluster the plan looks like:

root@:26257/tpcc> EXPLAIN SELECT count(*) FROM (SELECT DISTINCT s_i_id FROM order_line JOIN stock ON (s_i_id = ol_i_id) AND (s_w_id = ol_w_id) WHERE (((ol_w_id = 2) AND (ol_d_id = 9)) AND (ol_o_id BETWEEN (655736 - 20) AND (655736 - 1))) AND (s_quantity < 13));
               tree               |    field    |       description
+---------------------------------+-------------+-------------------------+
  group                           |             |
   │                              | aggregate 0 | count_rows()
   │                              | scalar      |
   └── render                     |             |
        └── distinct              |             |
             │                    | distinct on | s_i_id
             └── render           |             |
                  └── lookup-join |             |
                       │          | table       | stock@primary
                       │          | type        | inner
                       │          | pred        | (@6 = 2) AND (@7 < 13)
                       └── scan   |             |
                                  | table       | order_line@primary
                                  | spans       | /2/9/655735-/2/9/655715
(14 rows)

At this point we focused our attention on the statistics.

root@:26257/tpcc> show statistics for table stock;
  statistics_name | column_names |             created              | row_count | distinct_count | null_count |    histogram_id
+-----------------+--------------+----------------------------------+-----------+----------------+------------+--------------------+
  __auto__        | {s_i_id}     | 2019-02-27 18:32:26.796549+00:00 |   1000000 |             10 |          0 | 429988075551784963
  __auto__        | {s_w_id}     | 2019-02-27 18:32:26.796549+00:00 |   1000000 |         101273 |          0 | 429988075901714435
  __auto__        | {s_i_id}     | 2019-02-27 18:35:23.576974+00:00 |   1000000 |             10 |          0 | 429988654768357378
  __auto__        | {s_w_id}     | 2019-02-27 18:35:23.576974+00:00 |   1000000 |         101273 |          0 | 429988655061827586
  __auto__        | {s_i_id}     | 2019-02-27 18:36:00.291099+00:00 |   1000000 |             10 |          0 | 429988775204585476
  __auto__        | {s_w_id}     | 2019-02-27 18:36:00.291099+00:00 |   1000000 |         101273 |          0 | 429988775512965124
  __auto__        | {s_i_id}     | 2019-02-27 18:36:50.221065+00:00 |   1000000 |             10 |          0 | 429988939432394757
  __auto__        | {s_w_id}     | 2019-02-27 18:36:50.221065+00:00 |   1000000 |         101273 |          0 | 429988939911856133
  __auto__        | {s_i_id}     | 2019-02-27 18:37:59.971707+00:00 |   1000000 |             10 |          0 | 429989167297658886
  __auto__        | {s_w_id}     | 2019-02-27 18:37:59.971707+00:00 |   1000000 |         101273 |          0 | 429989167839838214
(10 rows)

Two things to notice here: (1) the stats are old, and (2) more alarmingly, the distinct counts for the two columns are swapped! As verified below, s_i_id actually has ~100,000 distinct values and s_w_id has 10:

root@:26257/tpcc> select count(distinct s_i_id) from stock;
  count
+--------+
  100000
(1 row)
root@:26257/tpcc> select count(distinct s_w_id) from stock;
  count
+-------+
     10
(1 row)

At this point we ran a manual statistics creation, which produced reasonable values:

root@:26257/tpcc> create statistics test_stats from stock;
CREATE STATISTICS
root@:26257/tpcc> show statistics for table stock;
  statistics_name |  column_names  |             created              | row_count | distinct_count | null_count |    histogram_id
+-----------------+----------------+----------------------------------+-----------+----------------+------------+--------------------+
  __auto__        | {s_i_id}       | 2019-02-27 18:35:23.576974+00:00 |   1000000 |             10 |          0 | 429988654768357378
  __auto__        | {s_w_id}       | 2019-02-27 18:35:23.576974+00:00 |   1000000 |         101273 |          0 | 429988655061827586
  __auto__        | {s_i_id}       | 2019-02-27 18:36:00.291099+00:00 |   1000000 |             10 |          0 | 429988775204585476
  __auto__        | {s_w_id}       | 2019-02-27 18:36:00.291099+00:00 |   1000000 |         101273 |          0 | 429988775512965124
  __auto__        | {s_i_id}       | 2019-02-27 18:36:50.221065+00:00 |   1000000 |             10 |          0 | 429988939432394757
  __auto__        | {s_w_id}       | 2019-02-27 18:36:50.221065+00:00 |   1000000 |         101273 |          0 | 429988939911856133
  __auto__        | {s_i_id}       | 2019-02-27 18:37:59.971707+00:00 |   1000000 |             10 |          0 | 429989167297658886
  __auto__        | {s_w_id}       | 2019-02-27 18:37:59.971707+00:00 |   1000000 |         101273 |          0 | 429989167839838214
  test_stats      | {s_dist_02}    | 2019-03-19 14:58:34.961378+00:00 |   1000000 |         986857 |          0 | 435608338988695556
  test_stats      | {s_i_id}       | 2019-03-19 14:58:34.961378+00:00 |   1000000 |         101273 |          0 | 435608338536595460
  test_stats      | {s_quantity}   | 2019-03-19 14:58:34.961378+00:00 |   1000000 |             91 |          0 | 435608338610814980
  test_stats      | {s_dist_01}    | 2019-03-19 14:58:34.961378+00:00 |   1000000 |         989838 |          0 | 435608338761089028
  test_stats      | {s_w_id}       | 2019-03-19 14:58:34.961378+00:00 |   1000000 |             10 |          0 | 435608338433802244
  test_stats      | {s_dist_03}    | 2019-03-19 14:58:34.961378+00:00 |   1000000 |        1008181 |          0 | 435608339111444484
  test_stats      | {s_dist_04}    | 2019-03-19 14:58:34.961378+00:00 |   1000000 |        1012198 |          0 | 435608339219447812
  test_stats      | {s_dist_05}    | 2019-03-19 14:58:34.961378+00:00 |   1000000 |         986878 |          0 | 435608339326763012
  test_stats      | {s_dist_06}    | 2019-03-19 14:58:34.961378+00:00 |   1000000 |         996501 |          0 | 435608339450363908
  test_stats      | {s_dist_07}    | 2019-03-19 14:58:34.961378+00:00 |   1000000 |        1002457 |          0 | 435608339560628228
  test_stats      | {s_dist_08}    | 2019-03-19 14:58:34.961378+00:00 |   1000000 |        1007497 |          0 | 435608339668369412
  test_stats      | {s_dist_09}    | 2019-03-19 14:58:34.961378+00:00 |   1000000 |         995973 |          0 | 435608339792822276
  test_stats      | {s_dist_10}    | 2019-03-19 14:58:34.961378+00:00 |   1000000 |        1002525 |          0 | 435608339904823300
  test_stats      | {s_ytd}        | 2019-03-19 14:58:34.961378+00:00 |   1000000 |          29896 |          0 | 435608339998113796
  test_stats      | {s_order_cnt}  | 2019-03-19 14:58:34.961378+00:00 |   1000000 |           7810 |          0 | 435608340123353092
  test_stats      | {s_remote_cnt} | 2019-03-19 14:58:34.961378+00:00 |   1000000 |            189 |          0 | 435608340194820100
  test_stats      | {s_data}       | 2019-03-19 14:58:34.961378+00:00 |   1000000 |        1012256 |          0 | 435608340302790660
(25 rows)

The Questions

  1. Why were those old stats backwards?
  2. Why did they only start getting used last night?
    • Justin's theory: the outdated stats (whatever their origin) were all roughly consistent, so they balanced each other out. After the upgrade, since automatic collection refreshes stats one table at a time, fresh stats were built for ONE of the tables involved in the join but not the other.
    • This is corroborated by the evidence: automatic stats collection appears to have finished for the order_line table right when the crashes began. These new stats, unlike the previous ones, cover all of the columns.
  3. Why do we OOM from scans?

[Heap profile screenshot: cockroach v19.1.0-beta.20190318, inuse_space]

@nvanbenschoten
Member

We let ourselves scan until we OOM because index joins don't limit the number of rows they're allowed to retrieve at a given time:

table.disableBatchLimit()
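
For intuition, here is a minimal sketch (hypothetical API, not CockroachDB's actual fetcher) of what a batch limit buys: with a positive limit, only one bounded batch of rows is resident at a time; with the limit disabled, a single response can contain every row the scan matches, which is how a badly planned join against a huge table can exhaust memory.

```go
package main

import "fmt"

// Row stands in for one fetched KV row.
type Row struct{ Key, Value string }

// fetch drains a scan in batches of at most `limit` rows, handing each batch
// to process before requesting the next one. A limit <= 0 models
// disableBatchLimit(): the first response may then contain every row in the
// span, so memory usage is proportional to the whole scan.
func fetch(scanBatch func(limit int) ([]Row, bool), limit int, process func([]Row)) {
	for {
		rows, more := scanBatch(limit)
		process(rows)
		if !more {
			return
		}
	}
}

func main() {
	// A toy "table" of 10 rows served through a batched scan.
	data := make([]Row, 10)
	next := 0
	scanBatch := func(limit int) ([]Row, bool) {
		n := len(data) - next
		if limit > 0 && n > limit {
			n = limit // bounded: at most `limit` rows in flight per batch
		}
		batch := data[next : next+n]
		next += n
		return batch, next < len(data)
	}
	fetch(scanBatch, 3, func(rows []Row) { fmt.Println("got batch of", len(rows)) })
}
```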

@awoods187 awoods187 added the S-2-temp-unavailability Temp crashes or other availability problems. Can be worked around or resolved by restarting. label Mar 19, 2019
@awoods187 awoods187 added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-sql-table-stats Table statistics (and their automatic refresh). labels Mar 19, 2019
@RaduBerinde
Member

Filed #35950 for using batch limit with lookup joins.

@justinj
Contributor

justinj commented Mar 19, 2019

If stats are sufficiently out of date, I wonder if we should wait until we have somewhat recent stats on all tables before we flip over to using them, instead of adopting them as they become ready.
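
A minimal sketch of the gating this suggests, with all names hypothetical (this is not CockroachDB's actual API): the optimizer would stick with heuristic costs until every table in the query has statistics fresher than some staleness bound.

```go
package main

import (
	"fmt"
	"time"
)

// tableStats records when a table's newest statistics were collected.
// Hypothetical type, not CockroachDB's actual representation.
type tableStats struct {
	name    string
	created time.Time // zero value means no stats collected yet
}

// useStats reports whether the optimizer should flip over to stats-based
// costing for a query: only once every table involved has statistics fresher
// than maxStaleness, rather than adopting each table's stats as soon as
// they're ready.
func useStats(tables []tableStats, maxStaleness time.Duration, now time.Time) bool {
	for _, t := range tables {
		if t.created.IsZero() || now.Sub(t.created) > maxStaleness {
			return false // at least one table is missing or stale; stay heuristic
		}
	}
	return true
}

func main() {
	now := time.Now()
	tables := []tableStats{
		{name: "order_line", created: now.Add(-time.Hour)},      // fresh
		{name: "stock", created: now.Add(-20 * 24 * time.Hour)}, // weeks old
	}
	fmt.Println(useStats(tables, 24*time.Hour, now)) // false: stock is stale
}
```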

@rytaft
Collaborator

rytaft commented Mar 19, 2019

The columns were backwards due to a bug that was fixed in #35192, the same PR that caused all columns to have stats.

Is it still crashing now that all stats have been refreshed?

@ajwerner
Contributor Author

The remaining mystery is why we're still seeing the bad plan after we've created stats for both tables (the stock stats were created manually).

> explain (opt, verbose) select distinct ol_w_id from order_line;
                             text
+------------------------------------------------------------+
  distinct-on                                                                                                                 
   ├── columns: ol_w_id:3                                                                                                     
   ├── grouping columns: ol_w_id:3                                                                                            
   ├── stats: [rows=10, distinct(3)=10, null(3)=0]                                                                            
   ├── cost: 705125277                                                                                                        
   ├── key: (3)                                                                                                               
   └── scan order_line@order_line_fk                                                                                          
        ├── columns: ol_w_id:3                                                                                                
        ├── stats: [rows=646903924, distinct(3)=10, null(3)=0]                                                                
        ├── cost: 692187199                                                                                                   
        ├── prune: (3)                                                                                                        
        └── interesting orderings: (+3)                                                                                       
(12 rows)
> explain (opt, verbose) select distinct s_w_id from stock;                                                   
                             text
+------------------------------------------------------------+
  distinct-on                                                 
   ├── columns: s_w_id:2                                      
   ├── grouping columns: s_w_id:2                             
   ├── stats: [rows=10, distinct(2)=10, null(2)=0]            
   ├── cost: 1050000.12                                       
   ├── key: (2)                                               
   └── scan stock@stock_s_i_id_idx                            
        ├── columns: s_w_id:2                                 
        ├── stats: [rows=1000000, distinct(2)=10, null(2)=0]  
        ├── cost: 1030000.01                                  
        ├── prune: (2)                                        
        └── interesting orderings: (+2)   

@rytaft rytaft self-assigned this Mar 20, 2019
@rytaft
Collaborator

rytaft commented Mar 20, 2019

The logic for selectivity estimation when there is a range predicate such as ol_o_id BETWEEN (655736 - 20) AND (655736 - 1) is not as precise as it could be. I'm working on fixing the selectivity estimation, which should fix this issue. We didn't see this problem earlier because it only happens when there is a large number of rows in the order_line table relative to the number of rows in the stock table (a consequence of running TPC-C for a long time).

danhhz added a commit to danhhz/cockroach that referenced this issue Mar 21, 2019
Workaround for cockroachdb#35947. The optimizer currently plans a bad query for
TPCC when it has stats, so disable stats for now.

Release note: None
@tbg
Member

tbg commented Mar 21, 2019

@nvanbenschoten then entered the picture to help with his expert sleuthing skills. With that we were

PS would be really nice to spell out a bit what the sleuthing was so others can learn.

@ajwerner
Contributor Author

Sure, the sleuthing wasn't all that sophisticated. The primary technique was to turn on tracing and then look at long-running queries. It turned out that even this was unnecessary, as @andreimatei walked over and told us to look at the statements page, which clearly presented us with the slow query.

craig bot pushed a commit that referenced this issue Mar 21, 2019
36034: roachtest: temporarily disable auto stats collection for cdc tests r=tbg a=danhhz

Workaround for #35947. The optimizer currently plans a bad query for
TPCC when it has stats, so disable stats for now.

Touches #35327 where local tests saw this happen. Perhaps it's also been the cause of the last two nightly run failures.

Release note: None

Co-authored-by: Daniel Harrison <daniel.harrison@gmail.com>
rytaft added a commit to rytaft/cockroach that referenced this issue Mar 21, 2019
opt: add a range operator to fix selectivity estimation of range predicates

This commit adds a new scalar operator called Range. Range contains a single
input, which is an And expression that constrains a single variable to a
range. For example, the And expression might be x >= 6 AND x <= 10.

This commit also adds a new normalization rule called ConsolidateSelectFilters,
which consolidates filters that constrain a single variable, and puts them
into a Range operation. For example, filters x >= 6 and x <= 10 would be
consolidated into a single Range operation.

The benefit of consolidating these filters is it allows a single constraint
to be generated for the variable instead of multiple. In the example above,
we can generate the single constraint [/6 - /10] instead of the two
constraints [/6 - ] and [ - /10]. The single constraint allows us to better
estimate the selectivity of the predicate when calculating statistics for
the Select expression.

For example, suppose that x initially has 1,000,000 distinct values. Once
we apply the predicate x >= 6 AND x <= 10, it has at most 5 distinct values.
Assuming a uniform data distribution, the selectivity of this predicate is
5/1,000,000, or 0.000005. Prior to this commit, we were significantly
overestimating the selectivity as 1/9, or 0.111111.

Fixes cockroachdb#35947

Release note (performance improvement): Improved the selectivity estimation
of range predicates during query optimization.
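
To make the arithmetic above concrete, here is a minimal sketch (my own reconstruction, not the optimizer's code; the old model's fixed 1/3 guess per open-ended inequality is inferred from the stated 1/9 result):

```go
package main

import "fmt"

func main() {
	const totalDistinct = 1_000_000.0

	// Old model: the filters x >= 6 and x <= 10 are costed independently,
	// each with a default selectivity guess (1/3 here, since 1/3 * 1/3
	// matches the 1/9 overestimate described in the commit message).
	oldEstimate := (1.0 / 3.0) * (1.0 / 3.0)

	// New model: the consolidated constraint [/6 - /10] admits at most 5
	// distinct values, so under a uniform-distribution assumption the
	// selectivity is 5 / 1,000,000.
	newEstimate := 5.0 / totalDistinct

	fmt.Printf("old: %.6f  new: %.6f\n", oldEstimate, newEstimate)
	// old: 0.111111  new: 0.000005
}
```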
craig bot pushed a commit that referenced this issue Mar 24, 2019
36040: opt: add a range operator to fix selectivity estimation of range predicates r=rytaft a=rytaft

This commit adds a new scalar operator called `Range`. `Range` contains a single
input, which is an `And` expression that constrains a single variable to a
range. For example, the `And` expression might be `x >= 6 AND x <= 10`.

This commit also adds a new normalization rule called `ConsolidateSelectFilters`,
which consolidates filters that constrain a single variable, and puts them
into a `Range` operation. For example, filters `x >= 6` and `x <= 10` would be
consolidated into a single `Range` operation.

The benefit of consolidating these filters is it allows a single constraint
to be generated for the variable instead of multiple. In the example above,
we can generate the single constraint `[/6 - /10]` instead of the two
constraints `[/6 - ]` and `[ - /10]`. The single constraint allows us to better
estimate the selectivity of the predicate when calculating statistics for
a `Select` expression.

For example, suppose that `x` initially has 1000000 distinct values. Once
we apply the predicate `x >= 6 AND x <= 10`, it has at most 5 distinct values.
Assuming a uniform data distribution, the selectivity of this predicate is
`5/1000000`, or `0.000005`. Prior to this commit, we were significantly
overestimating the selectivity as `1/9`, or `0.111111`.

Fixes #35947

Release note (performance improvement): Improved the selectivity estimation
of range predicates during query optimization.

Co-authored-by: Rebecca Taft <becca@cockroachlabs.com>
@craig craig bot closed this as completed in #36040 Mar 24, 2019