stats: adriatic crashed with v19.1.0-beta.20190318 #35947
Comments
We let ourselves scan until we OOM because index joins don't limit the number of rows they're allowed to retrieve at a given time (cockroach/pkg/sql/index_join.go, line 153 at 86df951).
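For illustration, here is a minimal sketch of the failure mode and of the batch limit proposed next. The types and function names are hypothetical, not CockroachDB's actual index-join code:

```go
// Sketch: an unbounded fetch holds the entire (mis-estimated) result
// set in memory, while a batch limit caps memory at `limit` rows per
// call. Types here are hypothetical stand-ins for illustration.
package main

import "fmt"

type row struct{ key, value int }

// fetchAll mirrors the pre-fix behavior: no limit on how many rows the
// index join retrieves at a given time, so memory grows with the scan.
func fetchAll(src []row) []row {
	out := make([]row, 0, len(src))
	out = append(out, src...) // unbounded: OOM risk on huge scans
	return out
}

// fetchBatch is the batch-limited variant: callers consume at most
// `limit` rows at a time and re-invoke with `next` for the remainder.
func fetchBatch(src []row, start, limit int) (batch []row, next int) {
	end := start + limit
	if end > len(src) {
		end = len(src)
	}
	return src[start:end], end
}

func main() {
	src := make([]row, 10)
	for next := 0; next < len(src); {
		var batch []row
		batch, next = fetchBatch(src, next, 3)
		fmt.Printf("processed %d rows, resume at %d\n", len(batch), next)
	}
}
```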
Filed #35950 for using a batch limit with lookup joins.
If stats are sufficiently out of date, I wonder if we should wait until we have somewhat recent stats on all tables before we flip over to using them, instead of adopting them as they become ready.
The columns were backwards due to a bug that was fixed in #35192, the same PR that caused all columns to have stats. Is it still crashing now that all stats have been refreshed?
The remaining mystery is why we're still seeing the bad plan after we've created stats for both tables (the stock stats were created manually).
The logic for selectivity estimation when there is a range such as …
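Reading ahead to the fix in #36040, the numbers quoted there suggest the following before/after arithmetic. The per-conjunct default of 1/3 is an inference from the 1/9 figure, not something stated in this thread:

```latex
% Inferred: each open-ended inequality appears to receive a default
% selectivity of 1/3, and the two conjuncts are multiplied as if
% independent, giving the 1/9 overestimate quoted in #36040.
\[
  \mathrm{sel}_{\mathrm{old}}(x \ge 6 \land x \le 10)
    \approx \tfrac{1}{3}\cdot\tfrac{1}{3} = \tfrac{1}{9} \approx 0.111111
\]
\[
  \mathrm{sel}_{\mathrm{new}}
    = \frac{\mathrm{distinct}(x \mid 6 \le x \le 10)}{\mathrm{distinct}(x)}
    = \frac{5}{10^{6}} = 0.000005
\]
```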
Workaround for cockroachdb#35947. The optimizer currently plans a bad query for TPCC when it has stats, so disable stats for now. Release note: None
PS: it would be really nice to spell out a bit what the sleuthing was, so others can learn.
Sure, the sleuthing wasn't all that sophisticated. The primary technique was to turn on tracing and then look at long-running queries. It turned out that even this was unnecessary, as @andreimatei walked over and told us to look at the statements page, which clearly presented us with the slow query.
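For anyone reproducing the technique from a SQL client rather than the UI, something like the following surfaces the same long-running statements. The connection URL and the lib/pq driver are assumptions; any Postgres-wire driver works against CockroachDB:

```go
// Sketch: list in-flight statements cluster-wide, oldest first, to
// spot long-running queries without the admin UI.
package main

import (
	"database/sql"
	"fmt"
	"log"
	"time"

	_ "github.com/lib/pq" // assumed driver; any pg-wire driver works
)

func main() {
	db, err := sql.Open("postgres",
		"postgresql://root@localhost:26257/?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// SHOW CLUSTER QUERIES can be used as a data source with
	// CockroachDB's [SHOW ...] square-bracket syntax.
	rows, err := db.Query(`SELECT query_id, start, query
FROM [SHOW CLUSTER QUERIES] ORDER BY start LIMIT 10`)
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()
	for rows.Next() {
		var id, query string
		var start time.Time
		if err := rows.Scan(&id, &start, &query); err != nil {
			log.Fatal(err)
		}
		fmt.Println(start.Format(time.RFC3339), id, query)
	}
}
```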
36034: roachtest: temporarily disable auto stats collection for cdc tests r=tbg a=danhhz Workaround for #35947. The optimizer currently plans a bad query for TPCC when it has stats, so disable stats for now. Touches #35327 where local tests saw this happen. Perhaps it's also been the cause of the last two nightly run failures. Release note: None Co-authored-by: Daniel Harrison <daniel.harrison@gmail.com>
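For completeness, the workaround amounts to flipping one cluster setting. A minimal sketch, assuming the `sql.stats.automatic_collection.enabled` setting and a local insecure node:

```go
// Sketch: disable automatic stats collection so the optimizer falls
// back to the no-stats plan that was known to work for TPC-C.
package main

import (
	"database/sql"
	"log"

	_ "github.com/lib/pq" // assumed driver
)

func main() {
	db, err := sql.Open("postgres",
		"postgresql://root@localhost:26257/?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	if _, err := db.Exec(
		`SET CLUSTER SETTING sql.stats.automatic_collection.enabled = false`,
	); err != nil {
		log.Fatal(err)
	}
}
```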
36040: opt: add a range operator to fix selectivity estimation of range predicates r=rytaft a=rytaft

This commit adds a new scalar operator called `Range`. `Range` contains a single input, which is an `And` expression that constrains a single variable to a range. For example, the `And` expression might be `x >= 6 AND x <= 10`.

This commit also adds a new normalization rule called `ConsolidateSelectFilters`, which consolidates filters that constrain a single variable and puts them into a `Range` operation. For example, filters `x >= 6` and `x <= 10` would be consolidated into a single `Range` operation.

The benefit of consolidating these filters is that it allows a single constraint to be generated for the variable instead of multiple. In the example above, we can generate the single constraint `[/6 - /10]` instead of the two constraints `[/6 - ]` and `[ - /10]`. The single constraint allows us to better estimate the selectivity of the predicate when calculating statistics for a `Select` expression. For example, suppose that `x` initially has 1000000 distinct values. Once we apply the predicate `x >= 6 AND x <= 10`, it has at most 5 distinct values. Assuming a uniform data distribution, the selectivity of this predicate is `5/1000000`, or `0.000005`. Prior to this commit, we were significantly overestimating the selectivity as `1/9`, or `0.111111`.

Fixes #35947

Release note (performance improvement): Improved the selectivity estimation of range predicates during query optimization.

Co-authored-by: Rebecca Taft <becca@cockroachlabs.com>
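To make the consolidation concrete, here is a toy model in Go of what `ConsolidateSelectFilters` buys. The types and functions (`span`, `consolidate`, `selectivity`) are hypothetical, not from the actual `opt` package: intersecting the per-filter constraints on one variable yields a single range whose width caps the distinct count, which is what drives the corrected selectivity.

```go
// Sketch: consolidate single-variable filters into one range, then
// estimate selectivity from the range's width, per the PR's arithmetic.
package main

import (
	"fmt"
	"math"
)

// span is a closed integer interval [lo, hi] constraining one variable.
type span struct{ lo, hi int }

// consolidate intersects per-filter spans on the same variable, the way
// ConsolidateSelectFilters folds `x >= 6` and `x <= 10` into the single
// Range constraint [/6 - /10].
func consolidate(spans []span) span {
	out := spans[0]
	for _, s := range spans[1:] {
		if s.lo > out.lo {
			out.lo = s.lo
		}
		if s.hi < out.hi {
			out.hi = s.hi
		}
	}
	return out
}

// selectivity caps the distinct count by the range's width, then
// returns newDistinct / oldDistinct.
func selectivity(r span, oldDistinct float64) float64 {
	newDistinct := float64(r.hi - r.lo + 1)
	if newDistinct > oldDistinct {
		newDistinct = oldDistinct
	}
	return newDistinct / oldDistinct
}

func main() {
	// x >= 6 AND x <= 10, with x drawn from 1,000,000 distinct values.
	geq6 := span{lo: 6, hi: math.MaxInt32}   // x >= 6 (unbounded above)
	leq10 := span{lo: math.MinInt32, hi: 10} // x <= 10 (unbounded below)
	r := consolidate([]span{geq6, leq10})
	fmt.Printf("consolidated range [%d, %d], selectivity %.6f\n",
		r.lo, r.hi, selectivity(r, 1e6))
}
```

Running it prints `consolidated range [6, 10], selectivity 0.000005`, matching the PR's arithmetic.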
Problem Description
This issue is confusing and not well understood. Last night at 3:26 AM, adriatic's nodes started to crash with OOMs. The cluster had been upgraded to the latest beta the day before (2019/03/18 12:40 EST). I did not actively begin investigating until this morning.
The first observation was that the heap profiles consisted mostly of scan requests (see the attached heap profile below). It was then observed that stopping TPC-C and restarting all the nodes led to a stable cluster. As soon as TPC-C was restarted, nodes began OOM-ing again.
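As an aside on how such heap profiles are gathered: CockroachDB serves Go's standard pprof endpoints on its HTTP port, so a sketch like the following (the address is an assumption; use your node's `--http-addr`) downloads a profile that `go tool pprof heap.pb.gz` can open:

```go
// Sketch: fetch a heap profile from a node's debug endpoint and save
// it for offline inspection with `go tool pprof`.
package main

import (
	"io"
	"log"
	"net/http"
	"os"
)

func main() {
	// Assumed address; substitute the node's HTTP port.
	resp, err := http.Get("http://localhost:8080/debug/pprof/heap")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	f, err := os.Create("heap.pb.gz")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	if _, err := io.Copy(f, resp.Body); err != nil {
		log.Fatal(err)
	}
}
```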
@nvanbenschoten then entered the picture to help with his expert sleuthing skills. With his help, we were able to track the deadly scans down to a single query:
At this point we looked at the query plan and found something intriguing: the plan scans stock, which is large, rather than order_line, which is small:
On a regular TPC-C cluster, the plan looks like:
At this point we focused our attention on the statistics.
Some things to notice here: (1) the stats are old, but (2) more alarmingly, the stats for the columns are backwards!?!
At this point we ran a manual statistics creation, and it seemed to create reasonable values.
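For reference, a sketch of that manual refresh, assuming the `CREATE STATISTICS <name> FROM <table>` form (omitting the `ON` clause collects stats on all columns) and a database named `tpcc`:

```go
// Sketch: manually refresh table statistics and sanity-check the
// resulting row and distinct counts.
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/lib/pq" // assumed driver
)

func main() {
	db, err := sql.Open("postgres",
		"postgresql://root@localhost:26257/tpcc?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	for _, tbl := range []string{"stock", "order_line"} {
		if _, err := db.Exec(fmt.Sprintf(
			`CREATE STATISTICS manual_stats FROM %s`, tbl)); err != nil {
			log.Fatal(err)
		}
	}

	// Inspect the refreshed values the same way the issue did.
	rows, err := db.Query(`SELECT statistics_name, row_count, distinct_count
FROM [SHOW STATISTICS FOR TABLE stock]`)
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()
	for rows.Next() {
		var name string
		var rowCount, distinctCount int64
		if err := rows.Scan(&name, &rowCount, &distinctCount); err != nil {
			log.Fatal(err)
		}
		fmt.Println(name, rowCount, distinctCount)
	}
}
```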
The Questions