-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
roachtest/unoptimized-query-oracle: incorrect results for geospatial #103616
Comments
Simplified test case: CREATE TABLE t1 (col2_4 GEOGRAPHY NULL);
insert into t1 values ('0102000020E610000000000000'::GEOGRAPHY);
SET testing_optimizer_random_seed = 656627027908814488;
SET testing_optimizer_disable_rule_probability = 0.579;
SELECT col2_4 FROM t1
WHERE
(1.2345678901234566e-43:::FLOAT8 <= st_distance(col2_4::GEOGRAPHY, '0106000020E610000000000000':::GEOGRAPHY::GEOGRAPHY)::FLOAT8);
col2_4
----------
(0 rows)
SET testing_optimizer_disable_rule_probability = 0.578;
SELECT col2_4 FROM t1
WHERE
(1.2345678901234566e-43:::FLOAT8 <= st_distance(col2_4::GEOGRAPHY, '0106000020E610000000000000':::GEOGRAPHY::GEOGRAPHY)::FLOAT8);
col2_4
------------------------------
0102000020E610000000000000
(1 row) |
Removing the release-blocker label. This problem occurs on v22.1.20 as well. |
The disabled rule causing the testing discrepancy is |
@DrewKimball The assumption is that I'm not sure what the fix is here to preserve the ability to use inverted indexes. Maybe create new versions of functions |
@DrewKimball and I discussed this and we think we could change the rule to construct this expression instead:
The RHS of the top-level AND should return NULL correctly. |
A better approach might be to add an optional We think that these expressions are equivalent (in the context of a filter):
And these two expressions are equivalent (in the context of a filter):
We need to consider the similar rules that transform |
The ideas to generate extra terms ANDed with Even if NOT
The whole point of these transformation rules is to trigger inverted index access, though, so additional work would be required to not only allow the new functions to trigger index access, but also filter out rows with null and empty geography expressions. It might not be a trivial change. Another alternative is to get rid of the transformation rules entirely, but there may be customers depending on them for their use cases, so that might not be a practical solution. |
Another idea is, instead of rewriting the predicate, push the generation of the @mgartner @DrewKimball What do you think of this approach? |
I agree, this seems like the right approach. |
The query normalization rules FoldEqZeroSTDistance, FoldCmpSTDistanceLeft, FoldCmpSTDistanceRight, FoldCmpSTMaxDistanceLeft and FoldCmpSTMaxDistanceRight replace binary comparison operations, =, <, >, <=, >= involving the `st_distance` or `st_maxdistance` functions with equivalent function calls which may enable inverted index scan, for example st_dwithin, and st_dfullywithin. For non-null and non-empty geographies and geometries, this rewrite is semantically equivalent, but for null or empty inputs, `st_distance` uses three-valued boolean logic and returns a null. `st_dwithin` always returns true or false, so may return a different value when the expression is negated. For example, the following test returns `true`, but should return `null`: ```sql CREATE TABLE g1 (geog GEOGRAPHY); INSERT INTO g1 VALUES (ST_GeogFromText('MULTILINESTRING EMPTY')); SELECT ST_Distance('POINT(0 0)'::GEOGRAPHY, geog) > 1 AS predicate_result FROM g1; predicate_result -------------------- t ``` The solution is to move all of the logic in the above normalization rules into `geoFilterPlanner.extractInvertedFilterConditionFromLeaf`, and use the derived predicate only for the purposes of finding inverted index scan spans. The spans will always include all rows qualified for the original predicate, but may include additional rows. The original filter is then applied to the results of the inverted index scan to filter out any results with nulls or empty geographies and geometries. Epic: none Fixes: cockroachdb#103616 Release note (bug fix): This patch fixes a bug in geospatial queries, where a query filter of the form `ST_Distance(geog1, geog2) > constant`, or `ST_MaxDistance(geom1, geom2) > constant`, where the operator is one of >, <, >=, <=, or a filter of the form `ST_Distance(geog1, geog2, false) = 0` may mistakenly evaluate to `true` when one or both of the inputs is null or an empty geography/geometry. More rows could be returned by the query than expected.
This commit moves the tests for the old normalization rules FoldEqZeroSTDistance, FoldCmpSTDistanceLeft, FoldCmpSTDistanceRight, FoldCmpSTMaxDistanceLeft and FoldCmpSTMaxDistanceRight from norm into xform. The tests now check whether the derived filter triggers the GenerateInvertedIndexScans rule. Epic: none Informs: cockroachdb#103616 Release note: None
This commit moves functions previously associated with normalization rules, but now associated with exploration-time geospatial filter derivation out of the norm package. Epic: none Informs: cockroachdb#103616 Release note: None
The query normalization rules FoldEqZeroSTDistance, FoldCmpSTDistanceLeft, FoldCmpSTDistanceRight, FoldCmpSTMaxDistanceLeft and FoldCmpSTMaxDistanceRight replace binary comparison operations, =, <, >, <=, >= involving the `st_distance` or `st_maxdistance` functions with equivalent function calls which may enable inverted index scan, for example st_dwithin, and st_dfullywithin. For non-null and non-empty geographies and geometries, this rewrite is semantically equivalent, but for null or empty inputs, `st_distance` uses three-valued boolean logic and returns a null. `st_dwithin` always returns true or false, so may return a different value when the expression is negated. For example, the following test returns `true`, but should return `null`: ```sql CREATE TABLE g1 (geog GEOGRAPHY); INSERT INTO g1 VALUES (ST_GeogFromText('MULTILINESTRING EMPTY')); SELECT ST_Distance('POINT(0 0)'::GEOGRAPHY, geog) > 1 AS predicate_result FROM g1; predicate_result -------------------- t ``` The solution is to move all of the logic in the above normalization rules into `geoFilterPlanner.extractInvertedFilterConditionFromLeaf`, and use the derived predicate only for the purposes of finding inverted index scan spans. The spans will always include all rows qualified for the original predicate, but may include additional rows. The original filter is then applied to the results of the inverted index scan to filter out any results with nulls or empty geographies and geometries. Epic: none Fixes: cockroachdb#103616 Release note (bug fix): This patch fixes a bug in geospatial queries, where a query filter of the form `ST_Distance(geog1, geog2) > constant`, or `ST_MaxDistance(geom1, geom2) > constant`, where the operator is one of >, <, >=, <=, or a filter of the form `ST_Distance(geog1, geog2, false) = 0` may mistakenly evaluate to `true` when one or both of the inputs is null or an empty geography/geometry. More rows could be returned by the query than expected.
This commit moves the tests for the old normalization rules FoldEqZeroSTDistance, FoldCmpSTDistanceLeft, FoldCmpSTDistanceRight, FoldCmpSTMaxDistanceLeft and FoldCmpSTMaxDistanceRight from norm into xform. The tests now check whether the derived filter triggers the GenerateInvertedIndexScans rule. Epic: none Informs: cockroachdb#103616 Release note: None
This commit moves functions previously associated with normalization rules, but now associated with exploration-time geospatial filter derivation out of the norm package. Epic: none Informs: cockroachdb#103616 Release note: None
This commit moves the tests for the old normalization rules FoldEqZeroSTDistance, FoldCmpSTDistanceLeft, FoldCmpSTDistanceRight, FoldCmpSTMaxDistanceLeft and FoldCmpSTMaxDistanceRight from norm into xform. The tests now check whether the derived filter triggers the GenerateInvertedIndexScans rule. Epic: none Informs: cockroachdb#103616 Release note: None
This commit moves functions previously associated with normalization rules, but now associated with exploration-time geospatial filter derivation out of the norm package. Epic: none Informs: cockroachdb#103616 Release note: None
108954: invertedidx: derive geog/geom filters which enable inverted index scan r=DrewKimball,mgartner a=msirek The query normalization rules FoldEqZeroSTDistance, FoldCmpSTDistanceLeft, FoldCmpSTDistanceRight, FoldCmpSTMaxDistanceLeft and FoldCmpSTMaxDistanceRight replace binary comparison operations, =, <, >, <=, >= involving the `st_distance` or `st_maxdistance` functions with equivalent function calls which may enable inverted index scan, for example st_dwithin, and st_dfullywithin. For non-null and non-empty geographies and geometries, this rewrite is semantically equivalent, but for null or empty inputs, `st_distance` uses three-valued boolean logic and returns a null. `st_dwithin` always returns true or false, so may return a different value when the expression is negated. For example, the following test returns `true`, but should return `null`: ```sql CREATE TABLE g1 (geog GEOGRAPHY); INSERT INTO g1 VALUES (ST_GeogFromText('MULTILINESTRING EMPTY')); SELECT ST_Distance('POINT(0 0)'::GEOGRAPHY, geog) > 1 AS predicate_result FROM g1; predicate_result -------------------- t ``` The solution is to move all of the logic in the above normalization rules into `geoFilterPlanner.extractInvertedFilterConditionFromLeaf`, and use the derived predicate only for the purposes of finding inverted index scan spans. The spans will always include all rows qualified for the original predicate, but may include additional rows. The original filter is then applied to the results of the inverted index scan to filter out any results with nulls or empty geographies and geometries. Epic: none Fixes: #103616 Release note (bug fix): This patch fixes a bug in geospatial queries, where a query filter of the form `ST_Distance(geog1, geog2) > constant`, or `ST_MaxDistance(geom1, geom2) > constant`, where the operator is one of >, <, >=, <=, or a filter of the form `ST_Distance(geog1, geog2, false) = 0` may mistakenly evaluate to `true` when one or both of the inputs is null or an empty geography/geometry. More rows could be returned by the query than expected. ---- xform: move st_distance rules tests from norm to xform This commit moves the tests for the old normalization rules FoldEqZeroSTDistance, FoldCmpSTDistanceLeft, FoldCmpSTDistanceRight, FoldCmpSTMaxDistanceLeft and FoldCmpSTMaxDistanceRight from norm into xform. The tests now check whether the derived filter triggers the GenerateInvertedIndexScans rule. Epic: none Informs: #103616 Release note: None ---- invertedidx: move geospatial filter derivation functions out of norm This commit moves functions previously associated with normalization rules, but now associated with exploration-time geospatial filter derivation out of the norm package. Epic: none Informs: #103616 Release note: None Co-authored-by: Mark Sirek <sirek@cockroachlabs.com>
The query normalization rules FoldEqZeroSTDistance, FoldCmpSTDistanceLeft, FoldCmpSTDistanceRight, FoldCmpSTMaxDistanceLeft and FoldCmpSTMaxDistanceRight replace binary comparison operations, =, <, >, <=, >= involving the `st_distance` or `st_maxdistance` functions with equivalent function calls which may enable inverted index scan, for example st_dwithin, and st_dfullywithin. For non-null and non-empty geographies and geometries, this rewrite is semantically equivalent, but for null or empty inputs, `st_distance` uses three-valued boolean logic and returns a null. `st_dwithin` always returns true or false, so may return a different value when the expression is negated. For example, the following test returns `true`, but should return `null`: ```sql CREATE TABLE g1 (geog GEOGRAPHY); INSERT INTO g1 VALUES (ST_GeogFromText('MULTILINESTRING EMPTY')); SELECT ST_Distance('POINT(0 0)'::GEOGRAPHY, geog) > 1 AS predicate_result FROM g1; predicate_result -------------------- t ``` The solution is to move all of the logic in the above normalization rules into `geoFilterPlanner.extractInvertedFilterConditionFromLeaf`, and use the derived predicate only for the purposes of finding inverted index scan spans. The spans will always include all rows qualified for the original predicate, but may include additional rows. The original filter is then applied to the results of the inverted index scan to filter out any results with nulls or empty geographies and geometries. Epic: none Fixes: cockroachdb#103616 Release note (bug fix): This patch fixes a bug in geospatial queries, where a query filter of the form `ST_Distance(geog1, geog2) > constant`, or `ST_MaxDistance(geom1, geom2) > constant`, where the operator is one of >, <, >=, <=, or a filter of the form `ST_Distance(geog1, geog2, false) = 0` may mistakenly evaluate to `true` when one or both of the inputs is null or an empty geography/geometry. More rows could be returned by the query than expected.
This commit moves the tests for the old normalization rules FoldEqZeroSTDistance, FoldCmpSTDistanceLeft, FoldCmpSTDistanceRight, FoldCmpSTMaxDistanceLeft and FoldCmpSTMaxDistanceRight from norm into xform. The tests now check whether the derived filter triggers the GenerateInvertedIndexScans rule. Epic: none Informs: cockroachdb#103616 Release note: None
This commit moves functions previously associated with normalization rules, but now associated with exploration-time geospatial filter derivation out of the norm package. Epic: none Informs: cockroachdb#103616 Release note: None
The query normalization rules FoldEqZeroSTDistance, FoldCmpSTDistanceLeft, FoldCmpSTDistanceRight, FoldCmpSTMaxDistanceLeft and FoldCmpSTMaxDistanceRight replace binary comparison operations, =, <, >, <=, >= involving the `st_distance` or `st_maxdistance` functions with equivalent function calls which may enable inverted index scan, for example st_dwithin, and st_dfullywithin. For non-null and non-empty geographies and geometries, this rewrite is semantically equivalent, but for null or empty inputs, `st_distance` uses three-valued boolean logic and returns a null. `st_dwithin` always returns true or false, so may return a different value when the expression is negated. For example, the following test returns `true`, but should return `null`: ```sql CREATE TABLE g1 (geog GEOGRAPHY); INSERT INTO g1 VALUES (ST_GeogFromText('MULTILINESTRING EMPTY')); SELECT ST_Distance('POINT(0 0)'::GEOGRAPHY, geog) > 1 AS predicate_result FROM g1; predicate_result -------------------- t ``` The solution is to move all of the logic in the above normalization rules into `geoFilterPlanner.extractInvertedFilterConditionFromLeaf`, and use the derived predicate only for the purposes of finding inverted index scan spans. The spans will always include all rows qualified for the original predicate, but may include additional rows. The original filter is then applied to the results of the inverted index scan to filter out any results with nulls or empty geographies and geometries. Epic: none Fixes: cockroachdb#103616 Release note (bug fix): This patch fixes a bug in geospatial queries, where a query filter of the form `ST_Distance(geog1, geog2) > constant`, or `ST_MaxDistance(geom1, geom2) > constant`, where the operator is one of >, <, >=, <=, or a filter of the form `ST_Distance(geog1, geog2, false) = 0` may mistakenly evaluate to `true` when one or both of the inputs is null or an empty geography/geometry. More rows could be returned by the query than expected.
This commit moves the tests for the old normalization rules FoldEqZeroSTDistance, FoldCmpSTDistanceLeft, FoldCmpSTDistanceRight, FoldCmpSTMaxDistanceLeft and FoldCmpSTMaxDistanceRight from norm into xform. The tests now check whether the derived filter triggers the GenerateInvertedIndexScans rule. Epic: none Informs: cockroachdb#103616 Release note: None
This commit moves functions previously associated with normalization rules, but now associated with exploration-time geospatial filter derivation out of the norm package. Epic: none Informs: cockroachdb#103616 Release note: None
We have already fixed cockroachdb#103616 in 23.2, but are choosing to revert the backport to 23.1 to reduce risk. Add the involved optimizer rules to the essential rule list to try and prevent duplicate failures of the test in release-23.1. Informs: cockroachdb#103616 Release note: None
roachtest.unoptimized-query-oracle/disable-rules=all/rand-tables failed with artifacts on release-23.1 @ 4d53217fb2d27559257d46261badb3be91428d1a:
Parameters:
ROACHTEST_cloud=gce
,ROACHTEST_cpu=4
,ROACHTEST_encrypted=false
,ROACHTEST_ssd=0
Help
See: roachtest README
See: How To Investigate (internal)
This test on roachdash | Improve this report!
Jira issue: CRDB-28097
The text was updated successfully, but these errors were encountered: