Skip to content

Commit

Permalink
Merge #108954
Browse files Browse the repository at this point in the history
108954: invertedidx: derive geog/geom filters which enable inverted index scan r=DrewKimball,mgartner a=msirek

The query normalization rules FoldEqZeroSTDistance,
FoldCmpSTDistanceLeft, FoldCmpSTDistanceRight, FoldCmpSTMaxDistanceLeft
and FoldCmpSTMaxDistanceRight replace binary comparison operations,
=, <, >, <=, >= involving the `st_distance` or `st_maxdistance` functions
with equivalent function calls which may enable inverted index scan, for
example st_dwithin, and st_dfullywithin. For non-null and non-empty
geographies and geometries, this rewrite is semantically equivalent, but
for null or empty inputs, `st_distance` uses three-valued boolean logic
and returns a null. `st_dwithin` always returns true or false, so may
return a different value when the expression is negated. For example,
the following test returns `true`, but should return `null`:
```sql
CREATE TABLE g1 (geog GEOGRAPHY);
INSERT INTO g1 VALUES (ST_GeogFromText('MULTILINESTRING EMPTY'));
SELECT ST_Distance('POINT(0 0)'::GEOGRAPHY, geog) > 1 AS predicate_result FROM g1;
  predicate_result
--------------------
         t
```

The solution is to move all of the logic in the above normalization
rules into `geoFilterPlanner.extractInvertedFilterConditionFromLeaf`,
and use the derived predicate only for the purposes of finding inverted
index scan spans. The spans will always include all rows qualified for
the original predicate, but may include additional rows. The original
filter is then applied to the results of the inverted index scan to
filter out any results with nulls or empty geographies and geometries.

Epic: none
Fixes: #103616

Release note (bug fix): This patch fixes a bug in geospatial queries,
where a query filter of the form `ST_Distance(geog1, geog2) > constant`,
or `ST_MaxDistance(geom1, geom2) > constant`, where the operator is one
of >, <, >=, <=, or a filter of the form `ST_Distance(geog1, geog2, false) = 0`
may mistakenly evaluate to `true` when one or both of the inputs is null
or an empty geography/geometry. More rows could be returned by the query
than expected.

----
xform: move st_distance rules tests from norm to xform

This commit moves the tests for the old normalization rules
FoldEqZeroSTDistance, FoldCmpSTDistanceLeft, FoldCmpSTDistanceRight,
FoldCmpSTMaxDistanceLeft and FoldCmpSTMaxDistanceRight from norm
into xform. The tests now check whether the derived filter triggers the
GenerateInvertedIndexScans rule.

Epic: none
Informs: #103616

Release note: None

----
invertedidx: move geospatial filter derivation functions out of norm

This commit moves functions previously associated with normalization
rules, but now associated with exploration-time geospatial filter
derivation out of the norm package.

Epic: none
Informs: #103616

Release note: None

Co-authored-by: Mark Sirek <sirek@cockroachlabs.com>
  • Loading branch information
craig[bot] and Mark Sirek committed Aug 24, 2023
2 parents 58ace12 + 0edeac4 commit abeb090
Showing 7 changed files with 1,167 additions and 586 deletions.
156 changes: 156 additions & 0 deletions pkg/sql/logictest/testdata/logic_test/geospatial
Original file line number Diff line number Diff line change
@@ -6021,3 +6021,159 @@ SELECT st_asmvtgeom(
NULL

subtest end

subtest regression_103616

# Regression test for #103616
statement ok
CREATE TABLE t103616 (
tag string,
geog GEOGRAPHY,
geom GEOMETRY,
INVERTED INDEX geog_idx (geog),
INVERTED INDEX geom_idx (geom)
);

statement ok
insert into t103616 values ('null', null, null);

statement ok
insert into t103616 values ('point (0 0)', 'POINT(0 0)'::GEOGRAPHY, 'POINT(0 0)'::GEOMETRY);

statement ok
insert into t103616 values ('point (10 10)', 'POINT(10 10)'::GEOGRAPHY, 'POINT(10 10)'::GEOMETRY);

statement ok
insert into t103616 values ('empty 1', ST_GeogFromText('POLYGON EMPTY'), ST_GeomFromText('POLYGON EMPTY'));

statement ok
insert into t103616 values ('empty 2', ST_GeogFromText('GEOMETRYCOLLECTION EMPTY'), ST_GeomFromText('GEOMETRYCOLLECTION EMPTY'));

# Null and empty geog values should not appear in the output.
query T rowsort
SELECT tag FROM t103616@geog_idx
WHERE
st_distance(geog, 'POINT(0 0)'::GEOGRAPHY, false) = 0;
----
point (0 0)

# Null and empty geog values should not appear in the output.
query T rowsort
SELECT tag FROM t103616@geog_idx
WHERE
st_distance(geog, 'POINT(0 0)'::GEOGRAPHY, false) = 0;
----
point (0 0)

# Null and empty geog values should not appear in the output.
query T rowsort
SELECT tag FROM t103616@geog_idx
WHERE
NOT 1.2345678901234566e-43 < st_distance(geog, 'POINT(0 0)'::GEOGRAPHY);
----
point (0 0)

# Null and empty geog values should not appear in the output.
query T rowsort
SELECT tag FROM t103616
WHERE
NOT 1.2345678901234566e-43 > st_distance(geog, 'POINT(0 0)'::GEOGRAPHY);
----
point (10 10)

# Null and empty geog values should not appear in the output.
query T rowsort
SELECT tag FROM t103616@geog_idx
WHERE
1.2345678901234566e-43 > st_distance(geog, 'POINT(0 0)'::GEOGRAPHY);
----
point (0 0)

# Null and empty geog values should not appear in the output.
query T rowsort
SELECT tag FROM t103616
WHERE
1.2345678901234566e-43 < st_distance(geog, 'POINT(0 0)'::GEOGRAPHY);
----
point (10 10)

# Null and empty geog values should not appear in the output.
query T rowsort
SELECT tag FROM t103616
WHERE
0 <= st_distance(geog, 'POINT(0 0)'::GEOGRAPHY);
----
point (0 0)
point (10 10)

# Only null and empty geog values should appear in the output.
query T rowsort
SELECT tag FROM t103616
WHERE
(st_distance(geog, 'POINT(0 0)'::GEOGRAPHY, false) = 0) IS NULL;
----
null
empty 1
empty 2

# Only null and empty geog values should appear in the output.
query T rowsort
SELECT tag FROM t103616
WHERE
(0 <= st_distance(geog, 'POINT(0 0)'::GEOGRAPHY)) IS NULL;
----
null
empty 1
empty 2

# Null and empty geom values should appear in the output.
query T rowsort
SELECT tag FROM t103616@geom_idx
WHERE
NOT 1.2345678901234566e-43 < st_maxdistance(geom, 'POINT(0 0)'::GEOMETRY);
----
point (0 0)

# Null and empty geom values should not appear in the output.
query T rowsort
SELECT tag FROM t103616
WHERE
NOT 1.2345678901234566e-43 > st_maxdistance(geom, 'POINT(0 0)'::GEOMETRY);
----
point (10 10)

# Null and empty geom values should not appear in the output.
query T rowsort
SELECT tag FROM t103616@geom_idx
WHERE
1.2345678901234566e-43 > st_maxdistance(geom, 'POINT(0 0)'::GEOMETRY);
----
point (0 0)

# Null and empty geom values should not appear in the output.
query T rowsort
SELECT tag FROM t103616
WHERE
1.2345678901234566e-43 < st_maxdistance(geom, 'POINT(0 0)'::GEOMETRY);
----
point (10 10)

# Null and empty geom values should not appear in the output.
query T rowsort
SELECT tag FROM t103616
WHERE
0 <= st_maxdistance(geom, 'POINT(0 0)'::GEOMETRY);
----
point (0 0)
point (10 10)

# Only null and empty geog values should appear in the output.
query T rowsort
SELECT tag FROM t103616
WHERE
(0 <= st_maxdistance(geom, 'POINT(0 0)'::GEOMETRY)) IS NULL;
----
null
empty 1
empty 2

Loading

0 comments on commit abeb090

Please sign in to comment.