Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

invertedidx: derive geog/geom filters which enable inverted index scan #108954

Merged
merged 3 commits into from
Aug 24, 2023

Conversation

msirek
Copy link
Contributor

@msirek msirek commented Aug 17, 2023

The query normalization rules FoldEqZeroSTDistance,
FoldCmpSTDistanceLeft, FoldCmpSTDistanceRight, FoldCmpSTMaxDistanceLeft
and FoldCmpSTMaxDistanceRight replace binary comparison operations,
=, <, >, <=, >= involving the st_distance or st_maxdistance functions
with equivalent function calls which may enable inverted index scan, for
example st_dwithin, and st_dfullywithin. For non-null and non-empty
geographies and geometries, this rewrite is semantically equivalent, but
for null or empty inputs, st_distance uses three-valued boolean logic
and returns a null. st_dwithin always returns true or false, so may
return a different value when the expression is negated. For example,
the following test returns true, but should return null:

CREATE TABLE g1 (geog GEOGRAPHY);
INSERT INTO g1 VALUES (ST_GeogFromText('MULTILINESTRING EMPTY'));
SELECT ST_Distance('POINT(0 0)'::GEOGRAPHY, geog) > 1 AS predicate_result FROM g1;
  predicate_result
--------------------
         t

The solution is to move all of the logic in the above normalization
rules into geoFilterPlanner.extractInvertedFilterConditionFromLeaf,
and use the derived predicate only for the purposes of finding inverted
index scan spans. The spans will always include all rows qualified for
the original predicate, but may include additional rows. The original
filter is then applied to the results of the inverted index scan to
filter out any results with nulls or empty geographies and geometries.

Epic: none
Fixes: #103616

Release note (bug fix): This patch fixes a bug in geospatial queries,
where a query filter of the form ST_Distance(geog1, geog2) > constant,
or ST_MaxDistance(geom1, geom2) > constant, where the operator is one
of >, <, >=, <=, or a filter of the form ST_Distance(geog1, geog2, false) = 0
may mistakenly evaluate to true when one or both of the inputs is null
or an empty geography/geometry. More rows could be returned by the query
than expected.


xform: move st_distance rules tests from norm to xform

This commit moves the tests for the old normalization rules
FoldEqZeroSTDistance, FoldCmpSTDistanceLeft, FoldCmpSTDistanceRight,
FoldCmpSTMaxDistanceLeft and FoldCmpSTMaxDistanceRight from norm
into xform. The tests now check whether the derived filter triggers the
GenerateInvertedIndexScans rule.

Epic: none
Informs: #103616

Release note: None


invertedidx: move geospatial filter derivation functions out of norm

This commit moves functions previously associated with normalization
rules, but now associated with exploration-time geospatial filter
derivation out of the norm package.

Epic: none
Informs: #103616

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@msirek msirek force-pushed the derive-new-inverted-filter branch 3 times, most recently from 3a1cbde to ac9ba47 Compare August 18, 2023 05:27
@msirek msirek requested a review from DrewKimball August 18, 2023 15:34
@msirek msirek marked this pull request as ready for review August 18, 2023 15:34
@msirek msirek requested a review from a team as a code owner August 18, 2023 15:34
@msirek msirek force-pushed the derive-new-inverted-filter branch from ac9ba47 to 1919d4c Compare August 18, 2023 15:40
@DrewKimball
Copy link
Collaborator

pkg/sql/opt/invertedidx/geo.go line 427 at r1 (raw file):

		makeSTDWithinFunc = g.factory.CustomFuncs().MakeSTDFullyWithin
	}
	return makeSTDWithinFunc(op, args, boundExpr, leftIsFunction), true

[nit] I think I'd prefer something like this:

	c := g.factory.CustomFuncs() // Could do this higher in the method
	if private.Name == "st_distance" {
		return c.MakeSTDWithin(op, args, boundExpr, leftIsFunction), true
	}
	return c.MakeSTDFullyWithin(op, args, boundExpr, leftIsFunction), true

Feel free to keep it if you'd prefer, though.

Copy link
Collaborator

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Great work! Just some nits and questions.

Reviewed 6 of 6 files at r1, 2 of 2 files at r2, 2 of 2 files at r3, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @msirek)


pkg/sql/opt/invertedidx/geo.go line 427 at r1 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

[nit] I think I'd prefer something like this:

	c := g.factory.CustomFuncs() // Could do this higher in the method
	if private.Name == "st_distance" {
		return c.MakeSTDWithin(op, args, boundExpr, leftIsFunction), true
	}
	return c.MakeSTDFullyWithin(op, args, boundExpr, leftIsFunction), true

Feel free to keep it if you'd prefer, though.

(This is less applicable now)


pkg/sql/opt/norm/comp_funcs.go line 195 at r1 (raw file):

// otherwise false.
func (c *CustomFuncs) MakeSTDFullyWithin(
	op opt.Operator, args memo.ScalarListExpr, bound opt.ScalarExpr, fnIsLeftArg bool,

[nit] Since we're already eliminating the left/right variants, why not go the whole way and just call MakeSTDWithin directly?


pkg/sql/opt/norm/comp_funcs.go line 210 at r1 (raw file):

	op opt.Operator, args memo.ScalarListExpr, bound opt.ScalarExpr, fnName string, fnIsLeftArg bool,
) opt.ScalarExpr {
	var not bool

[nit] We can probably remove the not logic now, right?


pkg/sql/logictest/testdata/logic_test/geospatial line 6049 at r1 (raw file):

# Null and empty geom values should not appear in the output.
query T rowsort
SELECT tag FROM t103616@geog_idx

Should this use geom_idx instead?


pkg/sql/logictest/testdata/logic_test/geospatial line 6065 at r1 (raw file):

# Null and empty geog values should not appear in the output.
query T rowsort
SELECT tag FROM t103616

Why not specify the index like above?

@msirek msirek force-pushed the derive-new-inverted-filter branch from 1919d4c to fb38a81 Compare August 18, 2023 22:03
Copy link
Contributor Author

@msirek msirek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball)


pkg/sql/opt/invertedidx/geo.go line 427 at r1 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

(This is less applicable now)

Modified as suggested


pkg/sql/opt/norm/comp_funcs.go line 195 at r1 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

[nit] Since we're already eliminating the left/right variants, why not go the whole way and just call MakeSTDWithin directly?

Done


pkg/sql/opt/norm/comp_funcs.go line 210 at r1 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

[nit] We can probably remove the not logic now, right?

I've removed generation of a NOT expression, but still need to detect the not case and return the fact that nothing was derived.


pkg/sql/logictest/testdata/logic_test/geospatial line 6049 at r1 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

Should this use geom_idx instead?

The comment was wrong, not the index. Updated the comment.


pkg/sql/logictest/testdata/logic_test/geospatial line 6065 at r1 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

Why not specify the index like above?

Because the query would error out as inverted index scan can't be used.

@msirek msirek added the backport-23.1.x Flags PRs that need to be backported to 23.1 label Aug 18, 2023
Copy link
Collaborator

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @msirek)


pkg/sql/logictest/testdata/logic_test/geospatial line 6065 at r1 (raw file):

Previously, msirek (Mark Sirek) wrote…

Because the query would error out as inverted index scan can't be used.

Got it, I was confused by the other case with NOT, but I'm guessing it can use the index because the NOT folds into the comparison.

Copy link
Contributor Author

@msirek msirek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball)


pkg/sql/logictest/testdata/logic_test/geospatial line 6065 at r1 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

Got it, I was confused by the other case with NOT, but I'm guessing it can use the index because the NOT folds into the comparison.

Yeah, the NOT just changes > to <=, but it's the NOT case in the derived term that matters.

@michae2
Copy link
Collaborator

michae2 commented Aug 21, 2023

Do we also want to backport this to release-23.1.9-rc?

@msirek
Copy link
Contributor Author

msirek commented Aug 21, 2023

Do we also want to backport this to release-23.1.9-rc?

Yes, I think so. Included it.

Copy link
Collaborator

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: This bug was confusing - great job finding a nice solution! Thanks for organizing the commits thoughtfully - it made reviewing it pleasant!

Hopefully we can get some adequate baking time for this on master and the release branches incase there are any edge cases that we missed here.

Reviewed 2 of 6 files at r1, 5 of 5 files at r4, 2 of 2 files at r5, 2 of 2 files at r6, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball and @msirek)


pkg/sql/opt/norm/testdata/rules/comp line 11 at r5 (raw file):

  val FLOAT,
  INVERTED INDEX(geom),
  INVERTED INDEX(geog)

nit: you can remove these indexes in the second commit.

Mark Sirek added 3 commits August 23, 2023 18:27
The query normalization rules FoldEqZeroSTDistance,
FoldCmpSTDistanceLeft, FoldCmpSTDistanceRight, FoldCmpSTMaxDistanceLeft
and FoldCmpSTMaxDistanceRight replace binary comparison operations,
=, <, >, <=, >= involving the `st_distance` or `st_maxdistance` functions
with equivalent function calls which may enable inverted index scan, for
example st_dwithin, and st_dfullywithin. For non-null and non-empty
geographies and geometries, this rewrite is semantically equivalent, but
for null or empty inputs, `st_distance` uses three-valued boolean logic
and returns a null. `st_dwithin` always returns true or false, so may
return a different value when the expression is negated. For example,
the following test returns `true`, but should return `null`:
```sql
CREATE TABLE g1 (geog GEOGRAPHY);
INSERT INTO g1 VALUES (ST_GeogFromText('MULTILINESTRING EMPTY'));
SELECT ST_Distance('POINT(0 0)'::GEOGRAPHY, geog) > 1 AS predicate_result FROM g1;
  predicate_result
--------------------
         t
```

The solution is to move all of the logic in the above normalization
rules into `geoFilterPlanner.extractInvertedFilterConditionFromLeaf`,
and use the derived predicate only for the purposes of finding inverted
index scan spans. The spans will always include all rows qualified for
the original predicate, but may include additional rows. The original
filter is then applied to the results of the inverted index scan to
filter out any results with nulls or empty geographies and geometries.

Epic: none
Fixes: cockroachdb#103616

Release note (bug fix): This patch fixes a bug in geospatial queries,
where a query filter of the form `ST_Distance(geog1, geog2) > constant`,
or `ST_MaxDistance(geom1, geom2) > constant`, where the operator is one
of >, <, >=, <=, or a filter of the form `ST_Distance(geog1, geog2, false) = 0`
may mistakenly evaluate to `true` when one or both of the inputs is null
or an empty geography/geometry. More rows could be returned by the query
than expected.
This commit moves the tests for the old normalization rules
FoldEqZeroSTDistance, FoldCmpSTDistanceLeft, FoldCmpSTDistanceRight,
FoldCmpSTMaxDistanceLeft and FoldCmpSTMaxDistanceRight from norm
into xform. The tests now check whether the derived filter triggers the
GenerateInvertedIndexScans rule.

Epic: none
Informs: cockroachdb#103616

Release note: None
This commit moves functions previously associated with normalization
rules, but now associated with exploration-time geospatial filter
derivation out of the norm package.

Epic: none
Informs: cockroachdb#103616

Release note: None
Copy link
Contributor Author

@msirek msirek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I'll merge to master first and let the backport PRs sit for a while once opened.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball)


pkg/sql/opt/norm/testdata/rules/comp line 11 at r5 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: you can remove these indexes in the second commit.

Done

@msirek msirek force-pushed the derive-new-inverted-filter branch from fb38a81 to 0edeac4 Compare August 24, 2023 01:28
@msirek
Copy link
Contributor Author

msirek commented Aug 24, 2023

bors r=DrewKimball,mgartner

@craig
Copy link
Contributor

craig bot commented Aug 24, 2023

Build succeeded:

@craig craig bot merged commit abeb090 into cockroachdb:master Aug 24, 2023
@blathers-crl
Copy link

blathers-crl bot commented Aug 24, 2023

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from 0918300 to blathers/backport-release-23.1-108954: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 23.1.x failed. See errors above.


error creating merge commit from 0918300 to blathers/backport-release-23.1.9-rc-108954: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 23.1.9-rc failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-23.1.x Flags PRs that need to be backported to 23.1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

roachtest/unoptimized-query-oracle: incorrect results for geospatial
5 participants