-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
invertedidx: derive geog/geom filters which enable inverted index scan #108954
Conversation
3a1cbde
to
ac9ba47
Compare
ac9ba47
to
1919d4c
Compare
[nit] I think I'd prefer something like this:
Feel free to keep it if you'd prefer, though. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work! Just some nits and questions.
Reviewed 6 of 6 files at r1, 2 of 2 files at r2, 2 of 2 files at r3, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @msirek)
pkg/sql/opt/invertedidx/geo.go
line 427 at r1 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
[nit] I think I'd prefer something like this:
c := g.factory.CustomFuncs() // Could do this higher in the method if private.Name == "st_distance" { return c.MakeSTDWithin(op, args, boundExpr, leftIsFunction), true } return c.MakeSTDFullyWithin(op, args, boundExpr, leftIsFunction), true
Feel free to keep it if you'd prefer, though.
(This is less applicable now)
pkg/sql/opt/norm/comp_funcs.go
line 195 at r1 (raw file):
// otherwise false. func (c *CustomFuncs) MakeSTDFullyWithin( op opt.Operator, args memo.ScalarListExpr, bound opt.ScalarExpr, fnIsLeftArg bool,
[nit] Since we're already eliminating the left/right variants, why not go the whole way and just call MakeSTDWithin
directly?
pkg/sql/opt/norm/comp_funcs.go
line 210 at r1 (raw file):
op opt.Operator, args memo.ScalarListExpr, bound opt.ScalarExpr, fnName string, fnIsLeftArg bool, ) opt.ScalarExpr { var not bool
[nit] We can probably remove the not
logic now, right?
pkg/sql/logictest/testdata/logic_test/geospatial
line 6049 at r1 (raw file):
# Null and empty geom values should not appear in the output. query T rowsort SELECT tag FROM t103616@geog_idx
Should this use geom_idx
instead?
pkg/sql/logictest/testdata/logic_test/geospatial
line 6065 at r1 (raw file):
# Null and empty geog values should not appear in the output. query T rowsort SELECT tag FROM t103616
Why not specify the index like above?
1919d4c
to
fb38a81
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball)
pkg/sql/opt/invertedidx/geo.go
line 427 at r1 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
(This is less applicable now)
Modified as suggested
pkg/sql/opt/norm/comp_funcs.go
line 195 at r1 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
[nit] Since we're already eliminating the left/right variants, why not go the whole way and just call
MakeSTDWithin
directly?
Done
pkg/sql/opt/norm/comp_funcs.go
line 210 at r1 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
[nit] We can probably remove the
not
logic now, right?
I've removed generation of a NOT expression, but still need to detect the not
case and return the fact that nothing was derived.
pkg/sql/logictest/testdata/logic_test/geospatial
line 6049 at r1 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
Should this use
geom_idx
instead?
The comment was wrong, not the index. Updated the comment.
pkg/sql/logictest/testdata/logic_test/geospatial
line 6065 at r1 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
Why not specify the index like above?
Because the query would error out as inverted index scan can't be used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @msirek)
pkg/sql/logictest/testdata/logic_test/geospatial
line 6065 at r1 (raw file):
Previously, msirek (Mark Sirek) wrote…
Because the query would error out as inverted index scan can't be used.
Got it, I was confused by the other case with NOT
, but I'm guessing it can use the index because the NOT
folds into the comparison.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball)
pkg/sql/logictest/testdata/logic_test/geospatial
line 6065 at r1 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
Got it, I was confused by the other case with
NOT
, but I'm guessing it can use the index because theNOT
folds into the comparison.
Yeah, the NOT just changes >
to <=
, but it's the NOT case in the derived term that matters.
Do we also want to backport this to release-23.1.9-rc? |
Yes, I think so. Included it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This bug was confusing - great job finding a nice solution! Thanks for organizing the commits thoughtfully - it made reviewing it pleasant!
Hopefully we can get some adequate baking time for this on master and the release branches incase there are any edge cases that we missed here.
Reviewed 2 of 6 files at r1, 5 of 5 files at r4, 2 of 2 files at r5, 2 of 2 files at r6, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball and @msirek)
pkg/sql/opt/norm/testdata/rules/comp
line 11 at r5 (raw file):
val FLOAT, INVERTED INDEX(geom), INVERTED INDEX(geog)
nit: you can remove these indexes in the second commit.
The query normalization rules FoldEqZeroSTDistance, FoldCmpSTDistanceLeft, FoldCmpSTDistanceRight, FoldCmpSTMaxDistanceLeft and FoldCmpSTMaxDistanceRight replace binary comparison operations, =, <, >, <=, >= involving the `st_distance` or `st_maxdistance` functions with equivalent function calls which may enable inverted index scan, for example st_dwithin, and st_dfullywithin. For non-null and non-empty geographies and geometries, this rewrite is semantically equivalent, but for null or empty inputs, `st_distance` uses three-valued boolean logic and returns a null. `st_dwithin` always returns true or false, so may return a different value when the expression is negated. For example, the following test returns `true`, but should return `null`: ```sql CREATE TABLE g1 (geog GEOGRAPHY); INSERT INTO g1 VALUES (ST_GeogFromText('MULTILINESTRING EMPTY')); SELECT ST_Distance('POINT(0 0)'::GEOGRAPHY, geog) > 1 AS predicate_result FROM g1; predicate_result -------------------- t ``` The solution is to move all of the logic in the above normalization rules into `geoFilterPlanner.extractInvertedFilterConditionFromLeaf`, and use the derived predicate only for the purposes of finding inverted index scan spans. The spans will always include all rows qualified for the original predicate, but may include additional rows. The original filter is then applied to the results of the inverted index scan to filter out any results with nulls or empty geographies and geometries. Epic: none Fixes: cockroachdb#103616 Release note (bug fix): This patch fixes a bug in geospatial queries, where a query filter of the form `ST_Distance(geog1, geog2) > constant`, or `ST_MaxDistance(geom1, geom2) > constant`, where the operator is one of >, <, >=, <=, or a filter of the form `ST_Distance(geog1, geog2, false) = 0` may mistakenly evaluate to `true` when one or both of the inputs is null or an empty geography/geometry. More rows could be returned by the query than expected.
This commit moves the tests for the old normalization rules FoldEqZeroSTDistance, FoldCmpSTDistanceLeft, FoldCmpSTDistanceRight, FoldCmpSTMaxDistanceLeft and FoldCmpSTMaxDistanceRight from norm into xform. The tests now check whether the derived filter triggers the GenerateInvertedIndexScans rule. Epic: none Informs: cockroachdb#103616 Release note: None
This commit moves functions previously associated with normalization rules, but now associated with exploration-time geospatial filter derivation out of the norm package. Epic: none Informs: cockroachdb#103616 Release note: None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I'll merge to master first and let the backport PRs sit for a while once opened.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball)
pkg/sql/opt/norm/testdata/rules/comp
line 11 at r5 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
nit: you can remove these indexes in the second commit.
Done
fb38a81
to
0edeac4
Compare
bors r=DrewKimball,mgartner |
Build succeeded: |
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool. error creating merge commit from 0918300 to blathers/backport-release-23.1-108954: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool. Backport to branch 23.1.x failed. See errors above. error creating merge commit from 0918300 to blathers/backport-release-23.1.9-rc-108954: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool. Backport to branch 23.1.9-rc failed. See errors above. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
The query normalization rules FoldEqZeroSTDistance,
FoldCmpSTDistanceLeft, FoldCmpSTDistanceRight, FoldCmpSTMaxDistanceLeft
and FoldCmpSTMaxDistanceRight replace binary comparison operations,
=, <, >, <=, >= involving the
st_distance
orst_maxdistance
functionswith equivalent function calls which may enable inverted index scan, for
example st_dwithin, and st_dfullywithin. For non-null and non-empty
geographies and geometries, this rewrite is semantically equivalent, but
for null or empty inputs,
st_distance
uses three-valued boolean logicand returns a null.
st_dwithin
always returns true or false, so mayreturn a different value when the expression is negated. For example,
the following test returns
true
, but should returnnull
:The solution is to move all of the logic in the above normalization
rules into
geoFilterPlanner.extractInvertedFilterConditionFromLeaf
,and use the derived predicate only for the purposes of finding inverted
index scan spans. The spans will always include all rows qualified for
the original predicate, but may include additional rows. The original
filter is then applied to the results of the inverted index scan to
filter out any results with nulls or empty geographies and geometries.
Epic: none
Fixes: #103616
Release note (bug fix): This patch fixes a bug in geospatial queries,
where a query filter of the form
ST_Distance(geog1, geog2) > constant
,or
ST_MaxDistance(geom1, geom2) > constant
, where the operator is oneof >, <, >=, <=, or a filter of the form
ST_Distance(geog1, geog2, false) = 0
may mistakenly evaluate to
true
when one or both of the inputs is nullor an empty geography/geometry. More rows could be returned by the query
than expected.
xform: move st_distance rules tests from norm to xform
This commit moves the tests for the old normalization rules
FoldEqZeroSTDistance, FoldCmpSTDistanceLeft, FoldCmpSTDistanceRight,
FoldCmpSTMaxDistanceLeft and FoldCmpSTMaxDistanceRight from norm
into xform. The tests now check whether the derived filter triggers the
GenerateInvertedIndexScans rule.
Epic: none
Informs: #103616
Release note: None
invertedidx: move geospatial filter derivation functions out of norm
This commit moves functions previously associated with normalization
rules, but now associated with exploration-time geospatial filter
derivation out of the norm package.
Epic: none
Informs: #103616
Release note: None