Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opt: add normalization rule to convert ST_Distance(...) < x to ST_DWithin(...) #52028

Closed
rytaft opened this issue Jul 28, 2020 · 9 comments · Fixed by #53066
Closed

opt: add normalization rule to convert ST_Distance(...) < x to ST_DWithin(...) #52028

rytaft opened this issue Jul 28, 2020 · 9 comments · Fixed by #53066
Assignees
Labels
A-spatial Spatial work that is *not* related to builtins. A-sql-optimizer SQL logical planning and optimizations. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) C-performance Perf of queries or internals. Solution not expected to change functional behavior.

Comments

@rytaft
Copy link
Collaborator

rytaft commented Jul 28, 2020

Queries that use ST_DWithin in the WHERE or ON clause can take advantage of an inverted index for acceleration, while queries that use ST_Distance cannot. However, in cases where ST_Distance is used as part of an inequality condition, it should be possible to convert it to an equivalent condition with ST_DWithin. We should create a normalization rule to perform this conversion so that such queries can take advantage of index acceleration if a suitable inverted index is available.

@rytaft rytaft added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) C-performance Perf of queries or internals. Solution not expected to change functional behavior. A-sql-optimizer SQL logical planning and optimizations. A-spatial Spatial work that is *not* related to builtins. labels Jul 28, 2020
@rytaft rytaft self-assigned this Jul 28, 2020
@otan
Copy link
Contributor

otan commented Aug 18, 2020

heads up that ST_Distance(a, b) <= x == ST_DWithin(a, b, x), but ST_Distance(a, b) < x == ST_DWithin(a, b, x-SmallestNonzeroFloat64), most likely.

@DrewKimball
Copy link
Collaborator

Would we want to perform this transformation in all cases, or only in cases where an inverted index scan might be possible? I'm asking because it looks to me like in order for an inverted index scan to be generated, one of the st_distance arguments must be a constant, and the value on the right side of the <= comparison must also be constant.

@otan
Copy link
Contributor

otan commented Aug 18, 2020

i think this is applicable in all cases

@DrewKimball
Copy link
Collaborator

Doing something like x-SmallestNonzeroFloat64 runs into a problem where it gets simplified to just x. One alternative could be to make the replacement: ST_Distance(a, b) < x == ST_DWithin(a, b, x) AND ST_DISTANCE(a, b) != x. The ST_DWithin function could then be used to generate an inverted index scan, and the other conjunct would remove the extra rows.

@otan
Copy link
Contributor

otan commented Aug 18, 2020

We want to avoid calculating ST_Distance as it is more expensive than it's counterpart ST_DWithin which has early exit functionality. Why does ST_DWithin(a, b, x-math.SmallestNonzeroFloat64) get simplified to just x?

@DrewKimball
Copy link
Collaborator

It wouldn't happen if x was a variable, but if x is a constant, constant folding rules fire and we lose precision.

@DrewKimball
Copy link
Collaborator

Easiest thing for that case might be to add another function, ST_DWithinExclusive (or something) with the same early-exit behavior as ST_DWithin but which is equivalent to ST_Distance(a, b) < x.

@otan
Copy link
Contributor

otan commented Aug 19, 2020

but if x is a constant, constant folding rules fire and we lose precision

it wouldn't fold, say, 100 to 99.999999?

Easiest thing for that case might be to add another function, ST_DWithinExclusive (or something) with the same early-exit behavior as ST_DWithin but which is equivalent to ST_Distance(a, b) < x.

works for me if you're willing

@DrewKimball
Copy link
Collaborator

Unfortunately not. There's also the problem of what to do if x is already the minimum value. Yeah, I'll try adding a new function.

DrewKimball added a commit to DrewKimball/cockroach that referenced this issue Aug 19, 2020
`ST_DWithin` is equivalent to the expression `ST_Distance <= x`.
Similar reformulations can be made for different comparison operators
(<, >, >=). This commit adds two rules that replace expressions of the
latter form with either `ST_DWithin` or `ST_DWithinExclusive`. This
replacement is desirable because the `ST_DWithin` function can exit early.
`ST_DWithin` can also be used to generate an inverted index scan.

Fixes cockroachdb#52028

Release note: None
DrewKimball added a commit to DrewKimball/cockroach that referenced this issue Aug 19, 2020
`ST_DWithin` is equivalent to the expression `ST_Distance <= x`.
Similar reformulations can be made for different comparison operators
(<, >, >=). This commit adds two rules that replace expressions of the
latter form with either `ST_DWithin` or `ST_DWithinExclusive`. This
replacement is desirable because the `ST_DWithin` function can exit early.
`ST_DWithin` can also be used to generate an inverted index scan.

Fixes cockroachdb#52028

Release note: None
DrewKimball added a commit to DrewKimball/cockroach that referenced this issue Aug 20, 2020
`ST_DWithin` is equivalent to the expression `ST_Distance <= x`.
Similar reformulations can be made for different comparison operators
(<, >, >=). This commit adds two rules that replace expressions of the
latter form with either `ST_DWithin` or `ST_DWithinExclusive`. This
replacement is desirable because the `ST_DWithin` function can exit early.
`ST_DWithin` can also be used to generate an inverted index scan.

Fixes cockroachdb#52028

Release note: None
craig bot pushed a commit that referenced this issue Aug 20, 2020
52938: sql: implement idle_in_transaction_session_timeout r=RichardJCai a=RichardJCai

Release note (sql change): Implement IdleInTransactionSessionTimeout
variable to allow terminating sessions that are idle in a transaction
past the provides threshold.

Set the variable by using SET idle_in_transaction_session_timeout = 'time'.

Sessions that are idle in OPEN, ABORTED and DONE(COMMITWAIT) transaction states
will be terminated if the user idles longer than the threshold time.

Fixes #5924

53041: cli/debug: add SQL-decoding to printed keys r=dt a=dt

This adds a `--decode-as-table` flag to `debug keys --values`.

When passed a base64 encoded Descriptor, which can be obtained via
`select encode(descriptor, 'base64') from system.descriptor where id = x`,
it will attempt to decode KVs using a SQL row-fetcher backed by that
descriptor and print them as col1=val1, col2=val2,...

Release note: none.

53060: backupccl: refactor backup destination resolution r=dt a=pbardea

During backup planning we need to resolve the backup destination. This
means that we need to:
1. Determine if we're going to resolve to a backup collection or not
2. Determine if we should auto-append
3. Read any encryption options we may need if we determine that we're
donig an incremental backup
4. Find the previous backups in the chain if we're doing an incremental
backup.

Release note: None

53066: opt: add rule to replace ST_Distance with ST_DWithin r=DrewKimball a=DrewKimball

`ST_DWithin` is equivalent to the expression `ST_Distance <= x`.
Similar reformulations can be made for different comparison operators
(<, >, >=). 

This PR consists of two commits. The first commit adds an internal builtin
function `ST_DWithinExclusive`, which is equivalent to `ST_Distance < x`
but is able to exit early.

The second commit adds two rules that replace expressions of the form 
`ST_Distance <= x` with either `ST_DWithin` or `ST_DWithinExclusive` 
depending on the comparison operator used. This replacement is desirable 
because the `ST_DWithin` function can exit early, while `ST_Distance` cannot
do the same. `ST_DWithin` can also be used to generate an inverted index scan.

Fixes #52028

Release note: None

53085: colexec: fix dynamic batch behavior in hash joiner with low mem limit r=yuzefovich a=yuzefovich

The disk-spilling logic in the hash joiner relies on the assumption that
a memory limit error can only occur during the building of the hash
table. However, this assumption was recently broken in the case of the
low `workmem` limit setting by the dynamic batch size work - it now
became possible for an error to occur during output population. As
a result, we might have already consumed some batches from the left
input and have emitted partial results before we fall back to the
external hash joiner, and this would result in some tuples not being
emitted at all (if we have a pending batch from the left).

This problem is now fixed by restoring the invariant with the
introduction of an unlimited allocator into the in-memory hash joiner.
The unlimited allocator is used only to populate the output (meaning
that we have already successfully built the hash table from the right
side) at the phase of the hash join algorithm when we consume the left
side one batch at a time and populate the output also one batch at
a time, so such choice is acceptable. This additionally allows to
properly account for the memory used by the output batch.

Note that the only other disk-spilling operator (different sorters that
fallback to the external sort) weren't impacted by this because they are
able to correctly export the buffered tuples regardless of when the
memory limit error occurs.

Fixes: #52859.

Release note: None

53146: rangefeed: don't delete txn records from the rngfeed pusher r=andreimatei a=andreimatei

Before this patch, the rangefeed txn pushed was GC'ing the records of
the transactions that it found to be aborted or committed when it
PUSH_TOUCHED's them. Deleting the txn record is not a nice thing to do
if the txn's coordinator might still be running, and so this patch
terminates the practice. If the coordinator is still running and it had
STAGED the txn record, a PUSH_TOUCH might cause a transition to ABORTED
or COMMITTED. If such a transition is followed by the GC of the txn
record, this causes an ambigous commit error for the coordinator, which
can't tell whether the transaction committed (implicitly) or was rolled
back.

As the code stands, this patch doesn't actually help eliminate the
ambiguity in the ABORTED case (it does in the COMMIT case though),
because the coordinator doesn't distinguish between an ABORTED record
and a missing record; it treats both the same, as ambiguous. But we're
going to improve that, and the coordinator is gonna consider an existing
record as non-ambiguous. The GC'ed case is going to remain the ambiguous
one, and the idea is to avoid that as much as possible by only doing GC
from the gc-queue (after an hour) and from the coordinator.

Touches #52566

Release note (bug fix): Some rare AmbiguousCommitErrors happening when
CDC was used were eliminated.

Co-authored-by: richardjcai <caioftherichard@gmail.com>
Co-authored-by: David Taylor <tinystatemachine@gmail.com>
Co-authored-by: Paul Bardea <pbardea@gmail.com>
Co-authored-by: Drew Kimball <drewk@cockroachlabs.com>
Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
Co-authored-by: Andrei Matei <andrei@cockroachlabs.com>
@craig craig bot closed this as completed in d7c761d Aug 20, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-spatial Spatial work that is *not* related to builtins. A-sql-optimizer SQL logical planning and optimizations. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) C-performance Perf of queries or internals. Solution not expected to change functional behavior.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants