Performant way to estimate distances for bulk `inrange` search? #119

Datseris · 2020-12-03T19:43:58Z

Hi there, I'm doing standard bulk in-range searches via the syntax `vec_of_idxs = NearestNeighbors.inrange(tree, queries, r)`. My data are `SVector{D, Float64}` (but I'm not sure whether this matters).

At the moment I need the distances of the found neighbors from the queries, even more than the indices. I've noticed that while the above call is spectacularly fast, the code I wrote to get the distances is veeeery slow:

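```julia
using NearestNeighbors, Distances  # `evaluate` comes from Distances.jl

# Distances from `query` to every neighbor index in `idxs` (indices into the original data).
function _NN_get_ds(tree::KDTree, query, idxs)
    if tree.reordered
        # tree.data is a reordered copy: find the slot that holds original point i.
        # `findfirst` scans tree.indices from the start for every single neighbor.
        ds = [evaluate(tree.metric, query, tree.data[findfirst(isequal(i), tree.indices)])
              for i in idxs]
    else
        # tree.data keeps the original order, so i indexes it directly.
        ds = [evaluate(tree.metric, query, tree.data[i]) for i in idxs]
    end
    return ds
end
```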

and then I transform my original code into:

```julia
vec_of_idxs = NearestNeighbors.inrange(tree, queries, r)
vec_of_ds = [_NN_get_ds(tree, queries[j], vec_of_idxs[j]) for j in eachindex(queries)]
```

I'm wondering whether something already exists in this library that provides these distances faster.


Datseris · 2020-12-03T20:10:06Z

PS: After some testing, it turns out the bottleneck is the odd `if tree.reordered` clause I wrote. I obviously shouldn't be calling `findfirst` in such an inner loop... What is the correct way to translate `tree.indices` to data indices when the tree is reordered?

KristofferC · 2020-12-03T20:26:38Z

I guess you could create the inverse lookup (for all points) a single time and then reuse that.
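A minimal sketch of that suggestion, for the case where only the tree is at hand. It assumes the internal `KDTree` fields `data`, `indices`, `reordered` and `metric` behave as in the code above (`tree.data[k]` holds original point `tree.indices[k]` when reordered); these are not public API and may change between versions. `inrange_distances` is a hypothetical helper name, and the metric is called directly (`tree.metric(a, b)`), which Distances.jl metrics support.

```julia
using NearestNeighbors, StaticArrays

# Sketch: turn bulk `inrange` results into distances using only the tree.
# Relies on internal KDTree fields (`data`, `indices`, `reordered`, `metric`),
# which are not public API and may differ between versions.
function inrange_distances(tree::KDTree, queries, vec_of_idxs)
    # Original index -> position in tree.data. Built once in O(n) and reused,
    # instead of a `findfirst` scan over tree.indices for every neighbor.
    pos = tree.reordered ? invperm(tree.indices) : collect(1:length(tree.data))
    return [[tree.metric(q, tree.data[pos[i]]) for i in idxs]
            for (q, idxs) in zip(queries, vec_of_idxs)]
end

# Usage on random data (the default metric is Euclidean).
data = [rand(SVector{3, Float64}) for _ in 1:10_000]
queries = [rand(SVector{3, Float64}) for _ in 1:100]
tree = KDTree(data)   # reorder = true by default
r = 0.1
vec_of_idxs = inrange(tree, queries, r)
vec_of_ds = inrange_distances(tree, queries, vec_of_idxs)
```

The inverse permutation costs one pass over the points, so it pays off as soon as more than a handful of lookups are made; if many searches reuse the same tree, `pos` could be computed once outside the function as well.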

Datseris · 2020-12-03T20:30:47Z

I guess that makes sense. But how does the tree itself know the correct indices? I mean, for the `skip` predicate version of e.g. `knn`, the tree needs to know the correct indices as well, no? Can't I obtain them the same way the tree does when it checks whether to skip a point?

KristofferC · 2020-12-03T21:06:33Z

It just does this, in `NearestNeighbors.jl/src/tree_ops.jl` (line 97 at commit 2efd998):

```julia
idx = tree.reordered ? z : tree.indices[z]
```
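To make the quoted line concrete, here is a toy model in plain Julia with no package dependency. `original`, `indices` and `slot_to_data` are hypothetical stand-ins for the user's data, `tree.indices` and the quoted expression, and the leaf order `[1, 3, 4, 2]` is made up. As I read it, `z` is a leaf slot and `idx` is where that slot's point lives in `tree.data`; in both layouts the point's original index is `tree.indices[z]`.

```julia
# Toy model of the two storage layouts. `indices` plays the role of `tree.indices`
# (leaf slot -> original index). These names are illustrative, not NearestNeighbors API.
original = [10.0, 40.0, 20.0, 30.0]
indices  = [1, 3, 4, 2]                  # a made-up leaf order

# Mirrors `idx = tree.reordered ? z : tree.indices[z]`:
# position in the stored data of the point sitting in leaf slot z.
slot_to_data(reordered, indices, z) = reordered ? z : indices[z]

reordered_data   = original[indices]     # reordered copy: slot z holds original[indices[z]]
unreordered_data = original              # no reordering: data kept in original order

for z in eachindex(indices)
    a = reordered_data[slot_to_data(true, indices, z)]
    b = unreordered_data[slot_to_data(false, indices, z)]
    # Either way the same point is fetched, and its original index is indices[z].
    @assert a == b == original[indices[z]]
end
```

The tree only ever walks from slot to original index, which is a plain array read. Going the other way, from an original index to a slot, is what needs the inverse permutation (`invperm(indices)`), so a per-neighbor `findfirst` is slow while a single precomputed `invperm` is cheap.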

KristofferC · 2024-06-17T13:59:45Z

I'm a little unclear on what should be done here. For computing the distances you just loop through the points and compute the distance. Feel free to open a new issue with a bit more detail if there is still a problem regarding this.
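A minimal sketch of that loop, assuming the caller still holds the original `data` vector. The indices returned by `inrange` refer to the original data rather than to the reordered `tree.data`, so no lookup into the tree's storage is needed at all. `inrange_with_distances` is a hypothetical helper, the metric is called directly as `tree.metric(a, b)`, and the `Float64` result type assumes `Float64` data.

```julia
using NearestNeighbors, StaticArrays

# Sketch: bulk in-range search that also returns distances. `inrange` reports
# indices into the original `data`, so they can index it directly.
function inrange_with_distances(tree, data, queries, r)
    vec_of_idxs = inrange(tree, queries, r)
    vec_of_ds = Vector{Vector{Float64}}(undef, length(queries))  # assumes Float64 data
    for j in eachindex(queries, vec_of_idxs)
        q = queries[j]
        # One distance evaluation per reported neighbor.
        vec_of_ds[j] = [tree.metric(q, data[i]) for i in vec_of_idxs[j]]
    end
    return vec_of_idxs, vec_of_ds
end

data = [rand(SVector{3, Float64}) for _ in 1:10_000]
queries = [rand(SVector{3, Float64}) for _ in 1:100]
tree = KDTree(data)
r = 0.1
vec_of_idxs, vec_of_ds = inrange_with_distances(tree, data, queries, r)
```

This recomputes each distance once after the search; the cost is proportional to the number of neighbors found, which is the same order as the work `inrange` already does to report them.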

Datseris mentioned this issue on Dec 13, 2020 in JuliaNeighbors/Neighborhood.jl#15, "Increase performance of bulksearch for WithinRange for KDTrees" (open).

KristofferC closed this as completed Jun 17, 2024
