Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performant way to estimate distances for bulk inrange search? #119

Closed
Datseris opened this issue Dec 3, 2020 · 5 comments
Closed

Performant way to estimate distances for bulk inrange search? #119

Datseris opened this issue Dec 3, 2020 · 5 comments

Comments

@Datseris
Copy link

Datseris commented Dec 3, 2020

Hi there, I'm doing standard bulk in-range searches via the syntax vec_of_idxs = NearestNeighbors.inrange(tree, queries, r). My data are SVector{D, Float64} (but I'm not sure whether this matters.

At the moment I need the distances of the found neighbors from the queries, even more than the indices. I've noticed that while the above call is spectacularly fast, the code I wrote to get the distances is veeeery slow:

function _NN_get_ds(tree::KDTree, query, idxs)
    if tree.reordered
        ds = [
            evaluate(tree.metric, query, tree.data[
                findfirst(isequal(i), tree.indices)
            ]) for i in idxs]
    else
        ds = [evaluate(tree.metric, query, tree.data[i]) for i in idxs]
    end
end

and now I transform my original code as

    vec_of_idxs = NearestNeighbors.inrange(tree, queries, r)
    vec_of_ds = [ _NN_get_ds(tree, queries[j], vec_of_idxs[j]) for j in 1:length(queries)]

I'm wonder, whether something already exists in this library, that provides these distances in a faster way?

@Datseris
Copy link
Author

Datseris commented Dec 3, 2020

PS: After doing some testing, the bottleneck is in fact the weird clause I've written if tree.reordered. I obviously shouldn't be using findfirst in such an inner loop... What is the correct way to translate tree.indices to data indices when the tree is reordered?

@KristofferC
Copy link
Owner

I guess you could create the inverse lookup (for all points) a single time and then reuse that.

@Datseris
Copy link
Author

Datseris commented Dec 3, 2020

I guess that makes sense. But how does the tree itself know the correct indices? I mean, for the "skip" predicate version of e.g. knn, the tree needs to somehow know the correct indices as well, no? Can't I obtain them the same way the tree obtains them when it checks to skip or not?

@KristofferC
Copy link
Owner

It just does this

idx = tree.reordered ? z : tree.indices[z]

@KristofferC
Copy link
Owner

I'm a little bit unclear what should be done here. For computing the distances you just loop through the points and compute the distance. Feel free to open a new issue with a bit more details if there is still an issue regarding this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants