Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nn and knn do not accept points as arrays of arrays #121

Closed
BoZenKhaa opened this issue Mar 12, 2021 · 7 comments
Closed

nn and knn do not accept points as arrays of arrays #121

BoZenKhaa opened this issue Mar 12, 2021 · 7 comments

Comments

@BoZenKhaa
Copy link

When passing points as array of arrays to the knn and nn functions:

using NearestNeighbors
data = rand(3, 10^4)
k = 3

kdtree = KDTree(data)

# Multiple points as array of arrays
points = [rand(3) for i in 1:4];

idxs, dists = knn(kdtree, points, k, true);

I get following error:

julia> idxs, dists = knn(kdtree, points, k, true);
ERROR: MethodError: no method matching length(::Type{Array{Float64,1}})
Closest candidates are:
  length(::Base.EnvDict) at env.jl:132
  length(::LibGit2.GitStatus) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/LibGit2/src/status.jl:21
  length(::Base.AsyncGenerator) at asyncmap.jl:410
  ...
Stacktrace:
 [1] check_input(::KDTree{StaticArrays.SArray{Tuple{3},Float64,1,3},Euclidean,Float64}, ::Array{Array{Float64,1},1}) at /home/user/.julia/packages/NearestNeighbors/jDOe3/src/NearestNeighbors.jl:27
 [2] knn(::KDTree{StaticArrays.SArray{Tuple{3},Float64,1,3},Euclidean,Float64}, ::Array{Array{Float64,1},1}, ::Int64, ::Bool, ::Function) at /home/user/.julia/packages/NearestNeighbors/jDOe3/src/knn.jl:18 (repeats 2 times)
 [3] top-level scope at REPL[139]:1

I understood the line "points can also be a vector of other vectors where each element in the outer vector is considered a point." in README to mean that this should work?

Anyway, thanks for a neat implementation of kdtrees!

@BoZenKhaa
Copy link
Author

BoZenKhaa commented Mar 12, 2021

Looking into the code, the check that throws the error is this:

function check_input(::NNTree{V1}, ::AbstractVector{V2}) where {V1, V2 <: AbstractVector}
    if length(V1) != length(V2)
        throw(ArgumentError(
            "dimension of input points:$(length(V2)) and tree data:$(length(V1)) must agree"))
    end
end

where the length of type V2 can't be determined. I suppose a length of a type should return length of the instances of said type?

@KristofferC
Copy link
Owner

The README says:

vector of vectors with fixed dimensionality, nd, which must be part of the type. Specifically, data should be a Vector{V}, where V is itself a subtype of an AbstractVector and such that eltype(V) and length(V) are defined.

In your case eltype(V) and length(V) is not defined. You can use an array from StaticArrays:

using NearestNeighbors, StaticArrays

data = rand(3, 10^4)
k = 3

kdtree = KDTree(data)

# Multiple points as array of arrays
points = [rand(SVector{3}) for i in 1:4];

idxs, dists = knn(kdtree, points, k, true)

@BoZenKhaa
Copy link
Author

BoZenKhaa commented Mar 12, 2021

Thank you for a quick reply and for pointing that out to me, I completely missed that.

Maybe it could be worthwhile to mention in the README bullet that discusses points that same type restrictions that apply to points also apply to data? I could make the PR.

Alternatively, could having a more general check work as well? E.g. having

function check_input(::NNTree{V1}, points::AbstractVector{V2}) where {V1, V2 <: AbstractVector}
    dim=0
    try
        dim = length(V2)
    catch MethodError
        points_dims = unique([length(p) for p in points])
        if length(points_dims)==1
            dim = points_dims[1]
        else
            throw(ArgumentError(
                "all input points must have the same dimension: point dimensions: $points_dims and tree data:$(length(V1)) must agree"))
        end
    end

    if length(V1) != dim
        throw(ArgumentError(
            "dimension of input points:$(length(V2)) and tree data:$(length(V1)) must agree"))
    end
end

seems to work for my usecase and passes the tests, but I am not sure whether that doesn't affect the performance or potentially introduces some hidden issues...

Reason I am looking at this is that I would like to use NearestNeighbors inside a package where I set up the data and it's up to users to provide the points. I think I would prefer not to limit users to StaticArrays or repackage the user provided points into Matrix or StaticArrays if there wouldn't be a performance hit...

Thank you!

@lrsantos11
Copy link

I really think that @BoZenKhaa solution is the better. I really don't want to rely on StaticArrays because of performance issues for large and sparse arrays.

@davnn
Copy link
Contributor

davnn commented Jan 21, 2022

I really think that @BoZenKhaa solution is the better. I really don't want to rely on StaticArrays because of performance issues for large and sparse arrays.

Alternatively, you could simply use StaticArrays.jl SizedVector to annotate your vectors with their size. Example to transform a matrix to a vec of points:

points = [SizedVector{length(v)}(v) for v in eachcol(mat)]

@BoZenKhaa
Copy link
Author

Thanks for the tip @davnn, I haven't realized StaticArrays.jl provided this option. That could also be implemented in my package and the end users won't have to worry about using StaticArrays.

Anyway, it seems to me that at that point I am just working around a constraint imposed by NearestNeighbors.jl that may be too strict. Wouldn't relaxing this constraint be a better course of action? I can't tell whether there are some bad consequences to relaxing this constraint, maybe somewhere else in the package. As I said, relaxing the constraint passed the tests on my machine. Can you tell?

@KristofferC
Copy link
Owner

Dup of #85

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants