Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

error with points as vector of points? #85

Open
briochemc opened this issue Jun 13, 2019 · 11 comments
Open

error with points as vector of points? #85

briochemc opened this issue Jun 13, 2019 · 11 comments

Comments

@briochemc
Copy link
Contributor

Great package! So fast! 😃 Anyway, I thought this MWE (based on the readme example except for using points instead of point) would work (since "points can also be a vector of other vectors where each element in the outer vector is considered a point.") but it throws an error:

julia> using NearestNeighbors

julia> data = rand(3, 10^4);

julia> k = 3 ;

julia> kdtree = KDTree(data) ;

julia> points = [rand(3) for i in 1:10]
10-element Array{Array{Float64,1},1}:
 [0.68304, 0.983004, 0.220862]
 [0.626807, 0.61079, 0.968843]
 [0.192098, 0.157959, 0.300556]
 [0.300309, 0.502549, 0.653381]
 [0.160819, 0.653002, 0.9979]
 [0.809764, 0.0521815, 0.0290807]
 [0.20466, 0.592894, 0.656526]
 [0.370667, 0.0579244, 0.930079]
 [0.593999, 0.934687, 0.555133]
 [0.60504, 0.78062, 0.243998]

julia> idxs, dists = knn(kdtree, points, k, true)
ERROR: MethodError: no method matching length(::Type{Array{Float64,1}})
Closest candidates are:
  length(::Core.SimpleVector) at essentials.jl:561
  length(::Base.MethodList) at reflection.jl:801
  length(::Core.MethodTable) at reflection.jl:875
  ...
Stacktrace:
 [1] check_input(::KDTree{StaticArrays.SArray{Tuple{3},Float64,1,3},Euclidean,Float64}, ::Array{Array{Float64,1},1}) at /Users/benoitpasquier/.julia/packages/NearestNeighbors/N7lgR/src/NearestNeighbors.jl:29
 [2] knn(::KDTree{StaticArrays.SArray{Tuple{3},Float64,1,3},Euclidean,Float64}, ::Array{Array{Float64,1},1}, ::Int64, ::Bool, ::Function) at /Users/benoitpasquier/.julia/packages/NearestNeighbors/N7lgR/src/knn.jl:16 (repeats 2 times)
 [3] top-level scope at none:0

Maybe I'm doing this wrong?

@dkarrasch
Copy link
Contributor

You could use StaticArrays

using StaticArrays, NearestNeighbors
data = rand(3, 10^4);
k = 3;
kdtree = KDTree(data);
points = [@SVector rand(3) for i in 1:10]
idxs, dists = knn(kdtree, points, k, true)

I think this should be reflected in the README, or otherwise there should be a silent attempt internally to turn a vector of vectors into a vector of static vectors. The overall reason for this requirement is that in a vector of vectors the elements may have different lengths, in contrast to points being a matrix, where each column must have the same length. Differing lengths, however, cannot be inferred from the AbstractVector type, and checking it would probably add to the runtime, I'm not sure, however, by how much.

@briochemc
Copy link
Contributor Author

Oh I see, that makes sense. (However, I have been using a 3xn array instead for large n.)

I have two questions then:

  • Is it more performant to use (and generate) a 2D Array or a Vector{SVector}?
  • What about the data used to generated the tree? I.e., would data = [@SVector rand(3) for i in 1:10^4] be better than data = rand(3, 10^4)?

@dkarrasch
Copy link
Contributor

@KristofferC may help out if I'm saying something wrong, as I haven't written or modified any parts of the code, just scanned quickly through it some time ago. If I am not mistaken, it is always better to have things in the vector of SVector format, because internally, that's what 2D arrays are converted to anyway.

function KDTree(data::Matrix{T},
metric::M = Euclidean();
leafsize::Int = 10,
storedata::Bool = true,
reorder::Bool = true,
reorderbuffer::Matrix{T} = Matrix{T}(undef, 0, 0)) where {T <: AbstractFloat, M <: MinkowskiMetric}
dim = size(data, 1)
npoints = size(data, 2)
points = copy_svec(T, data, Val(dim))
if isempty(reorderbuffer)
reorderbuffer_points = Vector{SVector{dim,T}}()
else
reorderbuffer_points = copy_svec(T, reorderbuffer, Val(dim))
end
KDTree(points, metric, leafsize = leafsize, storedata = storedata, reorder = reorder,
reorderbuffer = reorderbuffer_points)
end

The reason likely is that once you have things in SVector format, the length of vectors and hence the dimension of the space is known to the compiler, which may help performance. But you could check and benchmark, and report here. I'd be interested as well.

@KristofferC
Copy link
Owner

KristofferC commented Jun 14, 2019

The key point in the docs is:

or it can be a Vector{V} where V is itself a subtype of an AbstractVector and such that eltype(V) and length(V) is defined

So you should be able to use MVector as well which is mutable. Sure, we could convert an Array of Vectors and assert that all vectors have the same length.

@briochemc
Copy link
Contributor Author

Oh I see now. I got confused by the fact that eltype(V) and length(V) are not the same as eltype(v::V) and length(v::V)... May I suggest adding "(e.g., V = SVector{...})" and an example to the docs with SVectors?

@mkborregaard
Copy link

mkborregaard commented Oct 23, 2019

or it can be a Vector{V} where V is itself a subtype of an AbstractVector and such that eltype(V) and length(V) is defined

But that should be true for a vector of vectors, right? Yet I get

julia> a = [rand(1:10, 2) for i in 1:5]
5-element Array{Array{Int64,1},1}:
 [1, 9]
 [6, 3]
 [2, 7]
 [7, 3]
 [9, 7]

julia> KDTree(a)
ERROR: MethodError: no method matching length(::Type{Array{Int64,1}})
Closest candidates are:
  length(::Core.SimpleVector) at essentials.jl:593
  length(::Base.MethodList) at reflection.jl:832
  length(::Core.MethodTable) at reflection.jl:906
  ...
Stacktrace:
 [1] NearestNeighbors.TreeData(::Array{Array{Int64,1},1}, ::Int64) at /Users/michael/.julia/packages/NearestNeighbors/N7lgR/src/tree_data.jl:14
 [2] #KDTree#16(::Int64, ::Bool, ::Bool, ::Array{Array{Int64,1},1}, ::Type{KDTree}, ::Array{Array{Int64,1},1}, ::Euclidean) at /Users/michael/.julia/packages/NearestNeighbors/N7lgR/src/kd_tree.jl:35
 [3] KDTree(::Array{Array{Int64,1},1}, ::Euclidean) at /Users/michael/.julia/packages/NearestNeighbors/N7lgR/src/kd_tree.jl:33 (repeats 2 times)
 [4] top-level scope at none:0

@KristofferC
Copy link
Owner

But that should be true for a vector of vectors, right?

No? The error message told you that it wasn't.

@mkborregaard
Copy link

Well.
In that case I don't understand what's meant by that part of the docs.

@mkborregaard
Copy link

Anyway this would be really useful as very few procedures I know of to generate collection of n-dimensional points return them as an n-by-x array.

@KristofferC
Copy link
Owner

KristofferC commented Oct 24, 2019

Vector{V} where V is itself a subtype of an AbstractVector and such that eltype(V) and length(V)

For a Vector{Vector{Float64}}, V is Vector{Float64} and length(V) (which is length(Vector{Float64})) is not defined and is what the error message complains about. If instead you had a Vector{SVector{3, Float64}} then V is SVector{3, Float64} and length(V) is 3.

Anyway this would be really useful as very few procedures I know of to generate collection of n-dimensional points return them as an n-by-x array.

A Vector{Vector{Float64}} is an inefficent way of storing a point cloud and users calling NearestNeighbors with such a data structure would have bad performance and blame the library. It is too general of a data structure since the points in such a data structure can have different dimensions (we now need to add extra code to do boundschecks) but the library requires that all points have the same dimension.

@mkborregaard
Copy link

Yeah OK I see, the length needs to be part of the type.

briochemc added a commit to briochemc/NearestNeighbors.jl that referenced this issue Oct 25, 2019
Add a little more explanation as suggested in KristofferC#85
KristofferC pushed a commit that referenced this issue Nov 19, 2019
…ta are valid (#90)

* Update README.md

Add a little more explanation as suggested in #85

* fix typo in readme
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants