annoy.py conversion of cosine distance to cosine similarity is incorrect #3440
Labels: bug (issue describes a bug), impact LOW (low impact on affected users), reach LOW (affects only niche use-case users)
In this function, the code that calculates cosine similarity is incorrect.
According to the annoy documentation, get_nns_by_vector with include_distances=True returns the distances themselves, not the squared distances (this changed in Aug 2016):
"a.get_distance(i, j) returns the distance between items i and j. NOTE: this used to return the squared distance, but has been changed as of Aug 2016." (link)
Also:
"Annoy uses Euclidean distance of normalized vectors for its angular distance, which for two vectors u,v is equal to sqrt(2(1-cos(u,v)))" (link)
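For reference, the quoted formula can be derived and inverted in a few steps. This is only a sketch of the algebra; the symbols d (the distance annoy returns) and \hat{u}, \hat{v} (the unit-normalized vectors) are notation introduced here and do not come from the annoy docs.

```latex
\begin{align*}
d^2 &= \lVert \hat{u} - \hat{v} \rVert^2
     = \lVert \hat{u} \rVert^2 + \lVert \hat{v} \rVert^2 - 2\,\hat{u}\cdot\hat{v}
     = 2 - 2\cos(u, v) \\
d   &= \sqrt{2\,(1 - \cos(u, v))} \\
\cos(u, v) &= 1 - \frac{d^2}{2}
\end{align*}
```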
So, to calculate the cosine similarity correctly from the returned (non-squared) distance, we should do this (note that Python's power operator is **, not ^):
return [(self.labels[ids[i]], 1 - distances[i] ** 2 / 2) for i in range(len(ids))]
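The conversion can be sanity-checked without annoy installed. The sketch below assumes only the documented definition quoted above (Euclidean distance between normalized vectors). The helper names angular_distance, cosine_similarity and distance_to_similarity are hypothetical and are not part of annoy or gensim.

```python
import math


def cosine_similarity(u, v):
    """Plain cosine similarity, computed directly from the dot product."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def angular_distance(u, v):
    """Euclidean distance between the normalized vectors.

    Assumption: this mirrors the 'angular' distance described in the
    annoy documentation; it does not call annoy itself.
    """
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return math.sqrt(sum((a / norm_u - b / norm_v) ** 2 for a, b in zip(u, v)))


def distance_to_similarity(d):
    """Invert d = sqrt(2 * (1 - cos)) to recover the cosine similarity."""
    return 1 - d ** 2 / 2


u = [1.0, 2.0, 3.0]
v = [4.0, -1.0, 0.5]
d = angular_distance(u, v)

# The converted distance should match the directly computed similarity.
assert math.isclose(distance_to_similarity(d), cosine_similarity(u, v), abs_tol=1e-12)
```

If the assertion passes, squaring the distance before halving it recovers the cosine similarity, which is what the proposed return statement relies on.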