using exact similarity search #5

psteinb · 2020-11-27T16:19:24Z

Thank you for providing the coding and implementation for density and coverage. It is awesome to have the code ready for use in practice. I cite the paper whenever possible.

I took the liberty to research possible improvements. I found that using an exact similarity search as offered by faiss can speed up the calculation of density and coverage by a great deal.

Here are my results for num_real_samples = num_fake_samples = 1024, feature_dim = 12, nearest_k = 5:

--------------------------------------------------------------------------------------- benchmark: 4 tests ---------------------------------------------------------------------------------------
Name (time in ms)                 Min                 Max                Mean             StdDev              Median                IQR            Outliers      OPS            Rounds  Iterations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_bench_my_coverage        10.7225 (1.0)       53.4196 (1.09)      13.9665 (1.0)       8.7931 (1.0)       11.1186 (1.0)       1.4884 (1.0)           1;4  71.5998 (1.0)          24           1
test_bench_my_density         11.9193 (1.11)      49.0908 (1.0)       16.7892 (1.20)      9.0503 (1.03)      12.8918 (1.16)      3.9669 (2.67)          2;3  59.5619 (0.83)         18           1
test_bench_prdc_coverage     316.7985 (29.55)    400.5574 (8.16)     354.7417 (25.40)    31.7475 (3.61)     355.2325 (31.95)    43.6299 (29.31)         2;0   2.8190 (0.04)          5           1
test_bench_prdc_density      365.5958 (34.10)    400.6876 (8.16)     382.3611 (27.38)    12.6120 (1.43)     380.4541 (34.22)    12.9641 (8.71)          2;0   2.6153 (0.04)          5           1
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

I tagged, any algorithm using faiss with test_bench_my. Using a similarity tree approach, this line

 real_nearest_neighbour_distances = compute_nearest_neighbour_distances(
        real_features, nearest_k)

in the original code is accelerated big time due to the efficient lookup of samples with the tree structure.

As such a change would drag in a dependency to faiss, I am reluctant to send a PR to this repo. Let me know what you think!

The text was updated successfully, but these errors were encountered:

coallaoh · 2020-12-06T14:32:21Z

Thank you for your interest in the paper and the awesome suggestion!

As you worry, though, I would be careful with changes that would require heavier dependencies and significantly longer installation time (not sure how much longer though). Moreover, I think a good evaluation code should not depend too much on external libraries, for the sake of constancy.

I understand that the current naive implementation of NN search is not fully efficient, but do you think it is an important bottleneck in your (or any potential) application scenario?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

using exact similarity search #5

using exact similarity search #5

psteinb commented Nov 27, 2020

coallaoh commented Dec 6, 2020

using exact similarity search #5

using exact similarity search #5

Comments

psteinb commented Nov 27, 2020

coallaoh commented Dec 6, 2020