Bug on clustering_DPA and its pure python version #70

Closed
alexdepremia opened this issue Jul 12, 2022 · 9 comments
@alexdepremia
Collaborator

Subject of the issue

This issue appears mainly with small maxk values. When looking for the nearest element with a higher g, it can happen that none of the elements in the self.dist_indices vector has been assigned yet. This is a consequence of the last step of the center search, where we remove centers from the list if they are neighbors of higher-density points.
I think I know how to solve the issue, so I will do it when I have time.
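The failure mode described above can be illustrated with a minimal sketch. The toy neighbor table, the `-1` marker for unassigned points, and the helper name `first_labeled_neighbor` are assumptions made for illustration, not the DADApy implementation.

```python
import numpy as np

def first_labeled_neighbor(point, dist_indices, labels):
    """Return the nearest neighbor of `point` that already has a label, or None."""
    # Assumption: column 0 of dist_indices is the point itself, so it is skipped.
    for neighbor in dist_indices[point, 1:]:
        if labels[neighbor] != -1:
            return int(neighbor)
    return None

# Hypothetical toy data: 4 points, 2 neighbors each (a very small maxk).
dist_indices = np.array([
    [0, 1, 2],
    [1, 0, 2],
    [2, 3, 1],
    [3, 2, 1],
])
# Only point 0 is a center (label 0); -1 means "not assigned yet".
labels = np.array([0, -1, -1, -1])

found = first_labeled_neighbor(1, dist_indices, labels)  # point 0 is a neighbor
stuck = first_labeled_neighbor(3, dist_indices, labels)  # neighbors 2 and 1 are unassigned
```

With a larger maxk the neighbor list of point 3 would eventually reach a labeled point, which is consistent with the bug showing up mainly for small maxk.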

@alexdepremia alexdepremia self-assigned this Jul 12, 2022
@diegodoimo
Collaborator

diegodoimo commented Jul 15, 2022

To reproduce the bug, download and unzip the data from:
https://figshare.com/ndownloader/articles/20317350/versions/1

import numpy as np
from dadapy import data

# Folder where the downloaded data were unzipped (adjust to your setup).
your_download_folder = "your_download_folder"
dist = np.load(f"{your_download_folder}/bug_adpy_dists.npy")
index = np.load(f"{your_download_folder}/bug_adpy_index.npy")

# Build the Data object from precomputed neighbor distances and indices.
d = data.Data(distances=(dist, index), maxk=50, verbose=False)
d.compute_id_2NN(fraction=0.95)
d.compute_density_PAk()
d.compute_clustering_ADP(Z=1.65, halo=False)  # clustering step discussed in this issue

Python: 3.8.2
Dadapy: 0.1.0
OS: Ubuntu 20.04.1 LTS

@diegodoimo
Collaborator

diegodoimo commented Jul 15, 2022

I found the same bug on the data that I posted above to reproduce it.
The data are the distances and distance indices (NumPy matrices) of a hidden-layer representation of ImageNet.

@alexdepremia
Collaborator Author

@diegodoimo, could you add the line loading the data?

@diegodoimo
Collaborator

diegodoimo commented Jul 15, 2022

I'm actually working on it. I'll post a possible solution asap. I added the lines to load the data above.

@alexdepremia
Collaborator Author

@diegodoimo, there are two possible solutions. The first, which would be coherent with the description in the paper, is that when the nearest element with higher g is not in the neighbor list, we compute the distances to all the elements with higher g and take the minimum (but for this you would need the coordinates, not only the neighbor distances). The second is to drop the restriction that a center cannot be a neighbor of a point with higher g (this is a patch that would generate some inconsistencies, but it would allow using only distances and indices as input).
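A minimal sketch of the first option, assuming the coordinates are available as a NumPy array and that distances are Euclidean. The function name `nearest_higher_g` and the toy values are hypothetical; this is not code from DADApy.

```python
import numpy as np

def nearest_higher_g(point, coords, g):
    """Index of the closest point with strictly higher g, or None if there is none."""
    higher = np.flatnonzero(g > g[point])
    if higher.size == 0:
        return None  # `point` has the highest g overall
    # Brute-force distances from `point` to every candidate with higher g.
    dists = np.linalg.norm(coords[higher] - coords[point], axis=1)
    return int(higher[np.argmin(dists)])

# Hypothetical toy data: four points on a line with made-up g values.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 0.0], [6.0, 0.0]])
g = np.array([3.0, 1.0, 2.0, 0.5])

parent_of_3 = nearest_higher_g(3, coords, g)  # closest higher-g point is 2
parent_of_1 = nearest_higher_g(1, coords, g)  # closest higher-g point is 0
parent_of_0 = nearest_higher_g(0, coords, g)  # no point has higher g
```

The search scans every point with higher g, so it cannot be done from the truncated neighbor lists alone, which is the drawback noted above.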

@diegodoimo
Collaborator

diegodoimo commented Jul 15, 2022

I posted an attempted solution in #71.

I don't think we should use the coordinates, as the entire class would not work if only the distances are given as input.
Instead, I assign a point that has no already-labeled neighbor to the closest center c* of higher density that has already been assigned.
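The control flow of such a fallback might look like the sketch below. It is not the code from #71: the function name `assign_clusters`, the toy data, and in particular the `center_distance` callable are assumptions. The thread does not say how closeness to a center is measured when the center lies outside the point's neighbor list, so the toy example uses 1D positions purely as a stand-in.

```python
import numpy as np

def assign_clusters(dist_indices, g, centers, center_distance):
    """Label points in decreasing-g order, with a fallback for stranded points."""
    labels = np.full(len(g), -1)
    for cluster_id, center in enumerate(centers):
        labels[center] = cluster_id

    for point in np.argsort(-g):
        if labels[point] != -1:
            continue
        # Usual rule: inherit the label of the nearest already-labeled neighbor.
        for neighbor in dist_indices[point, 1:]:
            if labels[neighbor] != -1:
                labels[point] = labels[neighbor]
                break
        else:
            # Fallback: no labeled neighbor, so use the closest center with higher g.
            # Assumes the point with the highest g is a center, so candidates exist.
            candidates = [c for c in centers if g[c] > g[point]]
            best = min(candidates, key=lambda c: center_distance(point, c))
            labels[point] = labels[best]
    return labels

# Hypothetical toy data: 6 points on a line, 2 neighbors each.
x = np.array([0.0, 10.0, 1.0, 7.0, 6.5, 7.4])
g = np.array([5.0, 4.5, 3.0, 4.0, 1.0, 0.5])
dist_indices = np.array([
    [0, 2, 4],
    [1, 5, 3],  # point 3 is a neighbor of the denser point 1, so it is not a center
    [2, 0, 4],
    [3, 5, 4],  # both neighbors of point 3 have lower g: it is stranded
    [4, 3, 5],
    [5, 3, 4],
])
centers = [0, 1]

labels = assign_clusters(dist_indices, g, centers, lambda p, c: abs(x[p] - x[c]))
```

In this toy case point 3 has no labeled neighbor when it is processed, falls back to center 1, and its lower-density neighbors 4 and 5 then inherit that label.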

alexdepremia added a commit that referenced this issue Jul 18, 2022
solution issue #70 in ADP cython and pure_python version
@AldoGl
Collaborator

AldoGl commented Jul 19, 2022

Was this issue solved via #71 ?

@diegodoimo
Collaborator

Yes

@alexdepremia
Collaborator Author

I'm closing this issue since it has been solved!
