-
Notifications
You must be signed in to change notification settings - Fork 57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
The neighborhood graph is not connected #53
Comments
Hey! Sorry it took a while to respond you. The problem is that the neighborhood strategy being used in Tapkee is dead simple: it finds the nearest neighbors for each point and then uses these connections exclusively. This means that if you add a point that is not a neighbor for any other point it gets 'dropped' out of graph (never reachable). To solve the problem it might make sense to allow increasing the number of connections if some points lack them to be connected. Other mitigation might be to notify what are the disconnected points so that you can drop them from the dataset. I am not aware what mitigation might be the best, though. |
Hi, Only now I have gone through this. Just in case, excuses for the years-long lead time. I am wondering what is the meaning of that a point is not a neighbor of any other point? I will start running the investigation together with one for |
All right, I think this little inquietude doesn't really matter: even if every point has at least one neighbor the overall neighborhood graph could of course have more than one connected component |
It is just a matter of a form of the distance matrix. Once it becomes a block matrix the whole thing stops working because eigendecomposition is not unique, I think. Basically, every connected component should be embedded separately. But it doesn't make a lot of sense since we would not have any idea how they are positioned against each other. So that's why I think it is better to either:
|
Ah, that's a good point, I didn't think about the separate embeddings.
This is what I thought earlier, it would the in the second point: The issue is that given a number of neighbors (k) k_0 by the user (or the default one, if unprovided) the neighbor graph can be, let's say, unconnected and it is expected the result won't be trustworthy then. |
lisitsyn#53 (comment) Obviously, this is not to merge.
I opened #83 with a simple fix to recompute the neighbors if the neighborhood graph is not connected. By the way, I didn't manage to reproduce with the data in the opening comment. The number of vectors I got was either 19 or 20 (with the extra one), different to the 21 in the screenshot. I used the swissroll5000 from the jmlr repo to test this. In that one the binary search went from the initial 10 to 2505, which worked well for the graph, but then stopped quickly due to out of memory usage in the eigendecomposition. Then, I updated it to just multiplying by 2. |
Hi there, I am currently using the tapkee executable as a pipeline in my project in python, I am creating a text file with all my values to reduce wich looks like this :
0.301961,0.376471,0.443137,0,0.992157,0 0.188235,0.235294,0.298039,0.847059,0.752941,0 0.192157,0.239216,0.294118,0.854902,0.72549,0 0.266667,0.282353,0.380392,0.901961,0.505882,0.8 0.215686,0.211765,0.215686,0.913725,0.498039,0.8 0.25098,0.262745,0.286275,0.917647,0.494118,0.8 0.309804,0.305882,0.333333,0.905882,0.419608,0.8 0.258824,0.321569,0.396078,0.831373,0.811765,0 0.239216,0.298039,0.376471,0.831373,0.807843,0 0.231373,0.290196,0.368627,0.819608,0.741176,0 0.290196,0.360784,0.443137,0.784314,0.780392,0 0.301961,0.376471,0.443137,0,0.992157,0 0.188235,0.235294,0.298039,0.847059,0.752941,0 0.192157,0.239216,0.294118,0.854902,0.72549,0 0.266667,0.282353,0.380392,0.901961,0.505882,0.8 0.215686,0.211765,0.215686,0.913725,0.498039,0.8 0.25098,0.262745,0.286275,0.917647,0.494118,0.8 0.309804,0.305882,0.333333,0.905882,0.419608,0.8 0.258824,0.321569,0.396078,0.831373,0.811765,0
This example works fine but if I add one more line like
0.1,0.2,0.3,0.4,0.5,0
I get the message "[warning] The neighborhood graph is not connected"
So I check for solution and the only help I could find is this message that you wrote in your documentation.
Then i tried to change the k values and it works when it's equal or superior to the number of rows divised by 2. So if I have 30 rows I have to make the neighbour values to 15 or higher.
Is it right or something else is wrong ? I am supposed to use this method with a lot of rows (4000 to
million), it gonna be long no ? I may not understand perfectly how it works.
Thanks for your help.
The text was updated successfully, but these errors were encountered: