Add new argument for limiting the maximum epsilon #529
Sorry for taking so long to get to this. It looks like a useful addition. Any chance you could add a test to the test suite to check that it works as intended?
I totally missed your comment :s I'll do that, yes.
# Conflicts: # hdbscan/hdbscan_.py
… cluster_label_map
Hi @lmcinnes! :D It has been a while since the last update in this PR. Could you take another look? Thanks!
Gentle reminder to revisit this PR @lmcinnes
Thanks.
This PR introduces an argument to HDBSCAN that sets a maximum threshold on the epsilon used when picking the best clusters. The new argument, `cluster_selection_epsilon_max`, applies to the EOM search method. This is very useful when domain knowledge tells you from the get-go that your samples should not be very far from each other.

The implementation uses `cluster_selection_epsilon_max` in much the same way as `max_cluster_size`. Clusters with an epsilon bigger than `cluster_selection_epsilon_max` can therefore still appear if there are no valid clusters below that epsilon. This is, in fact, the exact same behavior as `max_cluster_size`.
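
To make the fallback behavior described above concrete, here is a minimal toy sketch in plain Python. It is not the code from this PR or from `hdbscan/hdbscan_.py`: the `Node` class, its `epsilon` and `stability` fields, and the `select_eom` helper are all hypothetical names invented for illustration, and the toy does not treat the root cluster specially. It only assumes that a cluster whose epsilon exceeds the cap is dropped in favor of its children, and that a cluster with nothing below it is kept regardless.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical node of a condensed cluster tree (illustration only)."""
    name: str
    epsilon: float    # assumed distance scale at which this cluster forms
    stability: float  # assumed precomputed stability score
    children: list = field(default_factory=list)


def select_eom(node, epsilon_max=float("inf")):
    """Return (selected_nodes, total_stability) for the subtree rooted at node.

    Toy excess-of-mass selection with an epsilon cap: a parent is rejected
    when its children are jointly more stable or when its own epsilon exceeds
    epsilon_max. A leaf is always kept, since nothing valid lies below it.
    """
    if not node.children:
        # Fallback: no cluster below this one, so keep it even above the cap.
        return [node], node.stability

    selected, subtree_stability = [], 0.0
    for child in node.children:
        child_selected, child_stability = select_eom(child, epsilon_max)
        selected.extend(child_selected)
        subtree_stability += child_stability

    if subtree_stability > node.stability or node.epsilon > epsilon_max:
        return selected, subtree_stability
    return [node], node.stability


# Toy tree: one wide parent cluster with two tighter children.
tree = Node("parent", epsilon=5.0, stability=10.0, children=[
    Node("a", epsilon=2.0, stability=3.0),
    Node("b", epsilon=1.0, stability=2.0),
])

no_cap = [n.name for n in select_eom(tree)[0]]
capped = [n.name for n in select_eom(tree, epsilon_max=3.0)[0]]
tight_cap = [n.name for n in select_eom(tree, epsilon_max=0.5)[0]]
```

In this toy tree the wide parent is selected when no cap is given, the two children replace it once the cap falls below the parent's epsilon, and the children still appear when the cap is smaller than their own epsilons, because no valid cluster exists below them.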