This repository contains a Python program for hierarchical clustering. It supports several clustering algorithms, including kmedioids, kmeans, and kmeanscpp, and lets you build hierarchical clustering structures, load existing structures, and search them for clusters based on input data.
- networkx
- matplotlib
- numpy
- argparse
- loguru
- Clone this repository:
  git clone https://github.com/DoYouEvenStackSmash/data-analysis.git
  cd data-analysis/src/python-processing
- Install the required dependencies using pip:
  pip install -r requirements.txt
The program supports three main operations: building a hierarchical clustering, loading an existing clustering, and searching for clusters. A likelihood command is also available. All operations are run through the command-line interface.
To build hierarchical clustering with different clustering parameters, use the following command:
python3 clustering_driver.py build -i example_2_2.npy -k 3 -R 30 -C 45 -o output
Replace example_2_2.npy with the path to your input data file. The -k, -R, and -C flags specify the number of clusters, the number of iterations, and the cutoff value, respectively. The -o flag is optional and sets the name used as the prefix of the saved hierarchical clustering structure. With -o output, the command produces three files: output_tree_hierarchy.json, output_tree_data_list.npy, and output_tree_node_vals.npy.
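The three output files can be inspected from Python. The following is a minimal sketch, assuming the hierarchy file is ordinary JSON and the two .npy files are standard NumPy arrays. The helper name load_build_outputs is hypothetical and not part of this repository, and the internal layout of each file is not described here.

```python
import json
from pathlib import Path

import numpy as np


def load_build_outputs(prefix, directory=".", allow_pickle=False):
    """Load the three files written by `build -o <prefix>` (hypothetical helper)."""
    base = Path(directory)

    # The hierarchy is read as plain JSON; its schema is not assumed here.
    with open(base / f"{prefix}_tree_hierarchy.json") as fh:
        hierarchy = json.load(fh)

    # allow_pickle is off by default; enable it only for files you trust,
    # and only if NumPy reports that the arrays contain Python objects.
    data_list = np.load(base / f"{prefix}_tree_data_list.npy", allow_pickle=allow_pickle)
    node_vals = np.load(base / f"{prefix}_tree_node_vals.npy", allow_pickle=allow_pickle)
    return hierarchy, data_list, node_vals
```

Calling load_build_outputs("output") after the build command above would return the parsed JSON and both arrays, which makes it easy to print their shapes before loading or searching the tree.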
To load an existing hierarchical clustering structure, use the following command:
python3 clustering_driver.py load -t existing_tree.json -G
Replace existing_tree.json with the path to the generated JSON file containing the hierarchy. The JSON file includes a resources field that lists the support files needed to build the tree. The -G flag builds the tree as an adjacency list and serializes it as tree_representation.graphml.
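The serialized graph can be read back with networkx, which is already among the dependencies. This is a minimal sketch, assuming the file is valid GraphML; the function name summarize_graphml is hypothetical, and nothing is assumed about the node or edge attributes the program writes.

```python
import networkx as nx


def summarize_graphml(path="tree_representation.graphml"):
    """Read a GraphML file and return basic counts (hypothetical helper)."""
    # read_graphml returns a directed or undirected graph depending on the file.
    graph = nx.read_graphml(path)
    return {
        "nodes": graph.number_of_nodes(),
        "edges": graph.number_of_edges(),
        "directed": graph.is_directed(),
    }
```

For a tree, the edge count should be one less than the node count, which gives a quick sanity check on the exported file.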
To search for clusters in an existing hierarchical clustering structure, use the following command:
python3 clustering_driver.py search -t existing_tree.json -M exampleM_2_2.npy -G
Replace existing_tree.json with the path to the JSON file containing the hierarchy and exampleM_2_2.npy with the path to the large input data file. The -G flag is optional and generates a graph from the tree data.
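Because the search input can be large, it can help to check its shape and dtype before running the command. The sketch below uses NumPy's memory mapping so the file is not read fully into memory. It assumes the file holds a plain numeric array rather than pickled objects, and the helper name describe_npy is hypothetical.

```python
import numpy as np


def describe_npy(path):
    """Return (shape, dtype name) of a .npy file without loading all of its data."""
    # mmap_mode="r" maps the file read-only; values are paged in only on access.
    arr = np.load(path, mmap_mode="r")
    return arr.shape, str(arr.dtype)
```

For example, describe_npy("exampleM_2_2.npy") returns the array dimensions, which can be compared against the data used to build the tree.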
To compute the likelihoods of reference data given input data, use the following command:
python3 clustering_driver.py likelihood -t existing_tree.json -M some_input.npy -G
This runs several variants of the likelihood evaluation and writes CSV files containing the results of each computation.
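The resulting CSV files can be summarized with the standard library. This is a minimal sketch: the output file names and column layout are not documented here, so the hypothetical helper summarize_csv_files simply scans a directory you choose and reports the first row and total row count of each CSV file it finds.

```python
import csv
from pathlib import Path


def summarize_csv_files(directory="."):
    """Return {filename: (first_row, row_count)} for each CSV file in a directory."""
    summary = {}
    for path in sorted(Path(directory).glob("*.csv")):
        with open(path, newline="") as fh:
            rows = list(csv.reader(fh))
        # Whether the first row is a header is not assumed; it is returned as-is.
        first_row = rows[0] if rows else []
        summary[path.name] = (first_row, len(rows))
    return summary
```

Point the directory argument at wherever the likelihood command wrote its results.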
Contributions are welcome! If you have any ideas or improvements, please feel free to open issues or pull requests.
This project is licensed under the MIT License - see the LICENSE file for details.