Implementations of NBC (Neighborhood-Based Clustering) [1] and TI-NBC (NBC with the Triangle Inequality) [2] for the Data Mining course @ WUT.
Installation:
make install
Command-line usage:
nbc --path "comma_separated_dataset.csv"
Python usage:
from nbc import clustering
import numpy as np
vectors = np.array([
    [0.0, 0.0],
    [1.0, 1.0],
    [2.0, 2.0],
    [10.0, 10.0],
    [11.0, 11.0],
])
k = 1
clusters = clustering.nbc(vectors=vectors, k=k)
Output: a dictionary mapping each vector id to its cluster id, where -1 denotes noise:
{0: 0, 1: 0, 2: 0, 3: 1, 4: 1}
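The returned dictionary is keyed by vector id, so it is often handier to invert it into one list of vector ids per cluster. A minimal sketch, assuming only the output shape shown above; `group_by_cluster` and `NOISE` are hypothetical helper names and are not part of the `nbc` package:

```python
from collections import defaultdict

NOISE = -1  # cluster id that marks noise, per the output description above


def group_by_cluster(assignments):
    """Invert {vector id: cluster id} into {cluster id: sorted vector ids}."""
    groups = defaultdict(list)
    for vector_id, cluster_id in assignments.items():
        groups[cluster_id].append(vector_id)
    return {cluster_id: sorted(ids) for cluster_id, ids in groups.items()}


# The example output from above, written out literally.
clusters = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1}

groups = group_by_cluster(clusters)
noise_ids = groups.pop(NOISE, [])  # separate noise from real clusters

print(groups)     # {0: [0, 1, 2], 1: [3, 4]}
print(noise_ids)  # []
```

Popping the noise entry first keeps later per-cluster statistics from treating noise as a cluster of its own.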
Available flags:
--k: Nearest neighbours count.
(default: '5')
(an integer)
-o,--output_path: Output path for csv with clusters.
(default: 'clusters.csv')
-p,--path: Path to dataset as comma separated csv.
-rp,--reference_point: Reference point if using TI - by default list of minimums.
(a comma separated list)
-ti,--[no]use_ti: Whether to use NBC with a Triangle Inequality (TI)
(default: 'false')
To see all flags:
nbc --helpfull
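The --reference_point help text says the default is a "list of minimums". A minimal sketch of what that could mean, assuming it is the per-dimension minimum of the dataset (this reading is an assumption, not confirmed by the package), together with the triangle-inequality bound that TI-based methods rely on: for a reference point r, |d(p, r) - d(q, r)| <= d(p, q). The code uses plain NumPy and does not call the `nbc` package:

```python
import numpy as np

vectors = np.array([
    [0.0, 0.0],
    [1.0, 1.0],
    [2.0, 2.0],
    [10.0, 10.0],
    [11.0, 11.0],
])

# Assumption: "list of minimums" means the minimum of each column.
reference_point = vectors.min(axis=0)

# Euclidean distance of every vector to the reference point.
ref_dist = np.linalg.norm(vectors - reference_point, axis=1)

# Triangle inequality lower bound on every pairwise distance:
# |d(p, r) - d(q, r)| <= d(p, q)
lower_bound = np.abs(ref_dist[:, None] - ref_dist[None, :])

# Exact pairwise Euclidean distances, computed only to check the bound.
true_dist = np.linalg.norm(vectors[:, None, :] - vectors[None, :, :], axis=2)

# The bound never exceeds the true distance (small tolerance for rounding).
assert np.all(lower_bound <= true_dist + 1e-9)
```

Because the bound needs only the precomputed distances to the reference point, a pair whose bound already exceeds the current neighbourhood radius can be skipped without computing its exact distance.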
Building the Docker image:
docker build -t nbc:latest .
Example run script (assumes the input and output files are in the "data" directory):
bash run_docker.sh
Running tests:
make test