Clustering algorithms (TI-)NBC implementation in Cython

mklimasz/TI-NBC

(TI-)NBC

Implementations of NBC [1] and TI-NBC [2] for the Data Mining course at WUT.

Installation

Install

make install

Run (cmd)

nbc --path "comma_separated_dataset.csv"

Run (python)

from nbc import clustering
import numpy as np

vectors = np.array([
    [0.0, 0.0],
    [1.0, 1.0],
    [2.0, 2.0],
    [10.0, 10.0],
    [11.0, 11.0]
])
k = 1
clusters = clustering.nbc(vectors=vectors, k=k)

Output: a dictionary mapping each vector id to its cluster id, where -1 marks noise:

{0: 0, 1: 0, 2: 0, 3: 1, 4: 1}
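The returned dictionary is keyed by vector id, so it is often handy to invert it into per-cluster lists. A minimal sketch follows; `group_by_cluster` is a hypothetical helper and not part of the `nbc` package. It assumes only the output format shown above.

```python
from collections import defaultdict


def group_by_cluster(labels):
    """Split a {vector_id: cluster_id} mapping into clusters and noise.

    Hypothetical helper, not part of the nbc package.
    """
    clusters = defaultdict(list)
    noise = []
    for vector_id, cluster_id in sorted(labels.items()):
        if cluster_id == -1:
            # -1 marks noise in the output format described above
            noise.append(vector_id)
        else:
            clusters[cluster_id].append(vector_id)
    return dict(clusters), noise


labels = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1}
clusters, noise = group_by_cluster(labels)
print(clusters)  # {0: [0, 1, 2], 1: [3, 4]}
print(noise)     # []
```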

Help / flags

  --k: Nearest neighbours count.
    (default: '5')
    (an integer)
  -o,--output_path: Output path for csv with clusters.
    (default: 'clusters.csv')
  -p,--path: Path to dataset as comma separated csv.
  -rp,--reference_point: Reference point if using TI - by default list of minimums.
    (a comma separated list)
  -ti,--[no]use_ti: Whether to use NBC with a Triangle Inequality (TI)
    (default: 'false')

To see all flags:

nbc --helpfull

Docker

Building docker image

docker build -t nbc:latest .

Example run script (assumes the input and output files live in the "data" directory)

bash run_docker.sh

Unit tests

make test

NBC algorithm
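As a rough illustration of the idea in [1], the sketch below is a brute-force NumPy version, not the repository's Cython code. Each point gets a k-neighbourhood (ties at the k-th distance are included) and a neighbourhood-based density factor, NDF = |reverse neighbours| / |neighbours|. Clusters are then grown from points with NDF >= 1, and anything left unassigned is noise (-1). The name `nbc_sketch` and the O(n^2) distance matrix are assumptions made for brevity, and the sketch requires k < n.

```python
from collections import deque

import numpy as np


def nbc_sketch(vectors, k):
    """Brute-force NBC sketch (hypothetical helper, requires k < len(vectors))."""
    vectors = np.asarray(vectors, dtype=float)
    n = len(vectors)

    # Pairwise Euclidean distances; the diagonal is masked so a point
    # is never its own neighbour.
    diff = vectors[:, None, :] - vectors[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)

    # k-neighbourhood: every point within the distance of the k-th nearest one.
    radius = np.partition(dist, k - 1, axis=1)[:, k - 1]
    neighbours = [np.flatnonzero(dist[i] <= radius[i]) for i in range(n)]

    # Reverse neighbours: how many points list i in their neighbourhood.
    reverse_count = np.zeros(n, dtype=int)
    for nb in neighbours:
        reverse_count[nb] += 1
    ndf = reverse_count / np.array([len(nb) for nb in neighbours])

    labels = {i: -1 for i in range(n)}
    cluster_id = 0
    for start in range(n):
        if labels[start] != -1 or ndf[start] < 1:
            continue  # already clustered, or too sparse to seed a cluster
        labels[start] = cluster_id
        queue = deque([start])
        while queue:
            p = queue.popleft()
            for q in map(int, neighbours[p]):
                if labels[q] != -1:
                    continue
                labels[q] = cluster_id
                if ndf[q] >= 1:
                    queue.append(q)  # dense points keep expanding the cluster
        cluster_id += 1
    return labels


vectors = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [10.0, 10.0], [11.0, 11.0]])
print(nbc_sketch(vectors, k=1))  # {0: 0, 1: 0, 2: 0, 3: 1, 4: 1}
```

On the five vectors from the Python example above, this sketch gives the same assignment as the output listed there. Including ties in the neighbourhood matters here: vector 1 is equidistant from vectors 0 and 2.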

TI-NBC algorithm
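TI-NBC [2] speeds up the neighbour search with the triangle inequality: for any reference point r, |dist(p, r) - dist(q, r)| <= dist(p, q), so a candidate q whose reference-distance gap already exceeds the current k-th neighbour distance can be skipped without computing dist(p, q). The sketch below shows only this pruning step for a single query. `ti_knn` is a hypothetical function, not the repository's API. It defaults the reference point to the per-dimension minimum, mirroring the "list of minimums" default of the --reference_point flag, and, unlike NBC's neighbourhood, it returns exactly k neighbours without tie handling.

```python
import heapq

import numpy as np


def ti_knn(vectors, query, k, reference=None):
    """Return (sorted neighbour ids, number of distances computed).

    Hypothetical sketch of triangle-inequality pruning for one query point.
    """
    vectors = np.asarray(vectors, dtype=float)
    if reference is None:
        reference = vectors.min(axis=0)  # assumed default: per-dimension minimum

    # Sort all points by their distance to the reference point.
    ref_dist = np.linalg.norm(vectors - reference, axis=1)
    order = np.argsort(ref_dist)
    pos = int(np.flatnonzero(order == query)[0])

    n = len(vectors)
    best = []  # max-heap of the k best candidates, stored as (-distance, id)
    computed = 0
    left, right = pos - 1, pos + 1
    while left >= 0 or right < n:
        # The reference-distance gap is a lower bound on the true distance.
        gap_left = ref_dist[query] - ref_dist[order[left]] if left >= 0 else np.inf
        gap_right = ref_dist[order[right]] - ref_dist[query] if right < n else np.inf
        if gap_left <= gap_right:
            cand, gap = order[left], gap_left
            left -= 1
        else:
            cand, gap = order[right], gap_right
            right += 1
        # Gaps only grow outward, so once the smallest gap exceeds the
        # current k-th distance no remaining point can be a neighbour.
        if len(best) == k and gap > -best[0][0]:
            break
        d = float(np.linalg.norm(vectors[cand] - vectors[query]))
        computed += 1
        if len(best) < k:
            heapq.heappush(best, (-d, int(cand)))
        elif d < -best[0][0]:
            heapq.heapreplace(best, (-d, int(cand)))
    return sorted(i for _, i in best), computed


vectors = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [10.0, 10.0], [11.0, 11.0]])
print(ti_knn(vectors, query=0, k=1))  # ([1], 1)
```

In this run only one of the four possible distances is computed. How much is pruned in general depends on the data and on the choice of reference point.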

References

[1] Zhou S., Zhao Y., Guan J., Huang J. (2005) A Neighborhood-Based Clustering Algorithm. In: Ho T.B., Cheung D., Liu H. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2005. Lecture Notes in Computer Science, vol 3518. Springer, Berlin, Heidelberg

[2] Kryszkiewicz M., Lasek P. (2010) A Neighborhood-Based Clustering by Means of the Triangle Inequality. In: Fyfe C., Tino P., Charles D., Garcia-Osorio C., Yin H. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2010. IDEAL 2010. Lecture Notes in Computer Science, vol 6283. Springer, Berlin, Heidelberg
