tony0021074/Clustering-webLogData

Clustering Web Log data (School individual project)

*runnable

Individual school assignment. Students were asked to perform data mining on a dataset of their choice.

I propose to perform agglomerative clustering on the web log data, implementing two similarity functions: Jaccard similarity and the Ratcliff/Obershelp algorithm.

Training data: anonymous-msweb.data.gz
Testing data: anonymous-msweb.test.gz
Study the data: StudyData.ipynb
Create a file storing the similarity matrix: DistanceMatrixJaccard.py, DistanceMatrixRatcliffObershelp.py
Perform agglomerative clustering: AgglomerativeClusteringJaccard.ipynb, AgglomerativeClusteringRatcliffObershelp.ipynb
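
The overall pipeline (similarity function, distance matrix, agglomerative clustering) can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not necessarily how the scripts and notebooks in this repository do it. Assumptions: each session is a list of page IDs, distance is taken as 1 - similarity, clusters are merged by average linkage, and `difflib.SequenceMatcher` (whose matching is based on the Ratcliff/Obershelp approach) stands in for the Ratcliff/Obershelp similarity. All function names and the toy sessions are hypothetical and are not drawn from anonymous-msweb.data.gz.

```python
from difflib import SequenceMatcher
from itertools import combinations


def jaccard_similarity(a, b):
    """Jaccard similarity of two sessions treated as sets of page IDs."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0  # convention (assumption): two empty sessions are identical
    return len(set_a & set_b) / len(set_a | set_b)


def ratcliff_obershelp_similarity(a, b):
    """Order-sensitive similarity using difflib's Ratcliff/Obershelp-style matcher."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def distance_matrix(sessions, similarity):
    """Symmetric matrix of distances, where distance = 1 - similarity."""
    n = len(sessions)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        dist[i][j] = dist[j][i] = 1.0 - similarity(sessions[i], sessions[j])
    return dist


def agglomerative_clusters(dist, n_clusters):
    """Naive average-linkage agglomerative clustering on a distance matrix."""
    clusters = [[i] for i in range(len(dist))]  # start with one cluster per item
    while len(clusters) > n_clusters:
        best = None
        # Find the pair of clusters with the smallest average pairwise distance.
        for x, y in combinations(range(len(clusters)), 2):
            pairs = [(i, j) for i in clusters[x] for j in clusters[y]]
            avg = sum(dist[i][j] for i, j in pairs) / len(pairs)
            if best is None or avg < best[0]:
                best = (avg, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]  # merge y into x (x < y always)
        del clusters[y]
    return clusters


# Hypothetical toy sessions: each is the list of page IDs one user visited.
sessions = [
    [1000, 1001, 1002],
    [1001, 1000, 1002],
    [1008, 1009],
    [1009, 1008, 1017],
]
jaccard_dist = distance_matrix(sessions, jaccard_similarity)
print(agglomerative_clusters(jaccard_dist, n_clusters=2))  # prints [[0, 1], [2, 3]]
```

Passing `ratcliff_obershelp_similarity` to `distance_matrix` instead gives an order-sensitive variant: the first two toy sessions contain the same pages, so their Jaccard distance is 0, but they differ in order, so their Ratcliff/Obershelp-style distance is greater than 0. The naive merge loop is only practical for small inputs; a library implementation would be a better fit for the full dataset.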