*runnable
Individual school assignment. Students are asked to do data mining on a data set of their choice.
I propose to perform agglomerative clustering on the web log data, implementing two similarity functions: Jaccard similarity and the Ratcliff/Obershelp algorithm.
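A minimal sketch of the two similarity functions, assuming each user session is a list of visited page (attribute) IDs. Jaccard ignores order and compares sets; Ratcliff/Obershelp compares sequences and returns 2*M/T, where M is the number of matched elements and T the total length of both sequences. The Ratcliff/Obershelp version here is a stand-in built on Python's `difflib.SequenceMatcher`, which is based on that algorithm; the function names are hypothetical and not taken from the project's scripts.

```python
from difflib import SequenceMatcher


def jaccard_similarity(a, b):
    """Jaccard similarity |A & B| / |A | B| between two collections of page IDs."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 1.0  # assumption: two empty sessions count as identical
    return len(set_a & set_b) / len(union)


def ratcliff_obershelp_similarity(a, b):
    """Gestalt pattern-matching similarity 2*M / T via difflib.SequenceMatcher.

    Unlike Jaccard, order matters: a and b are compared as sequences.
    autojunk=False switches off difflib's "popular element" heuristic so the
    score stays close to the textbook Ratcliff/Obershelp definition.
    """
    return SequenceMatcher(None, list(a), list(b), autojunk=False).ratio()
```

For example, sessions `[1, 2, 3, 4]` and `[1, 2, 4]` share 3 matched pages out of 7 total, giving 6/7 under Ratcliff/Obershelp.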
training data: anonymous-msweb.data.gz
testing data: anonymous-msweb.test.gz
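A sketch of loading either file into per-user sessions, assuming the UCI "Anonymous Microsoft Web Data" layout: `A` lines describe pages (`A,<id>,<ignored>,"<title>","<url>"`), a `C` line starts a user case (`C,"<case id>",<case id>`), and each following `V` line records one visited page (`V,<attribute id>,1`). Other line types are skipped. The helper names `parse_msweb` and `load_msweb` are hypothetical.

```python
import csv
import gzip


def parse_msweb(lines):
    """Return ({attribute_id: title}, {case_id: [attribute_ids in file order]})."""
    titles, sessions = {}, {}
    current = None
    for row in csv.reader(lines):
        if not row:
            continue
        kind = row[0]
        if kind == "A":  # page description: id and title
            titles[int(row[1])] = row[3]
        elif kind == "C":  # start of a new user case
            current = int(row[1])
            sessions[current] = []
        elif kind == "V" and current is not None:  # page visited by current user
            sessions[current].append(int(row[1]))
    return titles, sessions


def load_msweb(path):
    """Read a gzipped msweb file (e.g. anonymous-msweb.data.gz) and parse it."""
    with gzip.open(path, "rt", newline="") as fh:
        return parse_msweb(fh)
```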
Study data: StudyData.ipynb
Compute the pairwise similarity matrix and store it in a file: DistanceMatrixJaccard.py, DistanceMatrixRatcliffObershelp.py
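A sketch of building and saving the matrix, assuming sessions are given as a list of page-ID lists and that similarities are turned into distances as `1 - similarity` for clustering. Pickle is an assumed storage format; `similarity_matrix`, `to_distance`, and `save_matrix` are hypothetical names, and Jaccard is repeated here only so the block runs on its own.

```python
import pickle


def jaccard_similarity(a, b):
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0


def similarity_matrix(sessions, similarity):
    """Symmetric n x n matrix of pairwise similarities.

    Only the upper triangle is computed and mirrored; the diagonal is 1.0,
    which holds for both Jaccard and Ratcliff/Obershelp.
    """
    n = len(sessions)
    matrix = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            s = similarity(sessions[i], sessions[j])
            matrix[i][j] = matrix[j][i] = s
    return matrix


def to_distance(matrix):
    """Convert similarities in [0, 1] to distances in [0, 1]."""
    return [[1.0 - s for s in row] for row in matrix]


def save_matrix(matrix, path):
    with open(path, "wb") as fh:
        pickle.dump(matrix, fh)
```

The matrix has n^2 entries, so for the full training set it may be worth computing it on a sample first.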
Perform agglomerative clustering: AgglomerativeClusteringJaccard.ipynb, AgglomerativeClusteringRatcliffObershelp.ipynb
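A naive sketch of agglomerative clustering over a precomputed distance matrix, assuming average linkage and a fixed target number of clusters. Both choices are illustrative and not necessarily what the notebooks use. Every point starts as its own cluster, and the two closest clusters are merged until `n_clusters` remain. The function name is hypothetical. For larger inputs, `scipy.cluster.hierarchy.linkage` applied to the condensed form of the matrix (via `scipy.spatial.distance.squareform`) is a library alternative.

```python
def agglomerative_clustering(dist, n_clusters):
    """Average-linkage agglomerative clustering on a precomputed distance matrix.

    Returns a list of clusters, each a list of row indices into dist.
    Roughly cubic time or worse: suitable for a sample, not the full data set.
    """
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None  # (average distance, index a, index b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                avg = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or avg < best[0]:
                    best = (avg, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters
```

For example, four points at positions 0, 1, 10 and 11 with absolute-difference distances split into `[[0, 1], [2, 3]]` when `n_clusters=2`.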