Information Retrieval and Web Search course project at Concordia University - assigned by Dr. Sabine Bergler.
This COMP479 project (P4) experiments with web crawling, web scraping, and indexing a collection of web documents. The indexed documents are then clustered using the k-means algorithm, and each resulting cluster is assigned a sentiment score using AFINN, a word list of English terms rated for sentiment.
For the original project outline (professor Dr. Sabine Bergler), click here.
Python >= 3.8 is the programming language for this project, chosen for its support for natural language processing tasks through the NLTK package.
- beautifulsoup4
- scipy
- afinn
- scikit-learn
  - TfidfVectorizer
  - KMeans
- reppy
- urllib3
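
As a rough illustration of how the scikit-learn pieces listed above could fit the clustering and sentiment steps, here is a minimal sketch. It is not the project's actual code: the sample documents, the `TOY_LEXICON` dictionary, and the `score_text` helper are hypothetical stand-ins. The toy lexicon replaces AFINN so the example runs without the `afinn` package; with that package installed, `Afinn().score(text)` would play the same role.

```python
from collections import defaultdict

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical stand-in for scraped page text (not project data).
documents = [
    "great wonderful excellent service great",
    "wonderful excellent great experience",
    "terrible awful bad delay terrible",
    "awful bad terrible failure",
]

# Hypothetical mini-lexicon standing in for AFINN's word -> valence table.
# The values are illustrative only.
TOY_LEXICON = {
    "great": 3, "wonderful": 4, "excellent": 3,
    "terrible": -3, "awful": -3, "bad": -3, "failure": -2,
}


def score_text(text):
    """Sum the valence of every known word; unknown words count as 0."""
    return sum(TOY_LEXICON.get(word, 0) for word in text.lower().split())


# Turn the documents into tf-idf vectors, then group them with k-means.
vectors = TfidfVectorizer().fit_transform(documents)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(vectors)

# Aggregate one sentiment score per cluster by summing its documents' scores.
cluster_scores = defaultdict(int)
for label, text in zip(labels, documents):
    cluster_scores[int(label)] += score_text(text)

for label, score in sorted(cluster_scores.items()):
    print(f"cluster {label}: sentiment {score}")
```

Summing document scores is only one way to aggregate. Averaging per document or per token would keep large clusters from dominating, and which choice the project makes is not stated here.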