This repository applies Natural Language Processing (NLP) techniques to several books and their reviews. The books were obtained as image-based PDFs and converted to text using OCR; the reviews were scraped from the internet. It contains the following analyses:
- Topic Modelling: Used Latent Dirichlet Allocation (LDA) on the book texts to identify 5 topics in each book. Before modelling, the text was cleaned by removing links, special characters, and stop words, followed by lemmatization.
- Web Scraping & Sentiment Analysis: Used BeautifulSoup to scrape book reviews from Goodreads and LibraryThing. Then applied VADER, TextBlob, and NRC sentiment analysis to assess the emotions, polarity, and subjectivity of the reviews.
- Word Cloud: Created word clouds from the above data using the WordCloud library.
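
The cleaning steps listed under Topic Modelling (removing links, special characters, and stop words) can be illustrated with a minimal sketch. This is not the code from this repository: the function name `clean_text`, the regular expressions, and the very short stop-word list are assumptions made for the example, and it uses only the Python standard library. Lemmatization is left out because it needs a linguistic resource such as those shipped with NLTK or spaCy, and it is not stated here which one was used.

```python
import re

# Hypothetical, deliberately tiny stop-word list for illustration only;
# a real pipeline would use a fuller list from an NLP library.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "on", "for", "was", "at"}

# Matches links that start with http://, https:// or www.
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+")
# Matches anything that is not a lowercase letter or whitespace
# (this also drops digits, which is an assumption of this sketch).
NON_LETTER_PATTERN = re.compile(r"[^a-z\s]")


def clean_text(text):
    """Lowercase the text, strip links and special characters, drop stop words."""
    text = text.lower()
    text = URL_PATTERN.sub(" ", text)         # remove links first, before punctuation
    text = NON_LETTER_PATTERN.sub(" ", text)  # remove special characters and digits
    return [token for token in text.split() if token not in STOP_WORDS]


sample = "The whale was seen at https://example.com/page, and it dived!"
print(clean_text(sample))  # ['whale', 'seen', 'dived']
```

Links are removed before special characters so that a URL is dropped as a whole instead of being broken into word fragments. The resulting token lists are the kind of input a topic model or a word cloud would then consume.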
Vishakha Bhattacharjee
MS in Business Analytics, Columbia University