A dashboard for analysing the entities (people and organisations) mentioned in recent news articles, together with the sentiment of the media coverage they receive.
It uses the LatestNews API for getting news content. You can find the deployed website here!
- Named entities (people or organisations) can be filtered by number to obtain graphs and word clouds.
- Selecting a particular entity finds other trending topics in the news related to it and groups the news sources covering the topic, measuring the intensity of their sentiment with the AFINN lexicon.
- The `sentiment.py` file creates the dashboard by loading the relevant files. It uses NLTK's part-of-speech (POS) tagger and chunks the tokens based on their POS tags to find named entities.
- The `ner.ipynb` file contains previous attempts to solve the problem using scikit-learn's CountVectorizer.
- The dashboard currently works only on a small subset of data for the purpose of this experiment. Expanding it to cover news articles daily remains future work.
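The chunking idea described above can be illustrated with a minimal sketch. It is not the code in `sentiment.py`: the tokens are tagged by hand to stand in for the output of a POS tagger such as `nltk.pos_tag`, the Penn Treebank tag names are an assumption, and `chunk_proper_nouns` is a hypothetical helper that only groups runs of proper nouns.

```python
# Hand-tagged tokens standing in for the output of a POS tagger such as
# nltk.pos_tag (assumption: Penn Treebank tags, where NNP = proper noun).
tagged_tokens = [
    ("Angela", "NNP"), ("Merkel", "NNP"), ("met", "VBD"),
    ("the", "DT"), ("World", "NNP"), ("Health", "NNP"),
    ("Organization", "NNP"), ("in", "IN"), ("Geneva", "NNP"), (".", "."),
]


def chunk_proper_nouns(tagged):
    """Group each run of consecutive proper-noun tokens into one candidate entity."""
    entities, current = [], []
    for word, tag in tagged:
        if tag in ("NNP", "NNPS"):
            current.append(word)
        elif current:
            # A non-proper-noun token closes the current run.
            entities.append(" ".join(current))
            current = []
    if current:  # flush a run that ends the token list
        entities.append(" ".join(current))
    return entities


entities = chunk_proper_nouns(tagged_tokens)
print(entities)
```

A chunker like this only proposes candidate entities. It does not tell people from organisations, and it also picks up places such as "Geneva", so a further classification or filtering step would still be needed.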
- nltk: POS tagging to perform NER by chunking tokens.
- pandas: formatting and cleaning the data.
- afinn: lexicon to measure coverage sentiment.
- plotly express: visualisations.
- streamlit: web framework.
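As a rough illustration of how a lexicon can measure coverage sentiment per news source, the sketch below sums word scores for each article and averages them by source. Everything in it is an assumption made for the example: `TOY_LEXICON` holds a few made-up AFINN-style scores rather than the real lexicon shipped with the `afinn` package, and the sources and headlines are invented.

```python
from collections import defaultdict

# Hypothetical AFINN-style scores (integers from -5 to +5). The real lexicon
# ships with the afinn package; these few entries are illustrative only.
TOY_LEXICON = {"win": 4, "great": 3, "crisis": -3, "fail": -2, "bad": -3}

# Hypothetical (source, headline) pairs about one entity.
articles = [
    ("Source A", "Great win for the minister"),
    ("Source A", "Minister avoids crisis"),
    ("Source B", "Bad week as talks fail"),
]


def score_text(text, lexicon):
    """Sum the lexicon scores of the words in text; unknown words score 0."""
    return sum(lexicon.get(word, 0) for word in text.lower().split())


def mean_sentiment_by_source(rows, lexicon):
    """Average the per-article scores for each news source."""
    scores = defaultdict(list)
    for source, text in rows:
        scores[source].append(score_text(text, lexicon))
    return {source: sum(vals) / len(vals) for source, vals in scores.items()}


by_source = mean_sentiment_by_source(articles, TOY_LEXICON)
print(by_source)
```

Averaging per source keeps outlets with many articles from dominating the comparison. Whether the dashboard averages, sums, or aggregates in some other way is not stated here.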
The live project is deployed on https://newsentity.herokuapp.com/.
You must have Python 3.6 or higher to run the file.
- Create a new virtual environment for running the application. You can follow the instructions here.
- Navigate to the virtual environment and activate it.
- Install the dependencies using `pip install -r requirements.txt`.
- Run the `news.py` file with `streamlit run news.py`.