Covid Vaccine Tweets Study

This work is the final project for the Big Data Analysis (BDA) course of the Master in Artificial Intelligence (MAI) of the Universitat Politècnica de Catalunya (UPC), BarcelonaTECH.

In this project, we collected almost 1M tweets about the Covid vaccine, stored them in a MongoDB database, and analyzed them to extract relevant information.
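As an illustration of the storage step, here is a minimal sketch of how collected tweets could be written to MongoDB with PyMongo. It is not taken from the notebook: the field names in `to_document`, the helper names, and the database and collection names are assumptions made for this example. Tweets are assumed to arrive as plain dictionaries (for instance, the JSON payload of a Tweepy status).

```python
from typing import Any, Dict, Iterable


def to_document(tweet: Dict[str, Any]) -> Dict[str, Any]:
    """Map a raw tweet dict to the document we store (hypothetical schema)."""
    return {
        "_id": tweet["id"],  # reuse the tweet id so re-collected tweets overwrite
        "text": tweet["text"],
        "created_at": tweet.get("created_at"),
        "lang": tweet.get("lang"),
    }


def store_tweets(collection: Any, tweets: Iterable[Dict[str, Any]]) -> int:
    """Upsert tweets into a PyMongo-style collection.

    Returns the number of tweets processed (not the number of distinct
    documents, since duplicates replace each other).
    """
    processed = 0
    for tweet in tweets:
        doc = to_document(tweet)
        # replace_one with upsert=True inserts the document if it is new
        # and replaces it if a document with the same _id already exists.
        collection.replace_one({"_id": doc["_id"]}, doc, upsert=True)
        processed += 1
    return processed


def connect_collection(uri: str = "mongodb://localhost:27017") -> Any:
    """Open the tweets collection. Assumes a MongoDB instance at `uri`;
    the database and collection names are placeholders."""
    from pymongo import MongoClient  # imported lazily; needs pymongo installed

    return MongoClient(uri)["covid_vaccine"]["tweets"]
```

With a running MongoDB instance, `store_tweets(connect_collection(), tweets)` would persist a batch. Using the tweet id as `_id` is one simple way to keep the collection free of duplicates when the same tweet is retrieved more than once.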

The whole study is reported in the main file covid-vaccine-tweets-study.ipynb, a self-contained notebook that introduces the problem, presents the methodology followed, and shows the results graphically. [1]

The notebook contains interactive maps that cannot be rendered by GitHub's viewer. However, it can still be viewed online, with no additional requirements, using Jupyter's nbviewer at nbviewer/covid-vaccine-tweets-study.ipynb.

Dependencies

This project was developed with Python 3.7, and the main notebook was created with Jupyter.

Additionally, we have used:

  • PyMongo. A MongoDB interface for Python.

  • Tweepy. A Twitter API interface for Python.

  • langid. A Language Identification tool. Used for the tweet language assessment.

  • TextBlob. An NLP library that uses ML techniques for tokenization, POS tagging, spelling correction, and more. Used here specifically for sentiment analysis.

  • flair. An NLP library that uses state-of-the-art transformer models for several NLP tasks. Used for Sentiment Analysis too.

  • Matplotlib. A visualization library for Python. Used to create the plots.

  • Folium. A visualization library for Python. Used to generate interactive maps.
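To give an idea of how the language identification and sentiment libraries listed above fit together, here is a minimal sketch. It is not the notebook's code: the helper names, the English-only filter, and the 0.05 neutrality threshold are assumptions chosen for illustration. It relies on `langid.classify`, which returns a `(language, score)` pair, and on TextBlob's `sentiment.polarity`, a float in the range [-1.0, 1.0].

```python
from typing import Optional


def polarity_label(polarity: float, threshold: float = 0.05) -> str:
    """Turn a polarity score in [-1, 1] into a coarse label.

    The threshold is an arbitrary choice for this sketch.
    """
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


def detect_language(text: str) -> str:
    """Return the language code that langid considers most likely."""
    import langid  # imported lazily; needs langid installed

    lang, _score = langid.classify(text)
    return lang


def textblob_sentiment(text: str) -> Optional[str]:
    """Label an English tweet with TextBlob; return None for other languages."""
    from textblob import TextBlob  # imported lazily; needs textblob installed

    if detect_language(text) != "en":
        return None
    return polarity_label(TextBlob(text).sentiment.polarity)
```

Filtering by language first matters because TextBlob's default sentiment analyzer targets English text, so scores for other languages would not be meaningful.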

[1] The gathered data is no longer available. Therefore, the notebook serves as a report and cannot be re-executed. Still, all the code is functional and could be reused for a similar project.