
aaronxie0000/twitter_coronavirus

 
 

Coronavirus Twitter analysis

A project completed as part of a Data Structures course:

  1. Worked with large-scale datasets (~1 TB; all tweets sent over the year with the targeted tags)
  2. Worked with multilingual text
  3. Used the MapReduce divide-and-conquer paradigm to create parallel code
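As a rough illustration of the MapReduce idea named above, the sketch below counts hashtag usage per language in independent chunks of tweets (map) and then merges the partial counts (reduce). It is a minimal sketch, not the repository's actual code: the `HASHTAGS` list, the function names, and the assumption that each input line is a JSON object with `text` and `lang` fields are all hypothetical.

```python
import json
from collections import Counter, defaultdict

# Hypothetical keyword list; the project's real list of targeted tags is not shown here.
HASHTAGS = ["#covid19", "#corona", "#coronavirus"]


def map_tweets(lines, hashtags=HASHTAGS):
    """Map step: for one chunk of tweets, count tweets per hashtag and language.

    Assumes one JSON object per line with "text" and "lang" fields.
    """
    counts = defaultdict(Counter)
    for line in lines:
        try:
            tweet = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed records
        if not isinstance(tweet, dict):
            continue  # skip JSON values that are not tweet objects
        text = (tweet.get("text") or "").lower()
        lang = tweet.get("lang") or "und"
        for tag in hashtags:
            # Plain substring match, so "#corona" also matches "#coronavirus".
            if tag in text:
                counts[tag][lang] += 1
    return counts


def reduce_counts(partials):
    """Reduce step: merge the per-chunk counts into a single total."""
    total = defaultdict(Counter)
    for partial in partials:
        for tag, by_lang in partial.items():
            total[tag].update(by_lang)  # Counter.update adds counts
    return total
```

Because each call to `map_tweets` only depends on its own chunk, the map step can run in parallel (for example, one process per input file), and only the small count tables need to be combined by `reduce_counts`.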

Brief Explanation of Results

The full set of findings is contained in the two analysis_ folders, which break down tweets containing the specified keywords by language and by country. Some interesting findings from these data include:

What is the most common term for COVID-19?

There are many names for the coronavirus, and several of them appear as hashtags. Among English-speaking Twitter users, Covid19 is the most popular (617695 hashtag instances at the time of data collection), followed by Corona (529764) and Coronavirus (422394).

Country comparison

For the most popular hashtag, covid19, the US produced the most tweets (283149), followed by India (country code IN) with 88590 tweets and the United Kingdom (country code GB) with 88178 tweets. This may indicate which countries have the most online discussion of the virus, though these numbers should be put into context by comparing them with the number of Twitter users in each country.
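The comparison and the suggested normalization could be sketched as follows. The three counts are the ones reported in this section; the helper names are hypothetical, and no per-country user numbers are included because the analysis does not provide them, so `users` must be supplied by the caller.

```python
# Tweets containing the covid19 hashtag, as reported in this section.
covid19_by_country = {"US": 283149, "IN": 88590, "GB": 88178}


def rank(counts):
    """Return (country, count) pairs sorted from most to fewest tweets."""
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)


def per_thousand_users(counts, users):
    """Tweets per 1,000 Twitter users for each country.

    `users` maps country code to a user count supplied by the caller.
    Countries with a missing or zero user count are skipped.
    """
    return {
        country: 1000 * n / users[country]
        for country, n in counts.items()
        if users.get(country)
    }


for country, n in rank(covid19_by_country):
    print(country, n)
```

With real user counts, ranking the output of `per_thousand_users` rather than the raw totals would show whether the US lead reflects more discussion per user or simply a larger user base.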
