The availability of geographical information in social media data such as Twitter makes it possible to identify how the spread of information varies across a country. This document reports an analysis of hashtags from geo-enabled tweets collected in France over a period of two weeks, with the aim of identifying the geographical characteristics of those hashtags. Topic modelling methods are used to cluster the hashtags into meaningful topics; the geo-topical distribution of hashtags is then calculated for each French department, and the local or global character of each topic is identified. The analysis shows that hashtag clusters containing geographically self-referential information or one-off events are largely local, while clusters related to entertainment and pets are mostly global. Interestingly, some clusters related to TV shows and entertainment are found to have local preferences.
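To make the idea of a geo-topical distribution concrete, here is a minimal sketch in plain Python. It is not the code from the notebooks: the function names, the toy records, and the hashtag-to-topic mapping are all hypothetical, and the entropy-based locality score is only one possible way to separate local from global topics, assumed here for illustration.

```python
from collections import Counter, defaultdict
import math


def geo_topical_distribution(records, hashtag_topic):
    """Return {department: {topic: share}} from (department, hashtag) pairs."""
    counts = defaultdict(Counter)
    for department, hashtag in records:
        topic = hashtag_topic.get(hashtag.lower())
        if topic is not None:  # skip hashtags that belong to no topic
            counts[department][topic] += 1
    return {
        dep: {topic: n / sum(c.values()) for topic, n in c.items()}
        for dep, c in counts.items()
    }


def topic_locality(records, hashtag_topic):
    """Score each topic in [0, 1]: 1 = one department only, 0 = evenly spread.

    Assumption: locality is measured as 1 minus the normalized entropy of the
    topic's raw counts over departments (no correction for tweet volume).
    """
    per_topic = defaultdict(Counter)
    departments = set()
    for department, hashtag in records:
        departments.add(department)
        topic = hashtag_topic.get(hashtag.lower())
        if topic is not None:
            per_topic[topic][department] += 1
    max_entropy = math.log(len(departments)) if len(departments) > 1 else 1.0
    scores = {}
    for topic, c in per_topic.items():
        total = sum(c.values())
        entropy = -sum((n / total) * math.log(n / total) for n in c.values())
        scores[topic] = 1.0 - entropy / max_entropy
    return scores


# Toy data (hypothetical): department codes paired with hashtags.
records = [
    ("75", "#PSG"), ("75", "#Paris"), ("75", "#cats"),
    ("13", "#Marseille"), ("13", "#cats"),
    ("69", "#cats"),
]
hashtag_topic = {
    "#paris": "local_places", "#marseille": "local_places",
    "#psg": "sport", "#cats": "pets",
}

dist = geo_topical_distribution(records, hashtag_topic)
scores = topic_locality(records, hashtag_topic)
```

With this toy data, the "pets" topic is spread evenly over the three departments and scores as global, while "sport" appears in a single department and scores as local. Because the score uses raw counts, departments with many tweets weigh more; a real analysis would likely normalize for tweet volume first.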
This repository hosts the code used to collect, clean, explore and analyze the data.
twitter_stream.ipynb
collects tweets using the Twitter Standard Streaming API
preprocessing.ipynb
after an anonymization process: NLP cleaning of the textual content, including tokenization, stopword removal, POS tagging, ... (not used further due to technical difficulties), and geo-enrichment based on the geo-tags provided by the API
Hashtags_analysis.ipynb
coarse exploration of the hashtags contained in the tweets and first analyses
LDA.ipynb
LDA topic modelling on hashtags, including results and visualizations
LGLDA_tune.ipynb
tuning procedure for the hyper-parameters of the LGLDA model
LGLDA.ipynb
LGLDA topic modelling on hashtags, including results and visualizations
Network.ipynb
topic modelling through a graph theory approach
Louvain_viz.ipynb
Louvain communities topic modelling on hashtags, including results and visualizations
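The graph-based notebooks need a hashtag network as input. How the repository builds that network is not stated here, so the sketch below assumes the common choice of linking two hashtags whenever they appear in the same tweet, with the edge weight counting how many tweets they share. The function names and the sample tweets are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

# \w matches Unicode word characters, so accented French hashtags are kept.
HASHTAG_RE = re.compile(r"#(\w+)")


def extract_hashtags(text):
    """Return the distinct hashtags of a tweet, lowercased and sorted."""
    return sorted({tag.lower() for tag in HASHTAG_RE.findall(text)})


def cooccurrence_edges(tweets):
    """Count, for each pair of hashtags, the tweets in which both occur."""
    edges = Counter()
    for text in tweets:
        # Sorting in extract_hashtags gives every pair a canonical order,
        # so (a, b) and (b, a) are never counted as separate edges.
        for a, b in combinations(extract_hashtags(text), 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical sample tweets.
tweets = [
    "Match ce soir #PSG #Paris",
    "Balade #Paris #psg #chat",
    "Mon #chat dort",
]
edges = cooccurrence_edges(tweets)
```

The resulting weighted edge list can then be passed to a community detection routine such as Louvain, with each detected community read as one topic.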
The main results and visualizations are also provided in two folders.
Interactive html figures are provided for deeper exploration.
data_filtered.html
map of the number of tweets for selected departments
hashtags_count.html
map of the count of hashtags for selected departments
finalMap.html
final visualization associating a word cloud with each of the studied departments
Louvain.html
hashtags network with nodes colored according to their Louvain community (topic)
csv files of the topics found are available under:
df_lda_short.csv
topics found with LDA, with their top 10 words
df_lglda_short.csv
topics found with LGLDA, with their top 10 words
df_louvain.csv
topics found with the network approach (communities with at least 5 hashtags)