The SemEval-2016 dataset for the Natural Language Processing (NLP) task of Sentiment Analysis was downloaded using the recommended Python script. The download instructions are available at https://github.com/aritter/twitter_download. I had to comment out line 36 of download_tweets_api.py because the script was crashing:
#uid = fields[1]
The statistics of the raw downloaded dataset are:
Split | Task | Point scale | Number of tweets |
---|---|---|---|
DEV | A | Three | 2,000 |
DEV_TEST | A | Three | 2,000 |
TRAIN | A | Three | 6,000 |
TEST | A | Three | 20,633 |
DEV | B,D | Two | 1,325 |
DEV_TEST | B,D | Two | 1,417 |
TRAIN | B,D | Two | 4,346 |
TEST | B,D | Two | 10,552 |
DEV | C,E | Five | 2,000 |
DEV_TEST | C,E | Five | 2,000 |
TRAIN | C,E | Five | 6,000 |
TEST | C,E | Five | 20,632 |
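
Counts like the ones in the table can be re-checked with a few lines of Python. The following is only a minimal sketch, not part of the download script: it assumes each downloaded file is tab-separated with one tweet per line, and the function name `count_tweets` and the default `label_col=1` are hypothetical choices that should be adjusted to the actual column layout of each task's files.

```python
import csv
from collections import Counter
from typing import Iterable, Tuple


def count_tweets(lines: Iterable[str], label_col: int = 1) -> Tuple[int, Counter]:
    """Return (number of rows, label distribution) for tab-separated rows.

    label_col is an assumed column index, not taken from the dataset
    documentation; change it to match the file being inspected.
    """
    # QUOTE_NONE keeps quotation marks inside tweet text from being
    # interpreted as CSV quoting.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    total = 0
    labels: Counter = Counter()
    for row in reader:
        if not row:  # skip blank lines
            continue
        total += 1
        if label_col < len(row):
            labels[row[label_col]] += 1
    return total, labels
```

To use it on a downloaded file, open the file with `open(path, encoding="utf-8", newline="")` and pass the file object as `lines`. The returned total can then be compared with the corresponding row of the table, and the label counter gives a quick view of the class balance.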