The experimental landscape in natural language processing for social media is fragmented. Each year, new shared tasks and datasets are proposed, ranging from classics like sentiment analysis to irony detection or emoji prediction. As a result, it is unclear what the current state of the art is, because there is neither a standardized evaluation protocol nor a strong set of baselines trained on such domain-specific data. The purpose of this dataset is to provide a unified evaluation consisting of seven heterogeneous Twitter-specific classification tasks.
This dataset consists of seven heterogeneous tasks on Twitter, all framed as multi-class tweet classification. Each dataset is presented in the same format, with fixed training, validation and test splits.

Authors:
- Francesco Barbieri
- Jose Camacho-Collados
- Leonardo Neves
- Luis Espinosa-Anke

Tasks:
- Emotion Recognition
- Emoji Prediction
- Irony Detection
- Hate Speech Detection
- Offensive Language Identification
- Sentiment Analysis
- Stance Detection
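Because every task shares the same shape (a tweet paired with a class label, split into fixed training, validation and test sets), a single evaluation loop can score one classifier across all seven tasks. The sketch below illustrates that idea under stated assumptions: the `Example` and `TaskSplits` types, the `evaluate` helper and the `predict(task_name, text)` callable are hypothetical names, not part of any official loader, and macro-averaged F1 is used purely as an illustrative metric; the official per-task metrics may differ.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


# Hypothetical record type: each task is assumed to share this shape
# (tweet text plus an integer class label).
@dataclass
class Example:
    text: str
    label: int


# Hypothetical container for one task's fixed splits.
@dataclass
class TaskSplits:
    train: List[Example]
    validation: List[Example]
    test: List[Example]


def macro_f1(gold: List[int], pred: List[int]) -> float:
    """Unweighted mean of per-class F1 over every class seen in gold or pred."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    classes = sorted(set(gold) | set(pred))
    scores = []
    for c in classes:
        tp = sum(1 for g, p in zip(gold, pred) if g == c and p == c)
        fp = sum(1 for g, p in zip(gold, pred) if g != c and p == c)
        fn = sum(1 for g, p in zip(gold, pred) if g == c and p != c)
        denom = 2 * tp + fp + fn  # F1 = 2TP / (2TP + FP + FN)
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


def evaluate(
    tasks: Dict[str, TaskSplits],
    predict: Callable[[str, str], int],
) -> Dict[str, float]:
    """Score a classifier on the test split of every task with one shared metric."""
    results = {}
    for name, splits in tasks.items():
        gold = [ex.label for ex in splits.test]
        pred = [predict(name, ex.text) for ex in splits.test]
        results[name] = macro_f1(gold, pred)
    return results
```

The shared format is what makes this possible: only the test split is touched at scoring time, while the training and validation splits stay fixed for model fitting and selection, so results remain comparable across systems.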