Check our data story!
We studied [Niculae et al., 2015], which highlights linguistic cues linked to the imminence of betrayal. We would like to apply similar techniques to detect betrayal in social deduction games such as Town of Salem, Secret Hitler, Among Us or Werewolf/Mafia. Is it possible, by studying the public exchanges between players during a text-based game, to spot the "traitor"? The major difference with the original article is that we are not looking for a betrayal to come - the breaking of a friendship - but for a betrayal that has already taken place - for instance, the "wolf" seeks to win by posing as a "villager". We will therefore analyse the textual exchanges of different games and try to apply the same methods to multiple sessions.
- Is it possible to apply the techniques seen in the article to other games?
- Is it possible to identify betrayal on a "long-term" basis, i.e. sustained deception (whereas the article focuses on a specific point in time)?
- What are the clues to identify "traitors" in social deduction games?
For the project, we found the following datasets that could be of interest to us:
- Linguistic Harbingers of Betrayal ([Niculae et al., 2015]): Dataset direct link.
- Town of Salem (Town of Salem): Dataset direct link.
- The Mafiascum Dataset ([de Ruiter et al., 2018]): Dataset direct link.
- Werewolf for Telegram (Werewolf/Mafia): Custom datasets from the famous Telegram bot version of the game.
Town of Salem's dataset lists 8'833 ranked games scraped from the "Trial System" of the game's online platform. The dataset is organized as a JSON file where each game is represented as a Python dictionary with four keys:
- `players`: General information about each of the 15 players of the game, such as their account and pseudonym, as well as information about their involvement in the game (role, faction, time of their in-game death, ...).
- `entries`: Chat logs displayed on the interface, which can be either messages from the players (e.g. "i think we have an sk"), catalogued into Python dictionaries (with keys `type` containing the different channels, `author`, `text`, and `time`), or moderation information such as the in-game time or trial outcomes. Note that the minimum number of entries for a game is 89, the maximum is 1'539, with a mean of 526 and a median of 509.
- `ranked`: Boolean, `True` for each listed game.
- `reportId`: Unique identifier of the "Trial System" characterizing a game.
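As a minimal sketch of how we expect to read this structure (the file name, and the assumption that the top-level JSON object is a list of games, still have to be checked against the actual dump):

```python
import json

# File name is an assumption; point it at the actual Town of Salem dump.
with open("town_of_salem_games.json") as f:
    games = json.load(f)  # assumed to be a list of game dictionaries

game = games[0]
print(game["reportId"], game["ranked"], len(game["players"]), len(game["entries"]))

# Keep only player-authored chat entries; moderation entries are assumed to lack
# an "author"/"text" pair (to be verified against the real data).
chat = [e for e in game["entries"] if e.get("author") and e.get("text")]
for entry in chat[:5]:
    print(entry["time"], entry["author"], ":", entry["text"])
```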
The Mafiascum Dataset is a collection of over 700 games of Mafia played on an Internet forum; the interactions between players are scraped from this platform. The data repository consists of several JSON files that contain different information about each game, each player and the messages, and are divided as follows:
- Files with the suffix `games` include general identifiers of the games, e.g. `id`, `title`, `moderator`, or the number of posts.
- Files with the suffix `slots` contain information about the players, such as the id of the game in which they took part, their role, and how their game ended (e.g. "lynched Day 2", "died Night 5" or "survives").
- Files without a specific suffix gather all textual interactions, their authors, the id of the games, and the index of the post within the game (random examples of messages: "i don't wanna be a chicken i don't wanna be a duck", "Also, Egg - I don't particularly find the peacemaker routine a town-thing generally.").
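A possible way to load and join these files with pandas; the file names, column names and join keys below are hypothetical and have to be checked against the actual repository layout:

```python
import pandas as pd

# File names, column names and join keys are hypothetical placeholders.
games = pd.read_json("games.json")  # game-level metadata: id, title, moderator, ...
slots = pd.read_json("slots.json")  # per-player info: game id, author, role, outcome
posts = pd.read_json("posts.json")  # messages: game id, author, post index, text

# Attach each author's role to their messages so that posts can later be
# labelled "traitor" (mafia) vs. "innocent" (town).
labelled = posts.merge(slots, on=["game_id", "author"], how="left")
print(labelled.groupby("role")["text"].count())
```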
Finally, Werewolf for Telegram's dataset is a raw set of text messages exchanged on Telegram, directly scraped from online groups. It would represent a significant workload in terms of cleaning and shaping the data, so we do not plan to focus on this one. However, if time allows, or for further investigation, it could be a great substrate to work on.
To sum up, we plan to use the initial dataset (Linguistic Harbingers of Betrayal) to develop and test the algorithm meant to analyze the different features of the further datasets.
Moreover, among the three new social deduction game datasets, we think that "The Mafiascum Dataset" has the best chance of reproducing a successful result: it is cleaner and has already been used in another study.
The methodology will be the same as in [Niculae et al., 2015], parts 4.2 and 4.3.
- Preprocessing: Manual study of each dataset.
- Sentiment: Sentiment quantification using the Stanford Sentiment Analyzer ([Socher et al., 2013]).
- Planning:
- Measurement of the number of explicit discourse connectors per sentence ([Prasad et al., 2008]).
- Calculation of the average number of claim and premise markers per sentence ([Stab et al., 2014]).
- Measurement of the number of request sentences in each message, using the heuristics of the Stanford Politeness classifier ([Danescu-Niculescu-Mizil et al., 2013]).
- Politeness: Politeness measurement of each message using the Stanford Politeness classifier (ibidem), via the ConvoKit toolkit (see the sketch after this list).
- Talkativeness: Number of messages sent, average number of sentences per message, average number of words per sentence.
- Model: Logistic regression for classification, "traitor" vs. "innocent".
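Since the politeness step relies on ConvoKit, here is a minimal sketch of how we expect to extract the politeness strategy features of [Danescu-Niculescu-Mizil et al., 2013] with it; the utterances below are illustrative, and the exact pipeline should be checked against the ConvoKit documentation for the installed version:

```python
from convokit import Corpus, Speaker, Utterance, TextParser, PolitenessStrategies

# Wrap a few chat messages into a ConvoKit corpus (ids and speakers are illustrative).
utterances = [
    Utterance(id="0", speaker=Speaker(id="player1"), text="Could you please claim your role?"),
    Utterance(id="1", speaker=Speaker(id="player2"), text="Stop lying, you are obviously the wolf."),
]
corpus = Corpus(utterances=utterances)

corpus = TextParser().transform(corpus)            # dependency parses needed by the strategies
corpus = PolitenessStrategies().transform(corpus)  # adds politeness strategy counts per utterance

for utt in corpus.iter_utterances():
    print(utt.text, utt.meta["politeness_strategies"])
```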
For this project, we would like to focus on politeness and talkativeness; a rough sketch of the talkativeness features and the classifier follows below. Because of the short time at our disposal, it would not be possible to analyze all the "harbingers".
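As a rough sketch of the talkativeness features and the classification step listed above (the per-player message lists and labels are toy placeholders for whatever the preprocessing produces, and the sentence splitting is a simplification):

```python
import re

import numpy as np
from sklearn.linear_model import LogisticRegression

def talkativeness_features(messages):
    """Talkativeness features for one player in one game: number of messages,
    average sentences per message, average words per sentence."""
    sentences = [s for m in messages for s in re.split(r"[.!?]+", m) if s.strip()]
    words_per_sentence = [len(s.split()) for s in sentences]
    return [
        len(messages),
        len(sentences) / len(messages) if messages else 0.0,
        float(np.mean(words_per_sentence)) if words_per_sentence else 0.0,
    ]

# Toy placeholders: one message list per player and a "traitor" (1) / "innocent" (0) label.
per_player_messages = [
    ["i think we have an sk", "vote him up"],
    ["ok"],
    ["I am the doctor. Trust me!", "Why would I lie?"],
    ["hmm", "not sure about that one"],
]
labels = [0, 0, 1, 0]

X = np.array([talkativeness_features(m) for m in per_player_messages])
y = np.array(labels)

clf = LogisticRegression(max_iter=1000)
clf.fit(X, y)                      # on the real data we would cross-validate instead
print(clf.predict_proba(X)[:, 1])  # per-player "traitor" probability
```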
- Week 0: Preliminary considerations
- 09.11: Release of P3 instructions
- Week 1: Thoughts on what we want to do.
- Week 2: Let's do this.
- 27.11: Deadline. P3 is due.
- Week 3: Implementation of the different methods and tests on the initial dataset.
- Week 4: Exploration of the other datasets.
- Week 5: Wrapping up.
- 18.12: Deadline. P4 is due.
We have four "harbingers" to study, namely talkativeness, politeness, argumentation and sentiment. Because of the limited time, we will focus on politeness and talkativeness, but we will try to work on the other "harbingers" as well.
- Reproduction: Create a notebook with a reproduction of the aforementioned methods on the original dataset.
- (Study the impact of "positive sentiment" and see if we get the same results.)
- (Study the impact of "planning discourse markers" and see if we get the same results.)
- Study the impact of "politeness" and see if we get the same results.
- Study the impact of "talkativeness" and see if we get the same results.
- Abstraction: "Abstract"/"Factorize" the elements of the notebook to create an API applicable to other games (see the sketch after this list).
- Dataset in, results out.
- Compare the results from the different datasets with each other.
- Application: For each dataset (depending on the time):
- Sanitize and normalize the dataset to be used with our API.
- Apply the methods using our API.
- Compare results with the original paper.
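One purely illustrative shape for the "dataset in, results out" API mentioned above; every name below is ours and only meant to show the intended factorization:

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class GameAnalysis:
    """Common result object returned for every game, whatever the source dataset."""
    game_id: str
    talkativeness: Dict[str, List[float]] = field(default_factory=dict)  # per player
    politeness: Dict[str, float] = field(default_factory=dict)           # per player
    traitor_probability: Dict[str, float] = field(default_factory=dict)  # per player

def analyze_game(game_id: str, messages_by_player: Dict[str, List[str]]) -> GameAnalysis:
    """Dataset loaders only have to produce a {player: [messages]} mapping;
    the harbinger computations and the trained classifier plug in here."""
    result = GameAnalysis(game_id)
    for player, messages in messages_by_player.items():
        result.talkativeness[player] = [float(len(messages))]  # placeholder feature
    return result

# The same call works for Town of Salem, Mafiascum, etc. once each dataset is normalized.
print(analyze_game("demo", {"player1": ["i think we have an sk"], "player2": ["ok"]}))
```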
- Noé and Emmanuel did the data wrangling as well as the analysis of talkativeness.
- Hugo and Neil handled politeness.
- Noé built an API to plot all the data (a real time saver for the others).
- Hugo handled the data story.
- Hugo wrote the pitch for the video.
- Neil performed in the video.
- Of course, everyone reviewed and contributed to the parts they did not work on themselves; it's team work!
- [Danescu-Niculescu-Mizil et al., 2013]: Cristian Danescu-Niculescu-Mizil, Moritz Sudhof, Dan Jurafsky, Jure Leskovec, Christopher Potts, A computational approach to politeness with application to social factors, ACL, 2013. Original paper.
- [Niculae et al., 2015]: Vlad Niculae, Srijan Kumar, Jordan Boyd-Graber, Cristian Danescu-Niculescu-Mizil, Linguistic Harbingers of Betrayal: A Case Study on an Online Strategy Game, Proceedings of ACL, 2015. Original paper, website.
- [Prasad et al., 2008]: Rashmi Prasad, Nikhil Dinesh, Alan Lee, Eleni Miltsakaki, Livio Robaldo, Aravind Joshi, Bonnie Webber, The Penn Discourse TreeBank 2.0, LREC, 2008. Original paper.
- [de Ruiter et al., 2018]: Bob de Ruiter, George Kachergis, The Mafiascum Dataset: A Large Text Corpus for Deception Detection, 2018. Original paper.
- [Socher et al., 2013]: Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher Manning, Andrew Ng, Christopher Potts, Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank, EMNLP, 2013. Original paper, website.
- [Stab et al., 2014]: Christian Stab, Iryna Gurevych, Identifying Argumentative Discourse Structures in Persuasive Essays, EMNLP, 2014. Original paper.
- Town of Salem: Official website, Wikipedia, Fandom.
- Secret Hitler: Official website, Wikipedia.
- Among Us: Official website, Wikipedia.
- Werewolf/Mafia: Wikipedia, Telegram game official website.
- ConvoKit: Cornell Conversational Analysis Toolkit. Website.
- Stanford NLP Group: Website.