This repository contains the code to convert the datasets into topic-relevant collections of documents with sentence-level annotations, as well as the code to simulate users who annotate one document after another.
The subfolders contain the following content:
data
contains the ED-ACL-2014 and ED-EMNLP-2015 datasets and the code to convert them, as well as the code to download and convert the Argument Mining dataset.
EvidenceDetection
contains the code to simulate the users and analyse the quality of the predictions
Bert
contains code to use Bert based on https://github.com/UKPLab/acl2019-BERT-argument-classification-and-clustering
This project uses four datasets: ED-ACL-2014, ED-EMNLP-2015, ED-ACL-2018, and Argument Mining. As used here, each dataset except ED-ACL-2018 consists of documents with sentence-level annotations, where each sentence is either a piece of evidence or not. The documents use the following TSV format.
Label | Sentence |
---|---|
Evidence | candidate sentence |
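
The TSV format above can be read with a few lines of Python. This is a minimal sketch and not the repository's own loader: the function name `read_annotated_document` is hypothetical, and it assumes that each line holds exactly one label and one sentence separated by a tab, with no header row.

```python
from pathlib import Path
from typing import List, Tuple


def read_annotated_document(path: Path) -> List[Tuple[str, str]]:
    """Read one document as a list of (label, sentence) pairs.

    Assumption: one `label<TAB>sentence` pair per line, no header row.
    """
    pairs = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            # Split on the first tab only, so tabs inside a sentence survive.
            label, sentence = line.split("\t", 1)
            pairs.append((label, sentence))
    return pairs
```

Splitting on the first tab only keeps a sentence intact even if it contains a tab character itself.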
Copyright Wikipedia Copyright IBM 2014. Released under CC-BY-SA.
The original dataset consisted of a table of topic-related pieces of evidence from Wikipedia articles, together with the associated articles. We converted it into collections of topic-related Wikipedia articles with sentence-level annotations of evidence. Each sentence in an article is either a piece of evidence or not.
Copyright Wikipedia Copyright IBM 2015. Released under CC-BY-SA.
The dataset was processed identically to the ED-ACL-2014 dataset.
Copyright Wikipedia Copyright IBM 2015. Released under CC-BY-SA.
This dataset is not included in this repository, but can be downloaded from http://www.research.ibm.com/haifa/dept/vst/debating_data.shtml
This dataset is based on the Sentential UKP Argument Mining corpus. We extended its source code to complete the dataset by also saving the original files with sentence-level annotations. The modified source code is included.
This project aims to extract evidence from documents based on the data collected in the hypothesis validation user surveys.
Currently, the project focuses on classifying each sentence as evidence or not.
./
|
|- bin (callable scripts)
|
|- hrl (scripts to run the experiments on the Lichtenberg high performance computer)
|
|- scripts (contains a script for post-processing to make the analysis easier)
|
|- evidencedetection (the library code)
|
|- analysis (scripts to analyse the predictions)
Before running the experiments, please install the evidence detection package through the setup.py script.
python3 setup.py (install|develop)
The `bin` folder contains the runnable scripts to train the BiLSTM as well as the pre-trained and fine-tuning models.
Pre-training of the evidence detection and argument mining models is done by two separate scripts, whereas both pre-trained models can be fine-tuned further by a single script.
The results of the experiments are the raw predictions, which are saved next to the original test file.
The results follow the convention `$TEST_FILE_FOLDER/$MODEL_NAME/$SEED/$ITERATION/$TEST_FILENAME.pred`.
The file uses the same data format as the test file, i.e. `$LABEL\t$SENTENCE`.
`$SEED` is the randomisation seed and `$ITERATION` is the number of files used for training.
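
The path convention above can be illustrated with a small helper. This is a sketch under stated assumptions rather than code from the repository: the function name `prediction_path` is hypothetical, and it assumes that `$TEST_FILENAME` is the full file name of the test file including its extension.

```python
from pathlib import Path


def prediction_path(test_file: Path, model_name: str, seed: int, iteration: int) -> Path:
    """Build the location of a prediction file from the documented convention.

    Assumption: $TEST_FILENAME is the test file's full name, extension included.
    """
    return (
        test_file.parent      # $TEST_FILE_FOLDER
        / model_name          # $MODEL_NAME
        / str(seed)           # $SEED: the randomisation seed
        / str(iteration)      # $ITERATION: number of files used for training
        / (test_file.name + ".pred")
    )
```

For example, with a hypothetical model name `bilstm`, seed 1 and iteration 5, the test file `data/topic/doc.tsv` would map to `data/topic/bilstm/1/5/doc.tsv.pred`.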
The `analysis` folder contains the code and scripts to read the test files as well as the prediction files and to conduct different evaluations on them.
For instance, it is possible to evaluate individual models and plot the change in performance with the `AnalyzeMotionThroughTime.py` script.
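
As an illustration of one such evaluation, the following sketch computes the F1 score of the evidence class from gold and predicted labels. It is not the code in the `analysis` folder, and the metric actually used by the scripts may differ. The function name `evidence_f1` is hypothetical, and the positive label `Evidence` is assumed from the TSV example.

```python
from typing import Sequence


def evidence_f1(gold: Sequence[str], predicted: Sequence[str],
                positive: str = "Evidence") -> float:
    """F1 score of the positive class (assumed to be labelled "Evidence")."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    pairs = list(zip(gold, predicted))
    true_pos = sum(1 for g, p in pairs if g == positive and p == positive)
    false_pos = sum(1 for g, p in pairs if g != positive and p == positive)
    false_neg = sum(1 for g, p in pairs if g == positive and p != positive)
    if true_pos == 0:
        return 0.0  # avoids division by zero when nothing was found
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)
```

Calling this once per `$ITERATION` folder would give the performance as a function of the number of training files.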
It is also possible to evaluate the final performance of the model alone.
However, this assumes that the final prediction files are stored in the folder of the randomisation seed and not in the folder of the last iteration.
To copy the prediction files from the last iteration, we added a bash script in the `scripts` folder.
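
The copy step can also be sketched in Python. This is not the bash script from the `scripts` folder, only an illustration of the idea. The function name `copy_final_predictions` is hypothetical, and the sketch assumes that the iteration folders inside a seed folder are named by plain integers and that the last iteration is the one with the highest number.

```python
import shutil
from pathlib import Path
from typing import List


def copy_final_predictions(seed_dir: Path) -> List[Path]:
    """Copy the *.pred files of the highest-numbered iteration into seed_dir."""
    # Assumption: iteration folders are named by integers, e.g. "1", "2", "10".
    iterations = [d for d in seed_dir.iterdir() if d.is_dir() and d.name.isdigit()]
    if not iterations:
        return []
    # Compare numerically so that "10" counts as later than "2".
    last = max(iterations, key=lambda d: int(d.name))
    copied = []
    for pred in sorted(last.glob("*.pred")):
        target = seed_dir / pred.name
        shutil.copy2(pred, target)
        copied.append(target)
    return copied
```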
To run the experiments please follow the instructions provided at https://github.com/UKPLab/acl2019-BERT-argument-classification-and-clustering
Evidence Detection
The `train.sh` script trains the evidence detection model and the `test.sh` script runs the evaluation.
Argument Mining
The `trainAM.sh` script trains the argument mining model and the `testAM.sh` script runs the evaluation.
@inproceedings{Stahlhut:InteractiveEvidenceDetection-2019,
  address = {{Hongkong, China}},
  title = {Interactive {{Evidence Detection}}: Train State-of-the-Art Model out-of-Domain or Simple Model Interactively?},
  abstract = {Finding evidence is of vital importance in research as well as fact checking and an evidence detection method would be useful in speeding up this process. However, when addressing a new topic there is no training data and there are two approaches to get started. One could use large amounts of out-of-domain data to train a state-of-the-art method, or to use the small data that a person creates while working on the topic. In this paper, we address this problem in two steps. First, by simulating users who read source documents and label sentences they can use as evidence, thereby creating small amounts of training data for an interactively trained evidence detection model; and second, by comparing such an interactively trained model against a pre-trained model that has been trained on large out-of-domain data. We found that an interactively trained model not only often out-performs a state-of-the-art model but also requires significantly lower amounts of computational resources. Therefore, especially when computational resources are scarce, e.g. no GPU available, training a smaller model on the fly is preferable to training a well generalising but resource hungry out-of-domain model.},
  booktitle = {Proceedings of the {{Second Workshop}} on {{Fact Extraction}} and {{VERification}} ({{FEVER}})},
  publisher = {{Association for Computational Linguistics}},
  author = {Stahlhut, Chris},
  month = nov,
  year = {2019},
}