Image courtesy of Stable Diffusion 2.1
This repo contains the code for the Baseline Fact Extraction and VERification System (BEVERS). The pipeline uses standard approaches for each of its components. Despite its simplicity, BEVERS achieves SOTA performance on FEVER (old leaderboard, new leaderboard) and the highest label F1 score on SciFact (leaderboard).
- conda
To create the `bevers` conda environment and install the Python and other dependencies, run the `setup.sh` script. The script requires `sudo` access to set up SQLite as a fuzzy string search engine.
bash setup.sh
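SQLite has no fuzzy matching built in; the usual way to get it is the `spellfix1` extension, and the sketch below shows the general idea in Python. Whether BEVERS actually uses `spellfix1`, along with the extension path and table layout here, are assumptions rather than the repo's actual setup; `setup.sh` contains the real build steps.

```python
# Hypothetical sketch of fuzzy title lookup via SQLite's spellfix1 extension.
# The extension path and table layout are assumptions, not BEVERS's actual code.
import sqlite3

conn = sqlite3.connect("titles.db")
conn.enable_load_extension(True)
conn.load_extension("./spellfix")  # compiled spellfix1 shared library

conn.execute("CREATE VIRTUAL TABLE IF NOT EXISTS titles USING spellfix1")
conn.executemany(
    "INSERT INTO titles(word) VALUES (?)",
    [("Telemundo",), ("Soul Food (film)",), ("The Beatles",)],
)

# Return the closest matches to a (possibly misspelled) query string.
rows = conn.execute(
    "SELECT word, distance FROM titles WHERE word MATCH ? AND top=3",
    ("Telemundu",),
).fetchall()
print(rows)
```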
There are run scripts for FEVER, SciFact, and PubMed. The general BEVERS pipeline is as follows (PubMed is an exception); a minimal sketch of the TF-IDF step is shown after the list:
- TF-IDF setup (and fuzzy string search for FEVER)
- Sentence selection dataset generation, model training, and final dumping of sentence scores.
- Claim classification training and dumping of claim scores.
- Training of XGBoost classifier
- Generating final output files for submission to leaderboards for scoring.
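As a rough illustration of the first step, here is a minimal TF-IDF retrieval sketch using scikit-learn. The n-gram range and other settings are placeholders rather than the repo's actual configuration.

```python
# Minimal sketch of the TF-IDF retrieval step: rank documents against a claim.
# Illustration only; the preprocessing and hyperparameters in the repo may differ.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

docs = [
    "Telemundo is an American broadcast television network.",
    "Soul Food is a 1997 American comedy-drama film.",
    "The Beatles were an English rock band formed in Liverpool.",
]
claim = "Telemundo is an English-language television network."

vectorizer = TfidfVectorizer(ngram_range=(1, 2), sublinear_tf=True)
doc_matrix = vectorizer.fit_transform(docs)   # (n_docs, n_features)
claim_vec = vectorizer.transform([claim])     # (1, n_features)

# Cosine similarity (vectors are L2-normalized by TfidfVectorizer by default).
scores = linear_kernel(claim_vec, doc_matrix).ravel()
top_k = scores.argsort()[::-1][:2]
for idx in top_k:
    print(f"{scores[idx]:.3f}  {docs[idx]}")
```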
# Run FEVER
bash run_fever.sh
# Run PubMed (NOTE: manual effort is needed here to download required dataset files)
bash run_pubmed.sh
# Run SciFact (running PubMed is a prerequisite here)
bash run_scifact.sh
Results on the FEVER test set:

System | Test Label Accuracy | Test FEVER Score |
---|---|---|
LisT5 | 79.35 | 75.87 |
Stammbach | 79.16 | 76.78 |
ProoFVer | 79.47 | 76.82 |
Ours (RoBERTa Large MNLI) mixed | 79.39 | 76.89 |
Ours (DeBERTa v2 XL MNLI) mixed | 80.24 | 77.70 |
Results on the SciFact test set:

System | Sentence Selection + Label (SS+L) F1 | Abstract Label-Only F1 |
---|---|---|
VerT5erini | 58.8 | 64.9 |
ASRJoint | 63.1 | 68.1 |
MultiVers | 67.2 | 72.5 |
Ours | 58.1 | 73.2 |
- Release models - Done via Docker image (03/05/23)
- Finish cleaning up code (started on this but didn't finish)
- Update demo - Done (03/02/23)
- Properly document the source of code that is not mine; some of it was copied directly from the evaluation repos for ease of use.
- Improve retrieval for SciFact utilizing neural re-rankers, as most other systems do (a rough sketch of the idea is shown after this list).
- Release easy-to-use predictions for sentence selection. This helps people who only want to focus on the claim classification portion of the task. - Done via Docker image (03/05/23)
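For the neural re-ranking item above, the usual recipe is a cross-encoder that re-scores the TF-IDF candidates. A minimal sketch with `sentence-transformers` follows; the model name is just one common choice, and none of this is implemented in the repo yet.

```python
# Hypothetical re-ranking sketch (not implemented in BEVERS): re-score TF-IDF
# candidates for a claim with an off-the-shelf cross-encoder.
from sentence_transformers import CrossEncoder

claim = "Aspirin reduces the risk of cardiovascular events."
candidates = [
    "Low-dose aspirin lowered the incidence of myocardial infarction in the trial.",
    "The study evaluated ibuprofen for post-operative pain management.",
]

model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
scores = model.predict([(claim, passage) for passage in candidates])

# Keep the highest-scoring passages for downstream claim classification.
reranked = sorted(zip(scores, candidates), reverse=True)
for score, passage in reranked:
    print(f"{score:.3f}  {passage}")
```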
During my initial code cleanup I changed a fair amount of code, so prior to release I ran regression tests on FEVER and SciFact to make sure the results were still reproducible.
FEVER regression results:

Run | Test Label Accuracy | Test FEVER Score |
---|---|---|
Published (RoBERTa Large MNLI) | 79.39 | 76.89 |
Regression (02/20/23) | 79.31 | 76.91 |
Published (DeBERTa v2 XL MNLI) | 80.24 | 77.70 |
Regression (02/22/23) | 80.35 | 77.86 |
SciFact regression results:

Run | Sentence Selection + Label (SS+L) F1 | Abstract Label-Only F1 |
---|---|---|
Published | 58.1 | 73.2 |
Regression (02/26/23) | 58.3 | 73.8 |
As a means of distributing the system, BEVERS is made available as Docker images:
- BEVERS:
docker pull mitchelldehaven/bevers
- BEVERS frontend:
docker pull mitchelldehaven/bevers_frontend
For running the demo, the following must be done for the backend Flask API:
docker run -p 5000:5000 -it --gpus all mitchelldehaven/bevers
conda activate bevers
export DATASET=fever
export PYTHONPATH=.
export FLASK_APP=demo/backend/app.py
flask run --host=0.0.0.0
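Once the backend is up, it can also be queried directly without the UI. The snippet below is only a guess at the interface: the route name and the JSON keys are hypothetical, so check demo/backend/app.py for the actual endpoint and payload.

```python
# Hypothetical example of hitting the demo backend directly.
# The route name and payload/response keys are guesses, not the actual API.
import requests

resp = requests.post(
    "http://localhost:5000/api/predict",  # hypothetical route
    json={"claim": "Telemundo is an English-language television network."},
    timeout=120,
)
resp.raise_for_status()
print(resp.json())  # expected: predicted label plus retrieved evidence sentences
```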
For running the demo, the following must be done for the frontend Angular UI:
docker run -p 4200:4200 -it mitchelldehaven/bevers_frontend
After both Docker containers are running, the demo is accessible at http://localhost:4200/.
There is a simple UI for demoing the model. The current setup is a lighter version of the one used for the best results, in order to reduce compute requirements.
For running the backend Flask API:
export DATASET=fever
export PYTHONPATH=.
python demo/src/app.py
For running the frontend Angular UI:
cd demo/frontend
npm i
ng serve
A short GIF showing the demo is included below, so you can see what it does without having to set up the demo yourself.