Automatic extraction of data from clinical trial reports
RobotReviewer is a system for providing automatic annotations from clinical trials (in PDF format). Currently, RobotReviewer provides data on the trial PICO characteristics (Population, Interventions/Comparators, and Outcomes), and also automatically assesses trials for likely biases using the Cochrane Risk of Bias tool.
Citation details for the current version are given further down in this document.
We offer RobotReviewer free of charge, but we'd be most grateful if you would cite us if you use it. We're academics, and thrive on links and citations! Getting RobotReviewer widely used and cited helps us obtain the funding to maintain the project and make RobotReviewer better.
It also makes your methods transparent to your readers, and not least we'd love to see where RobotReviewer is used! :)
For most people, we encourage you to use RobotReviewer via our website.
No need to install anything, simply upload your PDFs, and RobotReviewer will automatically extract key data and present a summary table.
For those who are particularly technically minded, or have a pressing need to run the software on their own machines, read on...
RobotReviewer is open source and free to use under the GPL license, version 3.0 (see the LICENSE.txt file in this directory).
We'd appreciate it if you would:
- Display the text, 'Risk of Bias automation by RobotReviewer (how to cite)' on the same screen or webpage on which the RobotReviewer results (highlighted text or risk of bias judgements) are displayed.
- For web-based tools, the text 'how to cite' should link to our website: http://vortext.systems/robotreviewer
- For desktop software, you should usually link to the same website. If this is not possible, you may alternatively display the text and example citations from the 'How to cite RobotReviewer' section below.
You can cite RobotReviewer as:
Marshall IJ, Kuiper J, Banner E, Wallace BC. “Automating Biomedical Evidence Synthesis: RobotReviewer.” Proceedings of the Conference of the Association for Computational Linguistics (ACL). 2017 (July): 7–12.
A BibTeX entry for LaTeX users is:
@article{RobotReviewer2017,
  title = "Automating Biomedical Evidence Synthesis: {RobotReviewer}",
  author = "Marshall, Iain J and Kuiper, Jo{\"e}l and Banner, Edward and Wallace, Byron C",
  journal = "Proceedings of the Conference of the Association for Computational Linguistics (ACL)",
  volume = 2017,
  pages = "7--12",
  month = jul,
  year = 2017,
}
The project can be run as a set of Docker services using the docker-compose command, which is usually the easiest way to install locally.
First you should clone this repository, and download/decompress the SciBERT model file.
git clone https://github.com/ijmarshall/robotreviewer.git
wget https://s3-us-west-2.amazonaws.com/ai2-s2-research/scibert/tensorflow_models/scibert_scivocab_uncased.tar.gz
tar -zxf scibert_scivocab_uncased.tar.gz --directory robotreviewer/robotreviewer/data
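If tar is not available (for example on some Windows setups), the decompression step can be approximated in Python. This is a minimal sketch, not part of RobotReviewer: the function name `extract_model` is made up for illustration, and it assumes the archive has already been downloaded and comes from a trusted source.

```python
import tarfile
from pathlib import Path
from typing import List


def extract_model(archive: Path, data_dir: Path) -> List[str]:
    """Unpack a .tar.gz archive into data_dir and return the member names.

    Hypothetical helper: roughly equivalent to
    `tar -zxf <archive> --directory <data_dir>`.
    Only use it on archives you trust.
    """
    data_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, mode="r:gz") as tar:
        names = tar.getnames()
        tar.extractall(path=data_dir)
    return names
```

Called as `extract_model(Path("scibert_scivocab_uncased.tar.gz"), Path("robotreviewer/robotreviewer/data"))`, it should leave the same files in place as the tar command above.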
Afterwards, create a config.json file from config.json.example. When running from docker-compose, the following configuration for running locally is enough:
{
  "robotreviewer": {
    "use_grobid": true,
    "grobid_threads": 4,
    "spacy_threads": 4,
    "dont_delete": 0,
    "log": "log.txt",
    "api_keys": {
      "a_secret_key": {
        "uid": 1
      }
    }
  }
}
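The same configuration can also be generated from a script, which is handy if you want to set your own API key rather than the placeholder `a_secret_key`. The sketch below only reproduces the keys shown above; the helper names `build_config` and `write_config` are ours, and the target path is whatever location you keep config.json in.

```python
import json
from pathlib import Path


def build_config(api_key: str, threads: int = 4) -> dict:
    """Return a config dict mirroring the local docker-compose example above."""
    return {
        "robotreviewer": {
            "use_grobid": True,
            "grobid_threads": threads,
            "spacy_threads": threads,
            "dont_delete": 0,
            "log": "log.txt",
            # One API key, mapped to a user id, as in the example.
            "api_keys": {api_key: {"uid": 1}},
        }
    }


def write_config(path: Path, config: dict) -> None:
    """Serialise the config as indented JSON."""
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
```

For example, `write_config(Path("config.json"), build_config("a_secret_key"))` writes a file equivalent to the JSON shown above.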
Then, create an .env file from the .envTemplate file. Keep the ROBOTREVIEWER_GROBID_HOST value if the docker-compose files are not modified and the grobid service is running on port 8070.
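To double-check what ended up in your .env file, a few lines of Python are enough. This is a rough sketch of a `KEY=VALUE` parser, not the parser docker-compose itself uses, and the sample host value in the usage note is invented for illustration (only the port 8070 comes from the text above).

```python
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and '#' comments."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values
```

For instance, `parse_env("ROBOTREVIEWER_GROBID_HOST=http://grobid:8070")["ROBOTREVIEWER_GROBID_HOST"]` gives back the host string, where `http://grobid:8070` is a hypothetical value; check .envTemplate for the real default.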
Then, to build, from within the code directory run:
docker-compose build
If the build is successful, you can then start the services locally - in detached mode - by running:
docker-compose up -d
You can then access the website on any browser on your local machine at: http://localhost:5050, while the API server will be available at: http://localhost:5051 (consider using Postman for testing the endpoints.)
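If you would rather check from a script that both services came up, something like the following works. It is a sketch under the assumption that the ports are the defaults named above (5050 and 5051); `is_reachable` and `report` are illustrative names, and any HTTP answer (even an error status) is counted as "up".

```python
import http.client
import urllib.error
import urllib.request

# Default addresses from the instructions above.
WEB_URL = "http://localhost:5050"
API_URL = "http://localhost:5051"

# Local checks should not be routed through a system proxy.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def is_reachable(url: str, timeout: float = 3.0) -> bool:
    """Return True if something answers HTTP at url, False otherwise."""
    try:
        with _OPENER.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded, just not with a success status.
        return True
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return False


def report() -> dict:
    """Check the website and the API server and return a url -> bool map."""
    return {url: is_reachable(url) for url in (WEB_URL, API_URL)}
```

Calling `report()` shortly after `docker-compose up -d` may show `False` for a while, since the containers can take some time to start.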
To stop the webserver, run:
docker-compose down --remove-orphans
The docker-compose file docker-compose.gpu.yml is provided, and includes the necessary settings for making the GPU visible to Docker containers.
Before running the docker-compose command, it is necessary to install the NVIDIA CUDA drivers and nvidia-container-runtime, following the instructions from https://docs.docker.com/config/containers/resource_constraints/#gpu and https://docs.docker.com/compose/gpu-support/.
You can test that your GPU is visible within the docker container by running the following command:
docker run -it --rm --gpus all ubuntu nvidia-smi
To run RobotReviewer with GPU support, you must specify the GPU docker-compose file:
docker-compose -f docker-compose.gpu.yml build
docker-compose -f docker-compose.gpu.yml up -d
To stop the services running with GPU support, use:
docker-compose -f docker-compose.gpu.yml down --remove-orphans
The docker-compose.dev.yml compose file can be used when the Flask development server is desired instead of Gunicorn.
To run in development mode, use the same commands as before, specifying the development compose file:
docker compose -f docker-compose.dev.yml build
docker compose -f docker-compose.dev.yml up
To stop the services running in development mode, use:
docker compose -f docker-compose.dev.yml down --remove-orphans
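The three modes (default, GPU, development) differ only in which compose file is passed with `-f`. If you script your deployment, that can be captured in one small helper. This is a sketch only: `compose_command` is a hypothetical name, and it uses the `docker-compose` spelling throughout, so swap in `docker compose` if that is what your Docker installation provides.

```python
from typing import List

# Compose file per mode; None means the default docker-compose.yml is used.
COMPOSE_FILES = {
    "default": None,
    "gpu": "docker-compose.gpu.yml",
    "dev": "docker-compose.dev.yml",
}


def compose_command(mode: str, *action: str) -> List[str]:
    """Build the argument list for a docker-compose call in the given mode."""
    if mode not in COMPOSE_FILES:
        raise ValueError("unknown mode: %r" % mode)
    command = ["docker-compose"]
    compose_file = COMPOSE_FILES[mode]
    if compose_file is not None:
        command += ["-f", compose_file]
    return command + list(action)
```

The result can be passed to `subprocess.run`, e.g. `subprocess.run(compose_command("gpu", "up", "-d"), check=True)`.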
We have tested the installation on Ubuntu and Mac OS, both of which work with the following instructions. Windows does work in the end, but with a lot of installation headaches!
- Ensure you have a working version of Python 3.6. We strongly recommend using Python from the Anaconda Python distribution for a quicker and more reliable experience.
- Install git-lfs for managing the model file versions (on Mac: brew install git-lfs). NB! If you already have git lfs installed, make sure it's the most recent version, since older versions have not downloaded files properly.
- Get a copy of the RobotReviewer repo, and go into that directory:
      git clone https://github.com/ijmarshall/robotreviewer.git
      cd robotreviewer
- Install the Python libraries that RobotReviewer needs. The most reliable way is through a conda environment. The following downloads the packages, and installs the required data:
      conda env create -f robotreviewer_env.yml
      conda activate robotreviewer
      python -m spacy download en
      python -m nltk.downloader punkt stopwords
  You should then install tensorflow version 1.12.0, with or without GPU support depending on your preference:
      conda activate robotreviewer
      pip install tensorflow==1.12.0 # OR
      pip install tensorflow-gpu==1.12.0
- Ensure keras is set to use tensorflow as its default backend. Steps on how to do this can be found in the Keras documentation.
- This version of RobotReviewer requires Grobid, which in turn uses Java. Follow the Grobid installation instructions to download and build it. This version of RobotReviewer has been tested with Grobid 0.5.1, but no longer works with 0.4 versions.
- Create the robotreviewer/config.json file and ensure it contains the path to the directory where you have installed Grobid. (RobotReviewer will start it automatically in a subprocess.) Note that this should be the path to the entire (parent) Grobid directory, not the bin subfolder. An example of this file is provided in robotreviewer/config.json.example (it is only necessary to change the grobid_path).
- Also install rabbitmq. This can be done via Homebrew on OS X, or by the alternative means documented by the RabbitMQ project. Finally, make sure celery is installed and on your path. Note that this ships with Anaconda by default and will be found at $(anaconda-home)/bin/celery by default.
- We now also make use of BERT embeddings, specifically SciBERT. For this we use bert-as-service, which needs to be running locally.
To do this, get the SciBERT model file:
wget https://s3-us-west-2.amazonaws.com/ai2-s2-research/scibert/tensorflow_models/scibert_scivocab_uncased.tar.gz
And (from the RobotReviewer base directory) decompress to the robotreviewer data folder:
tar -zxf scibert_scivocab_uncased.tar.gz --directory robotreviewer/data
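For the keras backend step above, a quick way to see which backend is configured is to look at the Keras config file. The sketch below assumes the multi-backend Keras 2.x conventions as we understand them: a `backend` key in `~/.keras/keras.json`, overridden by the `KERAS_BACKEND` environment variable, with tensorflow as the fallback. The function name is ours; treat the result as a hint and confirm with the message Keras prints on import.

```python
import json
import os
from pathlib import Path
from typing import Mapping, Optional

# Usual location of the Keras config file (assumption: KERAS_HOME is not set).
DEFAULT_KERAS_JSON = Path.home() / ".keras" / "keras.json"


def configured_backend(keras_json: Path = DEFAULT_KERAS_JSON,
                       environ: Optional[Mapping[str, str]] = None) -> str:
    """Guess which backend Keras would pick, without importing Keras."""
    env = os.environ if environ is None else environ
    # The environment variable takes precedence over the config file.
    if env.get("KERAS_BACKEND"):
        return env["KERAS_BACKEND"]
    if keras_json.exists():
        config = json.loads(keras_json.read_text(encoding="utf-8"))
        return config.get("backend", "tensorflow")
    return "tensorflow"
```

If `configured_backend()` returns anything other than `"tensorflow"`, edit the `backend` entry in keras.json before starting RobotReviewer.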
RobotReviewer requires a 'worker' process (which does the Machine Learning), and a webserver to be started. Ensure that you are within the conda environment (default name: robotreviewer) when running the following processes.
First, be sure that rabbitmq-server is running. If you haven't set this to start on login, you can start it manually:
rabbitmq-server
Then, to start the Machine Learning worker (using the GPU):
celery -A robotreviewer.ml_worker worker --loglevel=info --concurrency=1 --pool=solo
Alternatively, to start RobotReviewer using CPU only, use the following command:
env CUDA_VISIBLE_DEVICES=-1 celery -A robotreviewer.ml_worker worker --loglevel=info --concurrency=1 --pool=solo
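The only difference between the GPU and CPU-only commands is the `CUDA_VISIBLE_DEVICES=-1` environment variable, which hides all GPUs from the worker. If you launch the worker from a Python script, the same switch can be sketched as below; `worker_invocation` is an illustrative helper, not something RobotReviewer ships.

```python
import os
from typing import Dict, List, Mapping, Optional, Tuple

# The celery arguments from the commands above.
WORKER_ARGS = [
    "celery", "-A", "robotreviewer.ml_worker", "worker",
    "--loglevel=info", "--concurrency=1", "--pool=solo",
]


def worker_invocation(use_gpu: bool,
                      base_env: Optional[Mapping[str, str]] = None
                      ) -> Tuple[List[str], Dict[str, str]]:
    """Return (args, env) for starting the Machine Learning worker."""
    env = dict(os.environ if base_env is None else base_env)
    if not use_gpu:
        # Hide every GPU so that the worker falls back to the CPU.
        env["CUDA_VISIBLE_DEVICES"] = "-1"
    return list(WORKER_ARGS), env
```

The pair can be handed to `subprocess.Popen(args, env=env)` from within the conda environment.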
Next, be sure that bert-as-service is running, and using the SciBERT weights:
bert-serving-start -model_dir=/Path/to/SciBERT-weights/
Finally, to start the webserver (on localhost:5000):
python -m robotreviewer
NEW! To start the server for the Swagger API, run:
REST_API=true python -m robotreviewer --rest
We have included example reports, with open access RCT PDFs to demonstrate RobotReviewer. These are saved in the default database, and can be accessed via the following links.
Decision aids: http://localhost:5000/#report/Tvg0-pHV2QBsYpJxE2KW-
Influenza vaccination: http://localhost:5000/#report/_fzGUEvWAeRsqYSmNQbBq
Hypertension: http://localhost:5000/#report/HBkzX1I3Uz_kZEQYeqXJf
The big change in this version of RobotReviewer is that we now deal with groups of clinical trial reports, rather than one at a time. This is to allow RobotReviewer to synthesise the results of multiple trials.
As a consequence, the API has become more sophisticated than previously and we will add further documentation about it here.
In the meantime, the code for the API endpoints can be found in /robotreviewer/app.py.
Some things remain simple; e.g., for an example of using RR to classify abstracts as RCTs (or not) see this gist.
If you are interested in incorporating RobotReviewer into your own software, please contact us and we'd be pleased to assist.
The following command will run the testing modules:
python -m unittest
These should be used to ensure that changes do not break, or have an unintended effect on, the core of the code. If Ran X tests in Ys is displayed, followed by OK, the tests have completed successfully.
Feel free to contact us at mail@ijmarshall.com with any questions.
Most likely the problem is that your path to Grobid in robotreviewer/config.json is incorrect. If your path uses a ~, try using a path without one.
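To find out what a `~` path actually points to, you can expand it yourself and check that the directory exists. This is a small diagnostic sketch, not RobotReviewer's own loading code: it assumes `grobid_path` sits either at the top level of config.json or under a `robotreviewer` key, so adjust the lookup to match your file.

```python
import json
import os
from pathlib import Path


def resolve_grobid_path(config_file: Path) -> Path:
    """Read grobid_path from config.json, expand ~ and check it is a directory."""
    config = json.loads(config_file.read_text(encoding="utf-8"))
    # Assumption: the key may be nested under "robotreviewer" or at top level.
    section = config.get("robotreviewer", config)
    raw = section["grobid_path"]
    expanded = Path(os.path.expanduser(raw))
    if not expanded.is_dir():
        raise FileNotFoundError(
            "grobid_path %r resolves to %s, which is not a directory" % (raw, expanded)
        )
    return expanded
```

The path returned by `resolve_grobid_path(Path("robotreviewer/config.json"))` is the absolute form you can paste back into the config file in place of the `~` version.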
Often found on OS X. If you installed rabbitmq using Homebrew, running the command brew services start rabbitmq should work.
- Marshall, I. J., Kuiper, J., & Wallace, B. C. (2015). RobotReviewer: evaluation of a system for automatically assessing bias in clinical trials. Journal of the American Medical Informatics Association. [doi]
- Zhang Y, Marshall I. J., & Wallace, B. C. (2016) Rationale-Augmented Convolutional Neural Networks for Text Classification. Conference on Empirical Methods on Natural Language Processing. [preprint]
- Marshall, I., Kuiper, J., & Wallace, B. (2015). Automating Risk of Bias Assessment for Clinical Trials. IEEE Journal of Biomedical and Health Informatics. [doi]
- Kuiper, J., Marshall, I. J., Wallace, B. C., & Swertz, M. A. (2014). Spá: A Web-Based Viewer for Text Mining in Evidence Based Medicine. In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2014) (Vol. 8726, pp. 452–455). Springer Berlin Heidelberg. [doi]
- Marshall, I. J., Kuiper, J., & Wallace, B. C. (2014). Automating Risk of Bias Assessment for Clinical Trials. In Proceedings of the ACM Conference on Bioinformatics, Computational Biology, and Health Informatics (ACM-BCB) (pp. 88–95). ACM. [doi]
Copyright (c) 2018 Iain Marshall, Joël Kuiper, and Byron Wallace
We are enormously grateful to our many collaborators, whose work is incorporated in RobotReviewer. These include Ani Nenkova and Zachary Ives at UPenn, Benjamin Nye at Northeastern, James Thomas at the EPPI Centre, UCL, and Anna Noel-Storr at the University of Oxford and Cochrane Dementia group. We would like to express our gratitude to the Cochrane Collaboration, and especially to David Tovey and Chris Mavergames among many others who facilitated getting access to data, and made many useful introductions. We are hugely appreciative of the volunteers of the Cochrane Crowd, and of Anna Noel-Storr and Gordon Dooley, whose efforts and data we depend on to build our machine learning systems for identifying RCTs.
We include an implementation of the Schwartz-Hearst algorithm in Python by Vincent Van Asch and Phil Gooch, which is released under the MIT licence.
This work is supported by: National Institutes of Health (NIH) under the National Library of Medicine, grant R01-LM012086-01A1, "Semi-Automating Data Extraction for Systematic Reviews", and by NIH grant 5UH2CA203711-02, "Crowdsourcing Mark-up of the Medical Literature to Support Evidence-Based Medicine and Develop Automated Annotation Capabilities", and the UK Medical Research Council (MRC), through its Skills Development Fellowship program, grant MR/N015185/1