
ruiantunes/biocreative-vi-track-5-chemprot


BioCreative VI — Track 5: text mining chemical-protein interactions (ChemProt)

This repository contains the code of our system for the ChemProt task.

Requirements

Ubuntu, Python 3.6.4. Install the required packages:

$ pip install -r requirements.txt

Usage

Scripts

confusion.py: Calculate the confusion matrix and other statistics given a file with predicted relations.

create_embeddings.py: Create pre-trained part-of-speech and dependency embedding vectors.

main.py: Train a deep learning model and test it. The model can be either a bidirectional long short-term memory (BiLSTM) recurrent network or a convolutional neural network (CNN). To change the other input arguments, edit the script directly. Only the seed number can be passed on the command line:

$ python main.py 2

mfuncs.py: Functions used by the main.py script.

support.py: Auxiliary code for handling the ChemProt dataset.

utils.py: General-purpose utilities.

voting.py: Average several outputs (probabilities). Edit the script to choose the input directory and the group to be evaluated.
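As a rough illustration of the averaging idea behind voting.py, the sketch below averages per-class probability vectors from several runs (for example, runs with different seeds) and then picks the most probable class per instance. It is not the code from voting.py: the function names, the in-memory list format, the NONE label, and the final argmax step are assumptions made for this example.

```python
# Hypothetical label order; "NONE" (no relation) is an assumed name.
LABELS = ["NONE", "CPR:3", "CPR:4", "CPR:5", "CPR:6", "CPR:9"]


def average_probabilities(runs):
    """Average class probabilities over runs.

    runs[r][i] is the probability vector that run r assigned to instance i.
    """
    if not runs:
        raise ValueError("at least one run is required")
    n_instances = len(runs[0])
    if any(len(run) != n_instances for run in runs):
        raise ValueError("all runs must cover the same instances")
    averaged = []
    for i in range(n_instances):
        vectors = [run[i] for run in runs]
        n_classes = len(vectors[0])
        averaged.append(
            [sum(v[c] for v in vectors) / len(vectors) for c in range(n_classes)]
        )
    return averaged


def predict_labels(probabilities, labels=LABELS):
    """Pick the label with the highest averaged probability (assumed step)."""
    return [labels[max(range(len(p)), key=p.__getitem__)] for p in probabilities]


# Three made-up runs over two instances.
run_a = [[0.6, 0.1, 0.3, 0.0, 0.0, 0.0], [0.2, 0.5, 0.3, 0.0, 0.0, 0.0]]
run_b = [[0.2, 0.1, 0.7, 0.0, 0.0, 0.0], [0.3, 0.4, 0.3, 0.0, 0.0, 0.0]]
run_c = [[0.1, 0.2, 0.7, 0.0, 0.0, 0.0], [0.6, 0.3, 0.1, 0.0, 0.0, 0.0]]

averaged = average_probabilities([run_a, run_b, run_c])
print(predict_labels(averaged))  # ['CPR:4', 'CPR:3']
```

Averaging probabilities instead of counting hard votes lets a confident run outweigh two uncertain ones, which is why run_a alone does not decide the first instance here.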

Datasets

The datasets were pre-processed (tokenization, sentence splitting, part-of-speech tagging, and dependency parsing) by the Turku Event Extraction System (TEES). Available for download as data.zip [Mirror 1] [Mirror 2].

Word embeddings

Our word embedding models were created from PubMed English abstracts. We also pre-trained part-of-speech and dependency embedding vectors from the ChemProt dataset. Available for download as word2vec.zip [Mirror 1] [Mirror 2].

We also tested the word embedding model created by Chen et al. (2018) [Paper] [Code].

Supplementary data

Statistics about the datasets, and some prediction files. Available for download as supp.zip [Mirror 1] [Mirror 2].
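To give an idea of the kind of statistics that can be derived from a prediction file, here is a minimal sketch that builds a confusion matrix from gold and predicted labels and computes micro-averaged precision, recall, and F1-score over the evaluated CPR groups. It does not reproduce confusion.py or the format of the files in supp.zip: the function names, the in-memory label lists, and the NONE label are assumptions for this example.

```python
from collections import Counter

# Relation groups assumed to count as positives in the evaluation.
POSITIVE = ["CPR:3", "CPR:4", "CPR:5", "CPR:6", "CPR:9"]


def confusion_matrix(gold, pred):
    """Count (gold label, predicted label) pairs."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    return Counter(zip(gold, pred))


def micro_prf(counts, positive=POSITIVE):
    """Micro-averaged precision, recall and F1 over the positive groups."""
    tp = sum(n for (g, p), n in counts.items() if g == p and g in positive)
    n_pred = sum(n for (g, p), n in counts.items() if p in positive)
    n_gold = sum(n for (g, p), n in counts.items() if g in positive)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up labels; "NONE" is a hypothetical name for "no relation".
gold = ["CPR:3", "CPR:4", "CPR:4", "NONE", "CPR:9", "NONE"]
pred = ["CPR:3", "CPR:4", "CPR:3", "CPR:4", "CPR:9", "NONE"]

counts = confusion_matrix(gold, pred)
precision, recall, f1 = micro_prf(counts)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

In this toy example three of the five predicted relations are correct and three of the four gold relations are recovered, so precision is 0.6 and recall is 0.75.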

Reference

If you use this code or data in your work, please cite our publication:

@article{antunes2019a,
  author    = {Antunes, Rui and Matos, S{\'e}rgio},
  journal   = {Database},
  month     = oct,
  number    = {baz095},
  publisher = {{Oxford University Press}},
  title     = {Extraction of chemical--protein interactions from the literature using neural networks and narrow instance representation},
  url       = {https://doi.org/10.1093/database/baz095},
  volume    = {2019},
  year      = {2019},
}
