Extract the InterPro data for the PomBase proteins and write as JSON

pombase/pombase-domain-process

PomBase code for processing domains

This program processes the match_complete.xml.gz file from InterPro and runs TMHMM, then writes the combined domain information to a JSON file.

The latest InterPro file is available from: https://ftp.ebi.ac.uk/pub/databases/interpro/current_release/

UniProt IDs for pombe proteins are queried from PostgreSQL. Those IDs are used to filter the InterPro file.
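The filtering step can be illustrated with a short Python sketch. It is not the code used by this program, only a minimal illustration of streaming through the gzipped match file and keeping the proteins whose IDs are in a given set. The element and attribute names (protein, match, lcn, id, dbname, start, end) are assumed from the general layout of the InterPro match file, and the IDs and sample data are made up.

```python
import gzip
import io
import xml.etree.ElementTree as ET

# Made-up sample in the assumed layout of the InterPro match file.
SAMPLE = b"""<interpromatch>
  <protein id="PROT_A" name="a" length="300">
    <match id="SIG_1" name="x" dbname="PFAM">
      <lcn start="10" end="120"/>
    </match>
  </protein>
  <protein id="PROT_B" name="b" length="200">
    <match id="SIG_2" name="y" dbname="PFAM">
      <lcn start="5" end="50"/>
    </match>
  </protein>
</interpromatch>"""


def filter_matches(xml_stream, wanted_ids):
    """Yield (protein_id, matches) for proteins whose ID is in wanted_ids.

    The stream is read incrementally with iterparse, so the whole
    document is never built in memory at once.
    """
    for _event, elem in ET.iterparse(xml_stream, events=("end",)):
        if elem.tag != "protein":
            continue
        protein_id = elem.get("id")
        if protein_id in wanted_ids:
            matches = [
                {
                    "id": match.get("id"),
                    "dbname": match.get("dbname"),
                    "locations": [
                        (int(lcn.get("start")), int(lcn.get("end")))
                        for lcn in match.iter("lcn")
                    ],
                }
                for match in elem.iter("match")
            ]
            yield protein_id, matches
        # Drop the children of the finished <protein> element to limit memory use.
        elem.clear()


# Stand-in for the IDs that would come from the PostgreSQL query.
wanted = {"PROT_A"}

# gzip.open accepts a file object; a real run would pass the path to
# match_complete.xml.gz instead of this in-memory sample.
with gzip.open(io.BytesIO(gzip.compress(SAMPLE))) as stream:
    results = dict(filter_matches(stream, wanted))
```

Because the set lookup is constant time, the cost of the filter is dominated by reading and parsing the XML, which is why streaming matters for a file of this size.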

Protein sequences are queried from PostgreSQL and are passed to TMHMM. We run TMHMM in a separate thread while the InterPro XML is parsed and processed.
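The overlap between the TMHMM run and the XML parsing can be sketched as follows. This is a hedged illustration in Python, not this program's implementation: placeholder_tm_predict is a hypothetical stand-in that flags long hydrophobic runs (TMHMM itself uses a hidden Markov model and is run as an external executable), parse_domains stands in for the XML processing, and the shape of the combined result is an assumption rather than the actual JSON layout.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Hypothetical placeholder rule: 15 or more consecutive hydrophobic residues.
HYDROPHOBIC_RUN = re.compile(r"[AILMFVWC]{15,}")


def placeholder_tm_predict(sequences):
    """Stand-in for the TMHMM run; NOT how TMHMM works.

    Returns 1-based inclusive (start, end) pairs per protein ID.
    """
    return {
        protein_id: [(m.start() + 1, m.end()) for m in HYDROPHOBIC_RUN.finditer(seq)]
        for protein_id, seq in sequences.items()
    }


def parse_domains(records, wanted_ids):
    """Stand-in for the XML parsing: keep records for wanted proteins."""
    return {pid: domains for pid, domains in records if pid in wanted_ids}


def process(sequences, records):
    """Run the predictor in a worker thread while parsing in this thread."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Start the prediction in the background.
        tm_future = pool.submit(placeholder_tm_predict, sequences)
        # Meanwhile, do the parsing work in the calling thread.
        domains = parse_domains(records, set(sequences))
        # Block until the background prediction has finished.
        tm_segments = tm_future.result()
    return {
        pid: {"domains": domains.get(pid, []), "tm_segments": tm_segments[pid]}
        for pid in sequences
    }


# Made-up input data for illustration.
sequences = {"PROT_A": "MK" + "L" * 20 + "DE", "PROT_B": "MKDEDEKR"}
records = [("PROT_A", ["SIG_1"]), ("PROT_X", ["SIG_9"])]
combined = process(sequences, records)
```

The two tasks are independent until the final merge, so running one in a worker thread lets the wall-clock time approach that of the slower task rather than the sum of both.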

Running

Run with:

PATH=$PATH_TO_TMHMM_EXE:$PATH /var/pomcur/bin/pombase-interpro \
    -p "postgres://<username>:<password>@localhost/<dbname>" \
    -i <(gzip -d < match_complete.xml.gz) -o pombe_domain_results.json
