Pipeline designed to generate probability profiles for use by DETECT in Scinet.

LLYX/DETECT-probability-profile-generation-pipeline

 
 

Basic Usage

Place this folder on Scinet, then add the raw dat file of all proteins to process and a fasta file of the proteins that meet your criteria. Generate a blast db from that list of proteins, then run the bash script "0_reset_for_round_n -n", where -n is the number of individual jobs to split the work into (the script is currently set up to work best with a multiple of 8). This starts the pipeline, which should produce two files per viable EC: one for positive densities and one for negative densities.
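As an illustration of what splitting the work into n jobs can look like, here is a minimal Python sketch that parses a fasta file and distributes the records round-robin across a chosen number of jobs. The function names `read_fasta` and `split_into_jobs` are hypothetical and are not taken from this repository; the actual script may split the work differently.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_into_jobs(records, n_jobs):
    """Distribute records round-robin into n_jobs lists (assumed strategy)."""
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    jobs = [[] for _ in range(n_jobs)]
    for i, record in enumerate(records):
        jobs[i % n_jobs].append(record)
    return jobs
```

Round-robin assignment keeps the job sizes within one record of each other, which is one simple way to balance the load when n is a multiple of the cores per node.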

Preprocessing

0_prepare_sequence_data: Filters the dat file into a fasta file, removing proteins that do not belong to a class viable for analysis in DETECT.
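The filtering step could be sketched roughly as follows. This sketch assumes the dat file is a UniProt-style flat file (AC, DE, and SQ lines, with entries terminated by "//") and that "viable" means the entry carries at least one fully specified four-level EC number. Both are assumptions for illustration; the criteria, the function names, and the fasta header layout used by the actual script may differ.

```python
import re

# Assumed criterion: a complete four-level EC number such as EC=1.1.1.1.
# Partial numbers like EC=1.1.1.- are deliberately not matched.
EC_PATTERN = re.compile(r"EC=(\d+\.\d+\.\d+\.\d+)")


def parse_dat(text):
    """Yield (accession, ec_numbers, sequence) for each flat-file entry."""
    accession, ecs, seq_lines, in_seq = None, [], [], False
    for line in text.splitlines():
        if line.startswith("//"):
            # End of entry: emit it and reset the state.
            if accession is not None:
                yield accession, ecs, "".join(seq_lines)
            accession, ecs, seq_lines, in_seq = None, [], [], False
        elif in_seq:
            seq_lines.append(line.replace(" ", ""))
        elif line.startswith("AC") and accession is None:
            # Keep only the primary (first) accession.
            accession = line[2:].split(";")[0].strip()
        elif line.startswith("DE"):
            ecs.extend(EC_PATTERN.findall(line))
        elif line.startswith("SQ"):
            in_seq = True


def dat_to_fasta(text):
    """Return fasta text containing only entries with a complete EC number."""
    out = []
    for accession, ecs, seq in parse_dat(text):
        if ecs and seq:
            # Hypothetical header layout: accession followed by its ECs.
            out.append(f">{accession} EC={';'.join(ecs)}\n{seq}\n")
    return "".join(out)
```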

0_make_blast_db: Generates the blast db from the list of proteins in the fasta file produced by the step above.
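For orientation, building a protein blast db from a fasta file might look like the sketch below. It assumes the BLAST+ `makeblastdb` executable is available on the PATH; the script in this repository may call a different tool or use different options, and the file names in the usage note are placeholders.

```python
import shutil
import subprocess


def build_makeblastdb_command(fasta_path, db_name):
    """Return the argument list for building a protein blast db with BLAST+."""
    return ["makeblastdb", "-in", fasta_path, "-dbtype", "prot", "-out", db_name]


def make_blast_db(fasta_path, db_name):
    """Run makeblastdb, failing early if the executable is not on PATH."""
    if shutil.which("makeblastdb") is None:
        raise RuntimeError("makeblastdb not found on PATH")
    subprocess.run(build_makeblastdb_command(fasta_path, db_name), check=True)
```

Calling `make_blast_db("proteins.fasta", "proteins_db")` would then write the database files next to the chosen output name.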

Postprocessing

0_create_mappings_and_prior_probabilities_file: Creates two files that DETECT requires in order to function: one containing the mappings of sequence IDs to ECs, and one containing the prior probabilities used in DETECT's Bayesian estimation.
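A rough sketch of how such mappings and priors could be computed is given below. It assumes the prior for an EC is simply the fraction of sequences annotated with that EC. That definition is an assumption made for illustration and may not match what the script or DETECT actually uses; the function names are hypothetical.

```python
from collections import Counter


def build_mappings(records):
    """Map each sequence ID to its list of ECs.

    records: iterable of (sequence_id, ec_number) pairs.
    """
    mapping = {}
    for seq_id, ec in records:
        mapping.setdefault(seq_id, []).append(ec)
    return mapping


def prior_probabilities(mapping):
    """Assumed prior: share of sequences annotated with each EC."""
    # set(ecs) ensures a sequence counts at most once per EC.
    counts = Counter(ec for ecs in mapping.values() for ec in set(ecs))
    total = len(mapping)
    return {ec: n / total for ec, n in counts.items()}
```

Note that under this definition a sequence with several ECs contributes to each of them, so the priors need not sum to 1.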

Warning

Some filenames may have to be changed, namely wherever a script refers to the dat or fasta files. These files come from external sources, so their names are not generated in a consistent way. The path to the EMBOSS package must also be changed to one that is available to you. Furthermore, please ensure that you have installed all the necessary packages (those imported in the script headers). Finally, this pipeline is designed to run on the Scinet cluster; running it on other clusters will require further modifications.
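Before submitting jobs, a quick check along the following lines can confirm that the required modules and executables are available. This is only a sketch: the helper name is hypothetical, and the module and tool names passed to it must be taken from the script headers, since they are not listed here.

```python
import importlib.util
import shutil


def missing_requirements(modules, executables):
    """Return (missing_modules, missing_executables).

    modules: top-level Python module names to look up.
    executables: command names expected on PATH.
    """
    missing_modules = [m for m in modules if importlib.util.find_spec(m) is None]
    missing_tools = [e for e in executables if shutil.which(e) is None]
    return missing_modules, missing_tools
```

If both returned lists are empty, everything named was found in the current environment.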

For more information or to report bugs, please contact: leon.xu@mail.utoronto.ca
