
Releases: valdeanda/mebs

Pfam v34: MEBS clustering

19 Oct 00:42

Files required to perform clustering based on the presence/absence of Pfam domains, as described in Langwig-De Anda et al., 2021.

How to use mebs_clust.py with Pfam v34.0 (October 2021, 19,179 entries)

Warning! This is a large file, so make sure you have enough disk space before downloading it.
The compressed file is 274 MB and the uncompressed file is 1.85 GB.
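While the archive is being extracted, the compressed (~274 MB) and uncompressed (~1.85 GB) copies sit on disk together, so roughly 2.1 GB must be free. The following is a minimal POSIX shell sketch for checking this before downloading; the function name `has_free_kb` and the ~2.2 GB threshold (a small safety margin) are illustrative choices, not part of MEBS.

```shell
#!/bin/sh
# Succeed (exit status 0) if the filesystem holding directory $1 has at least
# $2 KiB available. `df -P` prints one data line in POSIX format, and with -k
# its 4th column is the available space in 1024-byte blocks.
has_free_kb() {
    avail_kb=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
    [ -n "$avail_kb" ] && [ "$avail_kb" -ge "$2" ]
}

# ~274 MB compressed + ~1.85 GB uncompressed, plus a small margin: ~2200 MB.
PFAM34_REQUIRED_KB=$((2200 * 1024))

if has_free_kb . "$PFAM34_REQUIRED_KB"; then
    echo "Enough free space to download and extract the Pfam v34 files here."
else
    echo "Warning: less than ~2.2 GB free in $(pwd); free up space first." >&2
fi
```

Run it from the directory where you intend to download the files; pass a different directory as the first argument of `has_free_kb` to check another filesystem.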

Custom Pfam database v1

19 Aug 16:33

Data for customizing Pfam searches with MEBS

The compressed directory contains:

  1. Pfam-A database from 29/8/18: my_Pfam.pfam.hmm
    obtained from ftp://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/
  2. Random entropy values required for the script mebs.pl to run: entropies.tab
    These values are not used to compute any score in the custom option.
  3. Example mapping file: pfam2kegg.tab

Steps to follow

To compute the metabolic completeness of your genomic/metagenomic sample with custom Pfams, do the following:

  1. Download this file and place it in the cycles/pfam directory:
wget https://github.com/valdeanda/mebs/releases/download/custom_pfam/custom_pfam.tar.gz
  2. Decompress the file:
tar -xvzf custom_pfam.tar.gz

  3. The mapping file pfam2kegg.tab contains the set of Pfam marker genes described in Peura et al. 2015 (https://www.nature.com/articles/srep12102); you can modify these entries and add any Pfams of interest.

less pfam/pfam2kegg.tab
PFAM    KO      PATHWAY PATHWAY NAME
PF03598         1       WL
PF00101         2       Calvin Cycle
PF06240         3       CO oxidation 
PF14710         4       Denitrification
PF02665         4       Denitrification
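To see how many Pfam markers each pathway has (for example, after adding your own entries), the mapping file can be summarized with `awk`. This sketch assumes the file is tab-separated with one header line and the four columns shown above (PFAM, KO, PATHWAY, PATHWAY NAME), where KO may be empty; `count_pfams_per_pathway` is a hypothetical helper name, not part of MEBS.

```shell
#!/bin/sh
# Print "PATHWAY<TAB>PATHWAY NAME<TAB>number of Pfams" for a MEBS mapping file.
# Assumes tab-separated columns PFAM, KO, PATHWAY, PATHWAY NAME and one header line.
count_pfams_per_pathway() {
    awk -F'\t' '
        NR == 1 || $1 == "" { next }        # skip the header and blank lines
        { n[$3 "\t" $4]++ }                 # key on pathway number + name
        END { for (k in n) print k "\t" n[k] }
    ' "$1" | sort -n                        # order rows by pathway number
}
```

Usage: `count_pfams_per_pathway pfam/pfam2kegg.tab`. For the rows listed above, pathway 4 (Denitrification) would be reported with 2 Pfams and each of the other pathways with 1.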

  4. Modify the config file to add the path of the new Pfam database, as in the following example. You don't need to specify the number of input genes or genomes for the Pfam database.
less config/config.txt
Cycle   Path    Comple  Input Genes     Input Genomes   Domains AUC     Score(FDR0.1)   Score(FDR0.01)  Score(FDR0.001) Score(FDR0.0001)
sulfur  cycles/sulfur/  cycles/sulfur/pfam2kegg.tab     152     161     112     0.985   4.156   8.049   10.816  12.285
carbon  cycles/carbon/  cycles/carbon/pfam2kegg.tab     135     90      119     0.988   9.735   18.744  34.26   34.908
oxygen  cycles/oxygen/          50      53      55      0.983   5.098   7.288   8.155   8.247
iron    cycles/iron/    cycles/iron/pfam2kegg.tab       36      34      112     0.863   7.412   9.571   10.241  10.322
nitrogen        cycles/nitrogen/        cycles/nitrogen/pfam2kegg.tab   267     144     176     0.791   15.974  17.7    18.785  19.03
pfam    cycles/pfam/    cycles/pfam/pfam2kegg.tab
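Judging from the example (the oxygen row, for instance, has an empty Comple column), config/config.txt appears to be tab-separated. Assuming that, the pfam row can also be added from the shell, which avoids editors that may silently turn tabs into spaces. In this minimal sketch, `add_pfam_cycle` is a hypothetical helper that appends the row from the example only when no pfam row exists yet.

```shell
#!/bin/sh
# Append the "pfam" row to a MEBS config file unless one is already present.
# Assumes tab-separated columns Cycle, Path, Comple, ...; as in the example,
# the Pfam row only fills the first three columns.
add_pfam_cycle() {
    config="$1"
    # awk exits 0 if some row's first field is exactly "pfam", 1 otherwise.
    if awk -F'\t' '$1 == "pfam" { found = 1 } END { exit (found ? 0 : 1) }' "$config"; then
        return 0
    fi
    # If the file does not end with a newline, add one so the row starts cleanly.
    if [ -s "$config" ] && [ -n "$(tail -c 1 "$config")" ]; then
        printf '\n' >> "$config"
    fi
    printf 'pfam\tcycles/pfam/\tcycles/pfam/pfam2kegg.tab\n' >> "$config"
}
```

Usage, from the MEBS root directory: `add_pfam_cycle config/config.txt`. Running it a second time leaves the file unchanged.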
      
  5. Run mebs.pl as usual, but keep in mind that you are now running hmmsearch against the entire Pfam database, so depending on the number and size of your samples this step might take a while. However, once your samples have been scanned against the entire Pfam database, you can modify the mapping file as many times as you want, and computing completeness will take only seconds.
 perl mebs.pl -input gen_test/ -type genomic -comp > test.pfam.tab
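Since the mapping file is meant to be edited repeatedly, a quick format check before each rerun can catch typos such as a malformed accession or a missing tab. This sketch assumes the same tab-separated, four-column layout with one header line, and that accessions are written without version suffixes (e.g. PF03598, as in the example); `check_mapping` is a hypothetical helper, not part of MEBS.

```shell
#!/bin/sh
# Report rows of a MEBS mapping file whose Pfam accession is not "PF" plus five
# digits, or that have fewer than four tab-separated columns. Returns non-zero
# if any problem is found. The header (first line) and blank lines are skipped.
check_mapping() {
    awk -F'\t' '
        NR == 1 || $0 == "" { next }
        $1 !~ /^PF[0-9][0-9][0-9][0-9][0-9]$/ {
            printf "line %d: unexpected Pfam accession \"%s\"\n", NR, $1
            bad = 1
        }
        NF < 4 {
            printf "line %d: expected 4 tab-separated columns, found %d\n", NR, NF
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}
```

Usage, chaining the check with the run shown above: `check_mapping cycles/pfam/pfam2kegg.tab && perl mebs.pl -input gen_test/ -type genomic -comp > test.pfam.tab`.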