Releases: valdeanda/mebs
Pfamv34. MEBS clustering
Required files to perform clustering based on presence/absence of PFAM domains as described in Langwig-De Anda et al., 2021
How to use mebs_clust.py with the Pfam v34.0 (October 2021, 19,179 entries)
Warning! This is a large file, so make sure you have enough disk space before downloading it.
The compressed file is 274 MB and the uncompressed file is 1.85 GB.
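The disk-space warning above can be checked before downloading. The following is a minimal sketch, not part of MEBS: the function name `has_room_for_pfam` and the 10% safety margin are assumptions made for illustration, the sizes are read as decimal megabytes and gigabytes, and the check assumes the archive and the uncompressed file sit on disk at the same time.

```python
import shutil

# Sizes quoted in the release notes, read as decimal units (an assumption).
COMPRESSED_BYTES = 274_000_000        # 274 MB compressed
UNCOMPRESSED_BYTES = 1_850_000_000    # 1.85 GB uncompressed


def has_room_for_pfam(path=".", margin=1.1):
    """Return True if the filesystem holding `path` can store both files.

    `margin` is a hypothetical safety factor, not a MEBS requirement.
    """
    needed = int((COMPRESSED_BYTES + UNCOMPRESSED_BYTES) * margin)
    free = shutil.disk_usage(path).free
    return free >= needed
```

Calling `has_room_for_pfam("/path/to/mebs")` on the intended download directory returns `False` when less than roughly 2.3 GB is free.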
Custom Pfam database v1
Data to customize Pfam searches using MEBS
The compressed directory contains:
- PfamA database from 29/8/18: my_Pfam.pfam.hmm, obtained from ftp://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/
- Random entropy values that are needed for the script mebs.pl to run: entropies.tab (not used to compute any score in the custom option)
- Example mapping file: pfam2kegg.tab
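Once the archive has been unpacked, it can be worth confirming that the three files listed above are present before running anything. A minimal sketch follows; the helper name `missing_custom_pfam_files` is hypothetical, and the directory to pass in (for example `cycles/pfam`) is an assumption based on the paths used in this release note.

```python
from pathlib import Path

# File names listed in the release notes for the custom Pfam archive.
EXPECTED_FILES = ("my_Pfam.pfam.hmm", "entropies.tab", "pfam2kegg.tab")


def missing_custom_pfam_files(pfam_dir):
    """Return the expected file names that are absent from `pfam_dir`."""
    pfam_dir = Path(pfam_dir)
    return [name for name in EXPECTED_FILES if not (pfam_dir / name).is_file()]
```

An empty list means all three files were found; any names returned point to an incomplete download or extraction.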
Steps to follow
To compute the metabolic completeness of your genomic/metagenomic sample with custom Pfams do the following
- Download this file and place it under the cycles/pfam directory.
wget https://github.com/valdeanda/mebs/releases/download/custom_pfam/custom_pfam.tar.gz
- Decompress the file
tar -xvzf custom_pfam.tar.gz
- The mapping file pfam2kegg.tab contains the set of PFAM marker genes described in Peura et al. 2015 (https://www.nature.com/articles/srep12102); you can modify these Pfams and add the Pfams of interest.
less pfam/pfam2kegg.tab
PFAM KO PATHWAY PATHWAY NAME
PF03598 1 WL
PF00101 2 Calvin Cycle
PF06240 3 CO oxidation
PF14710 4 Denitrification
PF02665 4 Denitrification
- Modify the config file to add the path of the new Pfam database, as in the following example. You don't need to specify the number of input genes or genomes for the Pfam database.
less config/config.txt
Cycle     Path              Comple                         Input Genes  Input Genomes  Domains  AUC    Score(FDR0.1)  Score(FDR0.01)  Score(FDR0.001)  Score(FDR0.0001)
sulfur    cycles/sulfur/    cycles/sulfur/pfam2kegg.tab    152          161            112      0.985  4.156          8.049           10.816           12.285
carbon    cycles/carbon/    cycles/carbon/pfam2kegg.tab    135          90             119      0.988  9.735          18.744          34.26            34.908
oxygen    cycles/oxygen/                                   50           53             55       0.983  5.098          7.288           8.155            8.247
iron      cycles/iron/      cycles/iron/pfam2kegg.tab      36           34             112      0.863  7.412          9.571           10.241           10.322
nitrogen  cycles/nitrogen/  cycles/nitrogen/pfam2kegg.tab  267          144            176      0.791  15.974         17.7            18.785           19.03
pfam      cycles/pfam/      cycles/pfam/pfam2kegg.tab
- Run mebs.pl as usual, but bear in mind that you are now running hmmsearch against the entire Pfam database, so depending on the number and size of your samples this step might take a while. However, once you have scanned your samples against the entire Pfam database, you can modify the mapping file as many times as you want, and recomputing completeness will take only seconds.
perl mebs.pl -input gen_test/ -type genomic -comp > test.pfam.tab
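Because the mapping file is meant to be edited repeatedly, a quick sanity check after each edit can catch malformed rows before mebs.pl is run again. The sketch below is not part of MEBS. It assumes that pfam2kegg.tab is tab-delimited with one header line, that the Pfam accession is the first field, and that the pathway name is the last field; the function name `pfams_by_pathway` is hypothetical.

```python
from collections import defaultdict


def pfams_by_pathway(mapping_path):
    """Group Pfam accessions by pathway name.

    Assumed layout (not confirmed by the MEBS documentation): tab-delimited,
    one header line, accession in the first field, pathway name in the last.
    """
    groups = defaultdict(list)
    with open(mapping_path) as handle:
        next(handle, None)  # skip the header line
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 2 or not fields[0].startswith("PF"):
                continue  # ignore blank or malformed lines
            groups[fields[-1]].append(fields[0])
    return dict(groups)
```

Running `pfams_by_pathway("cycles/pfam/pfam2kegg.tab")` after an edit shows at a glance which accessions are assigned to each pathway, so a newly added Pfam that ended up under the wrong name, or was dropped as malformed, is easy to spot.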