Refer to the SMETANA documentation for more details regarding usage.
$ smetana -h
usage: smetana [-h] [-c COMMUNITIES.TSV] [-o OUTPUT] [--flavor FLAVOR]
               [-m MEDIA] [--mediadb MEDIADB]
               [-g | -d | -a ABIOTIC | -b BIOTIC] [-p P] [-n N] [-v] [-z]
               [--solver SOLVER] [--molweight] [--exclude EXCLUDE]
               [--no-coupling]
               MODELS [MODELS ...]

Calculate SMETANA scores for one or multiple microbial communities.

positional arguments:
  MODELS
                        Multiple single-species models (one or more files).
                        You can use wild-cards, for example: models/*.xml, and optionally protect with quotes to avoid automatic bash
                        expansion (this will be faster for long lists): "models/*.xml".

optional arguments:
  -h, --help            show this help message and exit
  -c COMMUNITIES.TSV, --communities COMMUNITIES.TSV
                        Run SMETANA for multiple (sub)communities.
                        The communities must be specified in a two-column tab-separated file with community and organism identifiers.
                        The organism identifiers should match the file names in the SBML files (without extension).
                        Example:
                            community1    organism1
                            community1    organism2
                            community2    organism1
                            community2    organism3
  -o OUTPUT, --output OUTPUT
                        Prefix for output file(s).
  --flavor FLAVOR       Expected SBML flavor of the input files (cobra or fbc2).
  -m MEDIA, --media MEDIA
                        Run SMETANA for given media (comma-separated).
  --mediadb MEDIADB     Media database file
  -g, --global          Run global analysis with MIP/MRO (faster).
  -d, --detailed        Run detailed SMETANA analysis (slower).
  -a ABIOTIC, --abiotic ABIOTIC
                        Test abiotic perturbations with given list of compounds.
  -b BIOTIC, --biotic BIOTIC
                        Test biotic perturbations with given list of species.
  -p P                  Number of components to perturb simultaneously (default: 1).
  -n N
                        Number of random perturbation experiments per community (default: 1).
                        Selecting n = 0 will test all single species/compound perturbations exactly once.
  -v, --verbose         Switch to verbose mode
  -z, --zeros           Include entries with zero score.
  --solver SOLVER       Change default solver (current options: 'gurobi', 'cplex').
  --molweight           Use molecular weight minimization (recomended).
  --exclude EXCLUDE     List of compounds to exclude from calculations (e.g.: inorganic compounds).
  --no-coupling         Don't compute species coupling scores.
Refer to the methods section of the SMETANA paper for details on the mixed-integer linear programming (MILP) problems that are solved to compute the following:
- The species coupling score (SCS) measures the dependence of growth of species A on species B ($\mathrm{SCS}_{A,B}$)
  - calculated by enumerating all possible community member subsets where species A can grow; $\mathrm{SCS}_{A,B}$ is the fraction of those subsets where both species A and B can grow.
- The metabolite uptake score (MUS) measures the dependence of growth of species A on metabolite m ($\mathrm{MUS}_{A,m}$)
  - calculated by enumerating all possible metabolite requirement subsets where species A can grow; $\mathrm{MUS}_{A,m}$ is the fraction of those subsets where species A grows and metabolite m is taken up.
- The metabolite production score (MPS) is a binary score indicating whether a given species B can produce metabolite m (MPS = 1) or not (MPS = 0) in the community of N members ($\mathrm{MPS}_{B,m}$)
- The SMETANA score ranges from 0 to 1
  - measures how strongly a receiver species relies on a donor species for a particular metabolite
  - $\mathrm{SMETANA}_{A,B,m} = \mathrm{SCS}_{A,B} \cdot \mathrm{MUS}_{A,m} \cdot \mathrm{MPS}_{B,m}$
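Because the SMETANA score is the product of the three component scores, the relationship can be checked directly on a detailed output table. The sketch below assumes the nine-column, tab-separated layout of the `*_detailed.tsv` files used in this tutorial (scs, mus, mps and smetana in columns 6 to 9). The organism names and the scores in the second data row are made up for illustration.

```shell
# Toy table in the layout of a SMETANA *_detailed.tsv file.
# orgA/orgB and the second data row are hypothetical values.
table=$(mktemp)
printf 'community\tmedium\treceiver\tdonor\tcompound\tscs\tmus\tmps\tsmetana\n' > "$table"
printf 'all\tminimal\torgA\torgB\tM_zn2_e\t0.5\t1.0\t0\t0.0\n' >> "$table"
printf 'all\tminimal\torgA\torgB\tM_ala__L_e\t0.5\t0.4\t1\t0.2\n' >> "$table"

# Recompute scs * mus * mps (columns 6-8) for each data row and
# count the rows where the product disagrees with column 9.
mismatches=$(awk -F'\t' '
  NR > 1 {
    diff = $6 * $7 * $8 - $9
    if (diff < 0) diff = -diff
    if (diff > 1e-6) bad++
  }
  END { print bad + 0 }
' "$table")
echo "rows where smetana != scs * mus * mps: $mismatches"
rm -f "$table"
```

Note that the first row has MUS = 1.0 but still scores 0, because the donor does not produce the metabolite (MPS = 0).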
Note: There may be equivalent solutions that satisfy the linear programming problems posed by the detailed and global algorithms. To explore the solution space, run multiple simulations and then take averages. Use the --molweight flag to predict interactions on community-specific minimal media. Use the --zeros flag so that entries with a zero score are included in the output, which is needed to calculate accurate averages across simulations.
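To see why zero-score entries matter for averaging, consider one receiver/donor/compound interaction that scores 0.6 in one simulation and 0 in two others. If the zero rows are missing from the output, a naive mean over the rows that are present gives 0.6 rather than 0.2. A minimal sketch with made-up scores follows; the two-column input is a simplification for illustration, not SMETANA's output format.

```shell
# Hypothetical scores for ONE interaction across three simulations,
# one line per simulation: simulation<TAB>smetana
with_zeros=$(printf '1\t0.6\n2\t0.0\n3\t0.0\n')
# The same simulations with the zero-score rows left out
without_zeros=$(printf '1\t0.6\n')

# Mean of column 2 over the rows read from standard input
mean() { awk -F'\t' '{ sum += $2; n++ } END { if (n > 0) printf "%.2f\n", sum / n }'; }

mean_with=$(printf '%s\n' "$with_zeros" | mean)
mean_without=$(printf '%s\n' "$without_zeros" | mean)
echo "mean with zero rows:    $mean_with"
echo "mean without zero rows: $mean_without"
```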
Try running a few simulations on your community of GEMs, assuming your models are in the SymbNET/models/$COMM folder:
$ for i in {1..3}; do
echo "Running simulation $i out of 3 ... ";
smetana --flavor ucsd -o sim_${i} -v -d --molweight --zeros $ROOT/models/$COMM/*.xml;
done
Use the head command to inspect the generated detailed interactions:
$ head sim_1_detailed.tsv
community medium receiver donor compound scs mus mps smetana
all minimal ERR260172_bin.10.p.faa ERR260172_bin.31.s.faa M_12ppd__S_e 0.0 0.0 1 0.0
all minimal ERR260172_bin.10.p.faa ERR260172_bin.31.s.faa M_15dap_e 0.0 0.0 0 0.0
all minimal ERR260172_bin.10.p.faa ERR260172_bin.31.s.faa M_2pglyc_e 0.0 0.0 0 0.0
all minimal ERR260172_bin.10.p.faa ERR260172_bin.31.s.faa M_3amp_e 0.0 0.0 0 0.0
all minimal ERR260172_bin.10.p.faa ERR260172_bin.31.s.faa M_3cmp_e 0.0 0.0 0 0.0
all minimal ERR260172_bin.10.p.faa ERR260172_bin.31.s.faa M_3gmp_e 0.0 0.0 0 0.0
all minimal ERR260172_bin.10.p.faa ERR260172_bin.31.s.faa M_3hcinnm_e 0.0 0.0 0 0.0
If you successfully generated detailed predictions for your community, append them to the smet_all.tsv file. Remember to gunzip it first, e.g.:
$ gunzip $ROOT/data/smet_all.tsv.gz
Next, loop through each of the detailed simulation output files, reformat them, and append them to smet_all.tsv. This is the file that is loaded by the following R-markdown file for visualization.
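$ while read -r file; do
# Extract the simulation number from the file name, e.g. sim_3_detailed.tsv -> 3
sim=$(basename "$file" | sed 's/_detailed.tsv//; s/sim_//');
# Drop the header line, replace the "all" community label with $COMM, and prepend the simulation number
grep -v smetana "$file" | sed "s/^all/$COMM/; s/^/$sim\t/" >> "$ROOT/data/smet_all.tsv";
done < <(find . -name "sim*detailed.tsv" | grep -v simulations)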
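To preview what the loop does to each data row, the same two substitutions can be applied to a single line. This sketch uses a made-up row, with COMM and sim set by hand to hypothetical values. It inserts a literal tab through a helper variable instead of relying on `\t` in the sed replacement, which is a GNU sed extension.

```shell
COMM=soil   # hypothetical community name
sim=1       # hypothetical simulation number
tab=$(printf '\t')

# One made-up data row in the *_detailed.tsv layout
row=$(printf 'all\tminimal\torgA\torgB\tM_zn2_e\t0.5\t1.0\t0\t0.0')

# Replace the leading "all" with the community name,
# then prepend the simulation number as a new first column.
reformatted=$(printf '%s\n' "$row" | sed "s/^all/$COMM/; s/^/$sim$tab/")
printf '%s\n' "$reformatted"
```

The resulting line has ten tab-separated columns, starting with the simulation number and the community name.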
Check that you successfully reformatted and appended your simulations to the smet_all.tsv file using the tail command:
$ tail $ROOT/data/smet_all.tsv
1 soil minimal ERR671933_bin.5.s.faa ERR671933_bin.4.o.faa M_val__L_e 0.5 0.0 0 0.0
1 soil minimal ERR671933_bin.5.s.faa ERR671933_bin.4.o.faa M_vanln_e 0.5 0.0 0 0.0
1 soil minimal ERR671933_bin.5.s.faa ERR671933_bin.4.o.faa M_xan_e 0.5 0.0 0 0.0
1 soil minimal ERR671933_bin.5.s.faa ERR671933_bin.4.o.faa M_xmp_e 0.5 0.0 0 0.0
1 soil minimal ERR671933_bin.5.s.faa ERR671933_bin.4.o.faa M_xtsn_e 0.5 0.0 0 0.0
1 soil minimal ERR671933_bin.5.s.faa ERR671933_bin.4.o.faa M_xyl3_e 0.5 0.0 0 0.0
1 soil minimal ERR671933_bin.5.s.faa ERR671933_bin.4.o.faa M_xyl__D_e 0.5 0.0 0 0.0
1 soil minimal ERR671933_bin.5.s.faa ERR671933_bin.4.o.faa M_xylan4_e 0.5 0.0 0 0.0
1 soil minimal ERR671933_bin.5.s.faa ERR671933_bin.4.o.faa M_xylb_e 0.5 0.0 0 0.0
1 soil minimal ERR671933_bin.5.s.faa ERR671933_bin.4.o.faa M_zn2_e 0.5 1.0 0 0.0
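A complementary check is to count how many rows each simulation contributed per community. The sketch below runs on a toy file with made-up rows. On the real data you would point cut at $ROOT/data/smet_all.tsv, assuming its first two columns are the simulation number and the community name as in the tail output above.

```shell
# Toy stand-in for smet_all.tsv: simulation<TAB>community<TAB>medium
# (remaining columns omitted; all rows are hypothetical)
toy=$(mktemp)
printf '1\tsoil\tminimal\n1\tsoil\tminimal\n2\tsoil\tminimal\n' > "$toy"

# Count rows per (simulation, community) pair and print: simulation community count
counts=$(cut -f1,2 "$toy" | sort | uniq -c | awk '{ print $2, $3, $1 }')
printf '%s\n' "$counts"
rm -f "$toy"
```

A simulation that is missing from the counts, or that appears with roughly twice as many rows as the others, suggests that a file was skipped or appended twice.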
Do not worry if you run out of time or are unable to generate detailed interactions for any reason. You already have pre-computed results that you may use for visualization and discussion.
Unless you submit jobs in parallel on the cluster, it will not be feasible to generate hundreds of simulation results within the timeframe of this course. You do not need to generate results for each community, as all results are pre-computed in their respective directories.
The following shows how the detailed interactions were pre-computed for the different datasets, assuming your current folder contains your community's GEMs:
$ for i in {1..100}; do
echo "Running simulation $i out of 100 ... ";
smetana --flavor ucsd -o sim_${i} -v -d --molweight --zeros $ROOT/models/$COMM/*.xml;
done
Although SMETANA may run on a standard laptop, depending on your community size, this approach is not practical for simulating large numbers of communities, e.g. on the order of tens of thousands. For such large-scale analyses we developed the metaGEM pipeline, which uses the Snakemake workflow manager to submit parallelized jobs on a high-performance computing cluster (HPCC). For example, to submit 10,000 SMETANA jobs, each with 2 cores, 3 GB RAM, and a 2-hour maximum time limit:
$ bash metaGEM.sh --task smetana --nJobs 10000 --cores 2 --mem 3 --hours 2
Note: You do not have metaGEM installed on your virtual machines, so you will not be able to run the command above. Refer to the metaGEM repo's quickstart, manual installation guide, or Google Colab notebook for setup instructions.
- How does the SMETANA detailed algorithm work at a high level?
- What are the underlying assumptions?
- What does each metric measure and how is it calculated?
  - SCS
  - MUS
  - MPS
  - SMETANA
- What does a SMETANA score of 0 mean? What does a SMETANA score of 1 mean?
- Why do we run multiple simulations and take averages of SMETANA scores? Why is it important to use the --zeros flag in this case?
- How does choice of simulation media affect the SMETANA detailed algorithm output?
- What does the --molweight flag do? How can it be useful?