== This is sourmash version 4.8.10. ==
== Please cite Irber et. al (2024), doi:10.21105/joss.06830. ==
Loaded 2 query signatures.
query            p_genome  avg_abund  p_metag  metagenome name
--------         --------  ---------  -------  ---------------
F. prausnitzii   4.3%      24.7       2.5%     CD136
B. uniformis     48.3%     2.0        3.4%     CD136
F. prausnitzii   0.0%      0.0        0.0%     CD237
B. uniformis     62.6%     14.5       26.5%    CD237
Here,
p_genome is the fraction of the genome covered by the metagenome ("detection");
avg_abund is the average abundance of that genome in the metagenome (~mapping abundance);
p_metag is the fraction of the metagenome that would map to that genome.
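To make the three columns concrete, here is a minimal sketch that computes them from toy data. It assumes a genome is represented as a set of k-mer hashes and a metagenome as a mapping from k-mer hash to abundance; the function name `containment_metrics` and the abundance-weighted reading of p_metag are assumptions for illustration, not necessarily how sourmash computes the numbers internally.

```python
def containment_metrics(genome_hashes, metagenome_abunds):
    """Return (p_genome, avg_abund, p_metag) for one genome/metagenome pair.

    genome_hashes:     set of k-mer hashes from the genome sketch
    metagenome_abunds: dict mapping k-mer hash -> abundance in the metagenome
    (hypothetical helper; the weighting used for p_metag is an assumption)
    """
    shared = genome_hashes & set(metagenome_abunds)

    # Fraction of the genome's k-mers that are seen in the metagenome.
    p_genome = len(shared) / len(genome_hashes) if genome_hashes else 0.0

    # Mean abundance of the shared k-mers within the metagenome.
    shared_abund = sum(metagenome_abunds[h] for h in shared)
    avg_abund = shared_abund / len(shared) if shared else 0.0

    # Abundance-weighted fraction of the metagenome explained by the genome.
    total_abund = sum(metagenome_abunds.values())
    p_metag = shared_abund / total_abund if total_abund else 0.0

    return p_genome, avg_abund, p_metag


if __name__ == "__main__":
    genome = {1, 2, 3, 4}
    metagenome = {1: 10, 2: 20, 9: 30, 10: 40}
    print(containment_metrics(genome, metagenome))  # (0.5, 15.0, 0.3)
```

In this toy case half of the genome's hashes are detected, at an average abundance of 15, and they account for 30% of the metagenome's total abundance.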
More things that could be done
We have a much faster version of this in the works; this won't scale super well with dozens to hundreds of metagenomes.
We can provide ANI numbers between genome and metagenome if that's of interest.
Happy to chat about appropriate k-mer sizes and thresholds. This uses k=31 and no particular threshold; based on work here (Antarctic metagenome search paper) and elsewhere, I would expect that a threshold of > 99% of k-mers with a k-mer size of 51 would correspond to strain-resolved identity of the kind you're interested in.
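As a rough illustration of how a k-mer containment threshold relates to ANI, the sketch below uses the simple point estimate ANI ≈ containment^(1/k), which assumes mutations are independent across positions. The function names are hypothetical, and sourmash's own ANI estimator may apply further corrections, so treat the numbers as ballpark values only.

```python
def containment_to_ani(containment, ksize):
    """Point estimate of ANI from k-mer containment (simple mutation model).

    Hypothetical helper: assumes each base mutates independently, so a k-mer
    survives intact with probability ANI**ksize.
    """
    if not 0.0 <= containment <= 1.0:
        raise ValueError("containment must be between 0 and 1")
    if ksize < 1:
        raise ValueError("ksize must be a positive integer")
    return containment ** (1.0 / ksize)


def passes_threshold(containment, threshold=0.99):
    """True if the fraction of shared k-mers exceeds the chosen threshold."""
    return containment > threshold


if __name__ == "__main__":
    for ksize in (31, 51):
        ani = containment_to_ani(0.99, ksize)
        print(f"k={ksize}: containment 0.99 ~ ANI {ani:.4%}")
```

Under this model, requiring more than 99% of k-mers at k=51 corresponds to an estimated ANI of roughly 99.98%, which is why such a threshold is a plausible proxy for strain-level identity.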
https://hackmd.io/tKpLr1ISR9mqHHmZbEvcow?view
containment search tutorial: mgmanysearch
Install relevant sourmash stuff
Get data
Download 2.2 GB of compressed raw reads in IBD_tutorial_raw.tar.gz from the MintO tutorial data set.
Unpacking it will create a directory IBD_tutorial_raw; we're interested in the metaG/ reads underneath.
Prepare two metagenome samples for search
Let's sketch two of the samples:
combine pairs, sketch with abundances:
These are now our metagenome sketches!
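For readers unfamiliar with what "sketch with abundances" means, here is a toy pure-Python version of a scaled (FracMinHash-style) sketch. It is a conceptual sketch only, roughly what the `abund` parameter of `sourmash sketch dna` tracks: the function names are hypothetical, and the hash function is a stand-in (sourmash uses MurmurHash3), so the hash values will not match real sourmash signatures.

```python
import hashlib
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq, k):
    """Yield each k-mer in canonical form (min of k-mer and reverse complement)."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):  # skip k-mers containing N or other symbols
            revcomp = kmer.translate(_COMPLEMENT)[::-1]
            yield min(kmer, revcomp)


def hash64(kmer):
    """Stand-in 64-bit hash; NOT the MurmurHash3 that sourmash uses."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def toy_sketch(reads, k=31, scaled=1000, track_abundance=True):
    """Keep roughly 1/scaled of all distinct k-mers, optionally with counts.

    `reads` can hold both mates of a read pair, which is the effect of
    combining the paired files before sketching.
    """
    max_hash = 2**64 // scaled  # hashes below this cutoff are retained
    counts = Counter()
    for read in reads:
        for kmer in canonical_kmers(read, k):
            h = hash64(kmer)
            if h < max_hash:
                counts[h] += 1
    return dict(counts) if track_abundance else set(counts)


if __name__ == "__main__":
    sketch = toy_sketch(["ACGTACGT"], k=4, scaled=1)
    print(sorted(sketch.values()))  # abundance of each retained k-mer
```

With `track_abundance=False` the same function returns only the set of retained hashes, which is all that is needed for the genome side of the search.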
Grab a genome or two
Sketch genomes
Sketch them as well - no abundances needed:
Search!
Then run the search, which should yield the results table shown at the top of this page.