[Suggestion] Use score calibration when identifying proviruses and plasmids #35
Comments
I apply geNomad to GTDB assemblies, which usually contain < 200 contigs, does the flag I aim to mask both prophages and plasmids, which would affect the bacteria/viruses detection accuracy. So I used the

On an assembly (GCA_000018565.1) with plasmids and prophages, the virus scores increased but the plasmid scores fell, and one plasmid was missed. This might be because its plasmid_score is smaller than the threshold (0.5?).
Without score calibration:
With score calibration:
I misread your code. I thought you were running geNomad on a concatenated FASTA (which wouldn't make any sense). You're right that calibration probably won't work well on single genomes because there are few scaffolds. I recommend using it when you have ~1,000 sequences (although ~500 or less would also work well, as you can see in Suppl. Figure 2B).

As for your example, it is difficult to tell what happened from the summary files. You can see the calibrated scores of all sequences in the

Also, the cutoff when using the

The only option that I can think of is concatenating all of the genomes, running geNomad on the concatenated FASTA, and then "deconcatenating". But that's a lot of work and might not work with your parallelization mechanism.
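For anyone who wants to try the concatenate/"deconcatenate" route, here is a minimal sketch of the bookkeeping only; it is not part of geNomad. The function names `concatenate_fastas` and `split_by_genome` and the `__` separator are hypothetical choices made for illustration, and the sketch assumes uncompressed FASTA files whose file stems (used as genome IDs) do not contain `__`. Running geNomad on the combined file is left out.

```python
from collections import defaultdict
from pathlib import Path

SEP = "__"  # hypothetical separator; genome IDs must not contain it


def concatenate_fastas(fasta_paths, out_path, sep=SEP):
    """Merge FASTA files into one, prefixing every header with the genome ID.

    The genome ID is taken from the file stem (an assumption of this sketch).
    Returns the number of sequences written.
    """
    n_records = 0
    with open(out_path, "w") as out:
        for path in map(Path, fasta_paths):
            genome_id = path.stem
            with open(path) as handle:
                for line in handle:
                    line = line.rstrip("\n")
                    if not line:
                        continue  # skip blank lines
                    if line.startswith(">"):
                        out.write(f">{genome_id}{sep}{line[1:]}\n")
                        n_records += 1
                    else:
                        out.write(line + "\n")
    return n_records


def split_by_genome(seq_names, sep=SEP):
    """Group prefixed sequence names back by genome ID ("deconcatenate")."""
    groups = defaultdict(list)
    for name in seq_names:
        genome_id, original_name = name.split(sep, 1)
        groups[genome_id].append(original_name)
    return dict(groups)
```

The count returned by `concatenate_fastas` can be compared against the ~1,000 sequence guideline mentioned above before deciding whether calibration is worth enabling. Splitting on the first separator only keeps any suffix that a tool appends to a sequence name attached to the original contig name.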
Just a suggestion for the database build step. Since the sample size is pretty big, it's worth using --enable-score-calibration in geNomad. This parameter substantially improves the classification accuracy and should improve the reliability of the detection (see Supplementary Figure 5D here).