Recommended parameters for metagenome assembly and a related question #30

Closed
xfengnefx opened this issue Sep 15, 2022 · 5 comments
Labels
question Further information is requested

Comments

@xfengnefx

Hi,

I want to try mdBG on real metagenome samples. I wonder if you could suggest a parameter combo to use (or combos to try out). And should I do the multi-k mode?

For the real samples, I could crudely guess the number of species in the library, and perhaps an exaggerated total genome size from it as well. I'm not sure if these could be useful.

Another question is: could mdBG output contig coverage estimates?

Thank you!

@ekimb ekimb added the question Further information is requested label Sep 16, 2022
@rchikhi
Collaborator

rchikhi commented Sep 21, 2022

Dear Xiaowen, thanks for your interest!

For mdbg on metagenomes (or in fact isolates too), there are several possible execution modes:

  1. single parameter
  2. automatic parameters (it will autodetect)
  3. multi-k

For 1., our paper experiments were made with -k 21 -l 14 --density 0.003 so it seems reasonable to try that. We never tested 2. on metagenomes but I suspect it will also give reasonable parameters. Regarding 3., rust-mdbg also has a multi-k mode but we didn't tune it for metagenomes, so I would not recommend running the current multi-k script with metagenomes.

We don't have a way to adjust parameters based on the number of species or the genome size. I suggest you just run with one of the two ways above (1. or 2.) and see if the results look reasonable. For mdbg in metagenomics, a reasonable result is that the per-species coverage is high but the contiguity is lower than that of hifiasm-meta.

In any case, please make sure to use the https://github.com/ekimb/rust-mdbg/blob/master/utils/magic_simplify_meta script and not the usual magic_simplify because otherwise too many contigs will be discarded.
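
As a rough sketch of option 1 followed by the metagenome simplification step, the snippet below only builds and prints the two command lines; it does not run anything. The `-k`, `-l` and `--density` values are the ones quoted above. The binary path, the `--prefix` flag, the file names, and the assumption that `magic_simplify_meta` takes the output prefix as its only argument are all illustrative guesses, so please check them against `--help` and the script itself.

```python
import shlex


def mdbg_commands(reads, prefix, k=21, l=14, density=0.003):
    """Return the two command lines as argument lists (nothing is executed)."""
    assemble = [
        "target/release/rust-mdbg",  # assumed path to the compiled binary
        reads,                       # hypothetical input reads file
        "-k", str(k),
        "-l", str(l),
        "--density", str(density),
        "--prefix", prefix,          # assumed flag name; verify with --help
    ]
    # Assumed calling convention: the script receives the same output prefix.
    simplify = ["utils/magic_simplify_meta", prefix]
    return [assemble, simplify]


for cmd in mdbg_commands("reads.fa.gz", "meta_asm"):
    print(shlex.join(cmd))
```

Keeping the parameters as function arguments makes it easy to print a second pair of commands with different values if the first run does not look reasonable.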

rust-mdbg does not output contig coverage estimates. The unsimplified output GFA does have kminmer abundance, per node. That information isn't propagated to the simplified GFA, as I'm unsure how accurate it ends up being in terms of actual base coverage.
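
If you want to look at the per-node abundances yourself, a minimal sketch is below. It assumes the abundance is stored as an integer optional tag on the `S` (segment) lines of the unsimplified GFA, and it guesses the tag name `KC` (the standard GFA1 tag for k-mer counts). The tag name is an assumption, so inspect a few lines of your own file first. The example lines are made up.

```python
def node_abundances(gfa_lines, tag="KC"):
    """Map segment name -> integer value of `tag` for every S line that has it."""
    abundances = {}
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3 or fields[0] != "S":
            continue  # skip headers, links, and malformed lines
        # Optional GFA1 fields follow the sequence column as TAG:TYPE:VALUE.
        for field in fields[3:]:
            parts = field.split(":", 2)
            if len(parts) == 3 and parts[0] == tag and parts[1] == "i":
                abundances[fields[1]] = int(parts[2])
    return abundances


# Made-up lines in GFA1 layout, for illustration only.
example = [
    "H\tVN:Z:1.0",
    "S\t0\t*\tLN:i:5000\tKC:i:12",
    "S\t1\t*\tLN:i:4200\tKC:i:3",
    "L\t0\t+\t1\t-\t0M",
]
print(node_abundances(example))  # {'0': 12, '1': 3}
```

Note that these are counts of k-min-mers per node, not base-level coverage, so treat them as a rough signal only.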

Please let us know if you have any issues,

best,
Rayan

@xfengnefx
Author

Hi Rayan,

Thank you so much for the suggestions and for mentioning magic_simplify_meta. I will try the first two ways. I wasn't sure how total genome size and --density would interact, so it's nice to know that this isn't a concern. I once accidentally set two parameters too low for HiCanu by not reading the docs...

I'll leave the issue open for now in case I need more advice from you. I will come back and close it by next week if I don't run into anything. Thank you!

Best,
Xiaowen

@xfengnefx
Author

xfengnefx commented Sep 27, 2022

Thanks a lot for the help, the assembly runs were smooth. I have one additional question, though it's not related to the issue's title: have you tried busco (eukaryotes) or checkM (microbial) for evaluation? Could you offer some advice if so?

I tried checkM1 and it seems to be confused by insertions. I have not tried checkM2 yet.

@rchikhi
Collaborator

rchikhi commented Oct 7, 2022

Hi Xiaowen, great to hear.

We haven't run extensive evaluations using checkM on our rust-mdbg metagenomes, but based on feedback from a collaborator, it makes sense that rough, unpolished metagenome assemblies, such as the ones produced by rust-mdbg, would have a poor checkM score. Indels cause frameshifts, which hurt the sensitivity of the gene detection method and thus lower the gene completeness score.
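
To illustrate why an indel hurts gene detection far more than a mismatch, here is a toy sketch (not how checkM works internally). It splits a made-up 7-codon sequence into codons and counts how many codons change after a single substitution versus a single inserted base.

```python
def codons(seq):
    """Split a sequence into complete codons, reading from position 0."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def changed_codons(reference, assembled):
    """Count codon positions where the two sequences disagree."""
    ref, asm = codons(reference), codons(assembled)
    return sum(1 for a, b in zip(ref, asm) if a != b)


gene = "ATGGCCAAGCTTGACGGTTAA"            # made-up 7-codon ORF
substitution = gene[:4] + "T" + gene[5:]  # one mismatch at position 4
insertion = gene[:4] + "T" + gene[4:]     # one extra base at position 4

print(changed_codons(gene, substitution))  # 1
print(changed_codons(gene, insertion))     # 6
print(codons(insertion))                   # contains a premature TGA stop
```

The substitution alters one codon, while the insertion shifts every downstream codon and here even introduces an early stop, which is the kind of damage that makes a gene caller miss the gene.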

The gene is in fact likely there in the assembly, but it is not detected because those assembly assessment methods need high base quality. One possible workaround would be to run a polishing tool such as racon on the assembly, but this is just a hypothesis.
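
A rough sketch of what that workaround could look like is below; it is untested on rust-mdbg output. It only prints two shell command lines: map the reads back to the contigs with minimap2, then pass reads, overlaps and contigs to racon. The file names are hypothetical, the `map-hifi` preset assumes HiFi reads, and the contigs are assumed to be available as FASTA.

```python
import shlex


def polish_commands(assembly, reads, threads=8):
    """Return shell command strings for one round of racon polishing."""
    overlaps = "reads_vs_asm.paf"  # hypothetical intermediate file
    polished = "polished.fa"       # hypothetical output file
    a, r = shlex.quote(assembly), shlex.quote(reads)
    # minimap2 takes the target (contigs) first, then the query (reads).
    map_cmd = f"minimap2 -x map-hifi -t {threads} {a} {r} > {overlaps}"
    # racon takes reads, overlaps, then the target sequences to polish.
    racon_cmd = f"racon -t {threads} {r} {overlaps} {a} > {polished}"
    return [map_cmd, racon_cmd]


for cmd in polish_commands("contigs.fa", "reads.fq.gz"):
    print(cmd)
```

Running checkM on the assembly before and after polishing would show whether the completeness score actually improves.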

Rayan

@xfengnefx
Author

Awesome, thank you for the suggestions.
