Aligning Nanopore reads against a highly similar reference. #1251

Open
coro1c opened this issue Oct 24, 2024 · 4 comments

Comments


coro1c commented Oct 24, 2024

Hello,

We are doing high-throughput cloning of different gene variants and want to move the QC of our cloning products to Nanopore sequencing. I did a trial run with a library of ~400 variants. The variants share large constant parts but differ in two 40 bp stretches.

When I try to align the Nanopore reads from a Flongle (N50: 2.14 kb, avg. Q-score: 18), I get large alignment errors, most likely because the references are so similar. I tried several flags (-U, -f to limit k-mers in common regions, -G to limit gap length to a few bp, ...). However, the alignment either crashes or gives low-quality alignments in the variable regions (in the picture: ~350 and ~1350 bp).
[Attached images: three screenshots, each labelled "grafik"]

Could you give some advice on how best to tackle this alignment problem?

Many thanks in advance.

Best,
Marie

Owner

lh3 commented Oct 25, 2024

Are you using 400 sequences as the reference? What command line are you using exactly? Why do you think the alignment is wrong, instead of the cloning being wrong?

Author

coro1c commented Oct 25, 2024

Yes, I use 400 sequences as the reference. I am using the default settings of EPI2ME wf-alignment, so I think these are the -x map-ont settings.

I checked some reads manually, and they actually map back to another reference with some indels.
So I cannot fully rule out cloning errors, but for all reads I checked (~20) I found a better-fitting reference. I think the largest problem is actually indels due to the low Q-score of 18.

Many thanks for your help.

Owner

lh3 commented Oct 27, 2024

Please run minimap2 independently and provide the exact command line. Also what do you mean by "better"? Is the alignment score higher? Eyeballing doesn't really count.
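
One way to make "better" measurable is to compare the alignment scores minimap2 reports for each read against each candidate reference. The sketch below is only an illustration under stated assumptions: it assumes SAM output from a plain `minimap2 -a -x map-ont` run, relies on the `AS:i:` tag that minimap2 writes for base-level alignments, and uses hypothetical helper names (`minimap2_command`, `alignment_scores`, `rank_references`) and placeholder file names that are not part of minimap2 or of this thread.

```python
from collections import defaultdict


def minimap2_command(ref_fasta, reads_fastq):
    """Return an explicit, reportable minimap2 command line (as an argv list).

    The file names are placeholders; "-a" requests SAM output and
    "-x map-ont" selects the Nanopore preset mentioned in the thread.
    """
    return ["minimap2", "-a", "-x", "map-ont", ref_fasta, reads_fastq]


def alignment_scores(sam_lines):
    """Collect the best AS:i score per (read, reference) pair from SAM lines."""
    scores = defaultdict(dict)
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete SAM record
        read, flag, ref = fields[0], int(fields[1]), fields[2]
        if flag & 0x4 or ref == "*":
            continue  # unmapped read
        for tag in fields[11:]:
            if tag.startswith("AS:i:"):
                score = int(tag[5:])
                # keep the highest score seen for this read/reference pair
                if score > scores[read].get(ref, float("-inf")):
                    scores[read][ref] = score
    return scores


def rank_references(scores_for_read):
    """Sort (reference, score) pairs for one read, highest score first."""
    return sorted(scores_for_read.items(), key=lambda kv: kv[1], reverse=True)
```

If the reference picked by eye does not come out on top of `rank_references` for a read, or does not show up at all among the reported primary and secondary alignments, that would be worth stating alongside the exact command line.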


veghp commented Nov 22, 2024

We regularly use minimap2 in our pipeline on Nanopore data for DNA cloning (plasmid construct) QC without issues. Such a "cliff" in the coverage suggests that there are two main types of DNA products in your sample, with a large common element. We always start the analysis by running NanoPlot on the raw reads: having multiple peaks in the read length histogram suggests a polyclonal sample (on top of the obvious 400 variants, which should all have the same length). That approach works when you have at least some full-length reads.
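
For a quick look at the read length distribution without NanoPlot, a minimal sketch along these lines could be used. It is not NanoPlot's method, only a rough stand-in: it assumes an uncompressed FASTQ with exactly four lines per record, and the function names and the 100 bp bin size are arbitrary choices for illustration.

```python
from collections import Counter


def read_lengths(fastq_lines):
    """Return the sequence length of every record in a 4-line-per-record FASTQ."""
    lengths = []
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # second line of each record holds the sequence
            lengths.append(len(line.strip()))
    return lengths


def length_histogram(lengths, bin_size=100):
    """Count reads per length bin; keys are the lower edge of each bin."""
    counts = Counter((n // bin_size) * bin_size for n in lengths)
    return dict(sorted(counts.items()))
```

Usage would look like `length_histogram(read_lengths(open("reads.fastq")))`, where `reads.fastq` is a placeholder path. Several well-separated bins with high counts would correspond to the multiple peaks described above.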
