Aligning Nanopore reads against a highly similar reference. #1251

coro1c · 2024-10-24T13:31:48Z

Hello,

We are doing high-throughput cloning of different gene variants and want to move the QC of our cloning products to Nanopore sequencing. I did a trial run with a library of ~400 variants. The variants share large constant parts but differ in two 40 bp stretches.

When I try to align the Nanopore reads from a Flongle (N50: 2.14 kb, avg Q-score: 18), I get large alignment errors, most likely because the reference sequences are so similar. I tried several flags (-U, -f to limit k-mers in common regions, -G to limit gap length to a few bp, ...). However, my alignment either crashes or gives low-quality alignments in the variable regions (in the picture: ~350 and ~1350 bp).

Could you give some advice on how best to tackle this alignment problem?

Many thanks in advance.

Best,
Marie

lh3 · 2024-10-25T02:05:40Z

Are you using 400 sequences as the reference? What command line are you using exactly? Why do you think the alignment is wrong, instead of the cloning being wrong?

coro1c · 2024-10-25T12:44:45Z

Yes, I use 400 sequences as the reference. I am using the default settings of EPI2ME wf-alignment, so I think these are the -x map-ont settings.

I checked some reads manually, and they actually map back to another reference with some indels.
So I cannot fully rule out cloning errors, but for all the reads I checked (~20) I found a better-fitting reference. I think the largest problem is actually indels due to the low Q-score of 18.

Many thanks for your help.

lh3 · 2024-10-27T04:20:02Z

Please run minimap2 independently and provide the exact command line. Also what do you mean by "better"? Is the alignment score higher? Eyeballing doesn't really count.
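
One way to make "better" measurable rather than eyeballed is to compare alignment scores per read across the candidate references. The following is a minimal sketch, not part of the original exchange. It assumes minimap2 was run standalone with base-level alignment so that the PAF output carries an `AS:i:` tag (for example `minimap2 -c -x map-ont refs.fa reads.fq > aln.paf`, where `refs.fa`, `reads.fq`, and `aln.paf` are placeholder file names), and the function names `parse_paf_scores` and `rank_references` are hypothetical helpers introduced here for illustration.

```python
from collections import defaultdict


def parse_paf_scores(lines):
    """Collect (alignment score, target name) pairs per read from PAF lines.

    Assumes each alignment line carries an AS:i:<int> tag; lines without
    one (or with fewer than the 12 mandatory PAF columns) are skipped.
    """
    hits = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue
        read, target = fields[0], fields[5]  # PAF columns 1 and 6
        score = None
        for tag in fields[12:]:
            if tag.startswith("AS:i:"):
                score = int(tag[5:])
                break
        if score is not None:
            hits[read].append((score, target))
    return hits


def rank_references(hits):
    """For each read, return (best target, best score, margin to runner-up).

    The margin is None when the read has only one scored alignment.
    """
    result = {}
    for read, scored in hits.items():
        ranked = sorted(scored, reverse=True)
        best_score, best_target = ranked[0]
        margin = best_score - ranked[1][0] if len(ranked) > 1 else None
        result[read] = (best_target, best_score, margin)
    return result
```

Calling `rank_references(parse_paf_scores(open("aln.paf")))` would give, for each read, the highest-scoring reference and how far it is ahead of the next candidate. A small or zero margin indicates that the read does not discriminate between variants on score alone.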

veghp · 2024-11-22T13:31:12Z

We regularly use minimap2 in our pipeline on Nanopore data for DNA cloning (plasmid construct) QC without issues. Such a "cliff" in the coverage suggests that there are two main types of DNA products in your sample, with a large common element. We always start the analysis by running NanoPlot on the raw reads: multiple peaks in the read length histogram suggest a polyclonal sample (on top of the expected 400 variants, which should all have the same length). That approach works when you have at least some full-length reads.
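
The read-length check described above can be approximated without NanoPlot. The sketch below is a simple stand-in, not NanoPlot's own method: it bins read lengths and reports bins that are local maxima holding at least a given share of the reads. It assumes 4-line FASTQ records, and the names `read_lengths` and `length_peaks` as well as the defaults `bin_width=100` and `min_fraction=0.05` are illustrative choices, not values from the thread.

```python
from collections import Counter


def read_lengths(fastq_lines):
    """Yield sequence lengths, assuming strict 4-line FASTQ records."""
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # second line of each record is the sequence
            yield len(line.strip())


def length_peaks(lengths, bin_width=100, min_fraction=0.05):
    """Return (bin start, read count) for each peak in the length histogram.

    A bin counts as a peak if it holds at least min_fraction of all reads
    and is a local maximum relative to its two neighbouring bins.
    """
    bins = Counter(length // bin_width for length in lengths)
    total = sum(bins.values())
    peaks = []
    for b, count in sorted(bins.items()):
        if count < min_fraction * total:
            continue
        # ">=" on the left and ">" on the right reports a plateau only once
        if count >= bins.get(b - 1, 0) and count > bins.get(b + 1, 0):
            peaks.append((b * bin_width, count))
    return peaks
```

More than one reported peak would be consistent with a mixed population of products of different lengths, although truncated reads can also produce secondary peaks, so the result is only a prompt for closer inspection.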

coro1c · 2025-01-16T15:14:13Z

It was indeed cloning errors that were mostly causing the issue. Many thanks.

coro1c closed this as completed Jan 16, 2025

lh3 added the question label Jan 16, 2025
