Releases · rhpvorderman/sequali

22 Mar 13:58

rhpvorderman

v0.5.1

8014193

version 0.5.1

Fix a bug in the overrepresented sequence sampling where the fragments from
the back half of the sequence were incorrectly sampled, which led to the last
fragment being sampled over and over again.

15 Mar 15:30

rhpvorderman

v0.5.0

1eb75ef

version 0.5.0

Base the percentage in the overrepresented sequences section on the number
of found fragments divided by the number of sampled sequences. Previously
this was based on the number of sampled fragments, which led to very low
percentages for long read sequences, whilst also being less intuitive to
understand. There were some inconsistencies in the documentation about this
that are now fixed.
Add a new meta section to the JSON report to allow integration with
MultiQC (https://github.com/multiqc/MultiQC).
Add all nanopore barcode sequences and native adapters to the contaminants.
Add native adapters to the adapter search.
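
The change of denominator in the overrepresented sequences percentage can be illustrated with a small worked example. This is only a sketch: the function name and all numbers are made up for illustration and are not taken from Sequali's code or from real data.

```python
def overrepresentation_percentage(found: int, denominator: int) -> float:
    """Return found / denominator as a percentage (0.0 for an empty sample)."""
    if denominator == 0:
        return 0.0
    return 100.0 * found / denominator


# Hypothetical long-read sample: 1,000 sampled sequences, each cut into
# 500 fragments, with one fragment found in 200 of the sampled sequences.
sampled_sequences = 1_000
sampled_fragments = sampled_sequences * 500
found = 200

# Old behaviour: divide by the number of sampled fragments.
old_style = overrepresentation_percentage(found, sampled_fragments)
# New behaviour: divide by the number of sampled sequences.
new_style = overrepresentation_percentage(found, sampled_sequences)

print(f"per sampled fragment: {old_style:.2f}%")
print(f"per sampled sequence: {new_style:.2f}%")
```

With long reads the fragment count grows much faster than the sequence count, which is why the fragment-based percentage becomes very small for the same number of hits.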

01 Dec 18:55

rhpvorderman

v0.4.1

8f712cd

version 0.4.1

Fixed an issue that caused an off-by-one error if the start and end time
of a nanopore run were at certain intervals.

01 Dec 09:32

rhpvorderman

v0.4.0

825e3ee

version 0.4.0

Fix bugs that were triggered when empty reads were present on
Illumina and nanopore platforms.
Fix a bug that was triggered when a single nucleotide read was present on
a nanopore platform.
Add a --version command line flag.
Add an --adapter-file flag that lets users supply a custom adapter
file.

22 Nov 13:39

rhpvorderman

v0.3.0

1825de9

version 0.3.0

Fingerprint using offsets of 64 bases from both ends of the sequence.
On nanopore sequencing this prevents taking into account adapter sequences
for the duplication estimate. It also prevents taking sequences from the
error-prone regions. The fingerprint consists of two 8 bp sequences rather
than the two 16 bp sequences that were used before. This makes the fingerprint
less prone to sequencing errors, especially in long read sequencing
technologies. As a result the duplication estimate on nanopore reads
should be more accurate.
Added a small header with information on where to submit bug reports.
Use different adapter probes for nanopore adapters, so that the probes
occur at some distance from the strand extremities. The start and end
of nanopore sequences are prone to errors and this hindered adapter
detection.
Distinguish between top and bottom adapters for the adapter occurrence plot.
Update pygal to 3.0.4 to prevent installation errors on Python 3.12.
Fix several divide by 0 errors that occurred on empty reads and empty files.
Change default fragment length from 31 to 21 which increases the sensitivity
of the overrepresented sequences module.
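
The fingerprinting idea described above (two 8 bp pieces taken 64 bases in from each end of the read) can be sketched roughly as follows. This is a minimal illustration, not Sequali's implementation: the function name is hypothetical, and the handling of reads that are too short for the full offset is an assumption made so the example runs on any input.

```python
def fingerprint(sequence: str, offset: int = 64, piece: int = 8) -> str:
    """Join `piece` bases taken `offset` bases in from each end of a read."""
    needed = 2 * piece
    if len(sequence) <= needed:
        # Assumption: reads too short for two pieces are used whole.
        return sequence
    # Assumption: shrink the offset when the read cannot fit the full one,
    # so that the two pieces never overlap.
    max_offset = (len(sequence) - needed) // 2
    offset = min(offset, max_offset)
    front = sequence[offset:offset + piece]
    back_end = len(sequence) - offset
    back = sequence[back_end - piece:back_end]
    return front + back


# Hypothetical read: 64 bases of "adapter", an 8 bp marker, an insert,
# another 8 bp marker and 64 trailing bases.
read = "A" * 64 + "CCCCCCCC" + "G" * 100 + "TTTTTTTT" + "A" * 64
print(fingerprint(read))
```

Two reads with the same fingerprint would be treated as likely duplicates; skipping the first and last 64 bases keeps adapters and the error-prone read ends out of the comparison.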

15 Nov 08:59

rhpvorderman

v0.2.0

7d2cb2f

version 0.2.0

Fixed a crash that occurred in the Illumina header checking code on
Illumina headers without the comment part.
The --max-unique-sequences flag was replaced with
--overrepresentation-max-unique-fragments to be consistent with the
report and other flags.
Lots of formatting improvements were made to the report:
- The quality distribution plot now uses Matplotlib's RdBu colormap. Like
  the old colormap, it goes from red to blue via white, but is much
  clearer visually.
- Tables now have zebra-style coloring and mouse-over coloring to clearly
  distinguish rows.
- The base content plot now uses a green and blue color scheme for GC and
  AT bases respectively. Previously it was red and blue.
- Sans-serif fonts are used throughout the report.
- Explanation paragraphs are now in a smaller font and italic to visually
  distinguish them from data generated specifically for the sequencing
  file.
- Plots are now rendered in sans-serif rather than monospace fonts.
- Minor formatting, spelling and style issues were fixed.
The program's CLI help messages have been improved with clearer phrasing,
better metavar names and consistent punctuation.
The reverse complement of the canonical sequence is included in the
overrepresented sequences table.
Make the number of threads configurable on the command line.
Fix build errors on Windows.

09 Nov 20:17

rhpvorderman

v0.1.0

85911fc

version 0.1.0

In order to get overrepresented sequences across the entire read, reads
are cut into fragments of 31 bp which are stored and counted. If the fragment
store is full, only already stored sequences are counted. One in eight
reads is processed this way.
Add fingerprint-based deduplication estimation based on a technique used in
filesystem deduplication estimation
(https://www.usenix.org/system/files/conference/atc13/atc13-xie.pdf).
Add a BAM parser to allow reading dorado produced unaligned BAM as well as
already aligned BAM files.
Guess sequencing technology from the file header, so only appropriate
adapters can be loaded in the adapter searcher. This improves speed.
Make an assortment of nanopore adapter probes that make it possible to
distinguish between nanopore adapters despite the nanopore adapters having
a lot of shared subsequences.
Add a module to retrieve nanopore specific information from the header.
Classify overrepresented sequences by using NCBI's UniVec database and an
assortment of nanopore adapters, ligation kits and primers.
Estimate duplication fractions based on counted unique sequences.
Add a JSON report.
Add a progressbar powered by tqdm.
Implement a custom parser based on memchr for finding newlines.
Count overrepresented sequences using a hash table implemented in C.
Add a per tile sequence quality module.
Count adapters using a fast shift-AND algorithm.
Create diverse graphs using pygal based on the count matrix.
Implement base module using an optimised count matrix.
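
The fragment-based counting for overrepresented sequences described at the top of this release can be sketched as below. This is a simplified illustration under stated assumptions, not the C hash table Sequali uses: the function name and parameters are hypothetical, fragments are taken without overlap from the start of the read, a trailing partial fragment is dropped, and the first of every eight reads is the one that gets sampled.

```python
from typing import Dict, Iterable, Tuple


def count_fragments(
    reads: Iterable[str],
    fragment_length: int = 31,
    max_unique: int = 1000,  # illustrative store size, not Sequali's default
    sample_every: int = 8,
) -> Tuple[Dict[str, int], int]:
    """Count fixed-length fragments in a sample of reads with a bounded store."""
    counts: Dict[str, int] = {}
    sampled_reads = 0
    for index, read in enumerate(reads):
        # Only one in `sample_every` reads is processed.
        if index % sample_every != 0:
            continue
        sampled_reads += 1
        last_start = len(read) - fragment_length
        for start in range(0, last_start + 1, fragment_length):
            fragment = read[start:start + fragment_length]
            if fragment in counts:
                counts[fragment] += 1
            elif len(counts) < max_unique:
                # The store still has room: remember the new fragment.
                counts[fragment] = 1
            # Otherwise the store is full and unseen fragments are ignored.
    return counts, sampled_reads


# Tiny demonstration with 4 bp fragments and room for one unique fragment.
demo_counts, demo_sampled = count_fragments(
    ["ACGTTTTT"] * 9, fragment_length=4, max_unique=1
)
print(demo_counts, demo_sampled)
```

Bounding the store keeps memory use fixed regardless of file size, at the cost of missing fragments that first appear after the store has filled up.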
