Merge pull request #212 from rhpvorderman/improvereadme
Add more usage examples and runtime estimate on the README
rhpvorderman authored Nov 29, 2024
2 parents bf7ef02 + a9163d5 commit 1967f3f
Showing 2 changed files with 47 additions and 1 deletion.
3 changes: 2 additions & 1 deletion CHANGELOG.rst
@@ -10,7 +10,7 @@ Changelog
version 0.13.0-dev
------------------
+ Python 3.13 support was added.
+ Python 3.8 is no longer supported.
+ Python 3.8 and 3.9 are no longer supported.
+ Allow proper judging of aligned BAM files as input data by ignoring any
  secondary or supplementary alignment records. This is equivalent to running
  ``samtools fastq > input.fastq`` on the input data before submitting it to
@@ -20,6 +20,7 @@ version 0.13.0-dev
  false positives from common human genome repeats. The number of base pairs
  sampled from the beginning and end is user-settable, with an option to
  sample everything.
+ Extended the README with a few usage examples.

version 0.12.0
------------------
45 changes: 45 additions & 0 deletions README.rst
@@ -41,6 +41,9 @@ Features:

+ `MultiQC <https://multiqc.info>`_ support since MultiQC version 1.22.
+ Low memory footprint, small install size and fast execution times.

+ Sequali typically needs less than 2 GB of memory and 3 to 30 minutes of
  runtime when run on 2 cores (the default).
+ Informative graphs that allow for judging the quality of a sequence at
  a quick glance.
+ Overrepresentation analysis using 21 bp sequence fragments. Overrepresented
@@ -123,6 +126,48 @@ Quickstart
This will create an HTML report ``my.fastq.gz.html`` and a JSON report
``my.fastq.gz.json`` in the current working directory.

To set the directory where the reports are created, use the ``--outdir`` flag.
This is useful when collecting the reports with
`MultiQC <https://github.com/multiqc/multiqc>`__.

.. code-block::

    sequali --outdir /my/dir/all_sequali_reports my.fastq.gz

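
When several samples are processed, pointing every run at the same
``--outdir`` leaves MultiQC a single directory to scan. The sketch below is an
illustration under assumptions: it targets bash, assumes ``sequali`` and
``multiqc`` are on the ``PATH``, and uses a hypothetical helper
``sequali_into_dir`` that only prints the commands it would run.

.. code-block:: bash

    # Hypothetical helper (not part of Sequali): print one sequali command
    # per input file, all writing into the same report directory, then a
    # MultiQC run over that directory. Nothing is executed.
    sequali_into_dir() {
        local outdir="$1"
        shift
        local input
        for input in "$@"; do
            # %q shell-quotes each path so the line can be pasted or piped.
            printf 'sequali --outdir %q %q\n' "$outdir" "$input"
        done
        printf 'multiqc %q\n' "$outdir"
    }

    sequali_into_dir /my/dir/all_sequali_reports sample1.fastq.gz sample2.fastq.gz

Piping the printed lines into ``bash`` runs them; printing first keeps the
sketch free of side effects and lets the commands be reviewed.
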
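The HTML and JSON report filenames can be set separately with the ``--html``
and ``--json`` flags, for example to compare data before and after trimming.

.. code-block::

    sequali --html before_qc.html --json before_qc.json my.fastq.gz
    sequali --html after_qc.html --json after_qc.json my.cutadapt.fastq.gz
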
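
Fixed names such as ``before_qc.html`` would overwrite each other when several
samples share one output directory. A minimal sketch, assuming bash and raw
files named ``<sample>.fastq.gz``: the hypothetical helper
``before_after_commands`` prefixes the report names with the sample name and
prints the two runs instead of executing them.

.. code-block:: bash

    # Hypothetical helper (not part of Sequali): print the before- and
    # after-trimming sequali runs for one sample, prefixing the report
    # names with the sample name. Assumes the raw file is <sample>.fastq.gz.
    before_after_commands() {
        local raw="$1" trimmed="$2"
        local sample
        sample="$(basename "$raw" .fastq.gz)"   # /data/s1.fastq.gz -> s1
        printf 'sequali --html %q --json %q %q\n' \
            "${sample}_before_qc.html" "${sample}_before_qc.json" "$raw"
        printf 'sequali --html %q --json %q %q\n' \
            "${sample}_after_qc.html" "${sample}_after_qc.json" "$trimmed"
    }

    before_after_commands /data/s1.fastq.gz /data/s1.cutadapt.fastq.gz

For the call shown, the first printed line is
``sequali --html s1_before_qc.html --json s1_before_qc.json /data/s1.fastq.gz``.
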
Sequali can also handle paired-end data; pass both files.

.. code-block::

    sequali /sequencing_data/sample100_R1.fastq.gz /sequencing_data/sample100_R2.fastq.gz

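
For many samples, the R2 file can be derived from the R1 file name. This
sketch assumes bash and the ``<sample>_R1.fastq.gz`` /
``<sample>_R2.fastq.gz`` naming used in the example; ``paired_command`` is a
hypothetical helper that prints the paired-end command and rejects files that
do not follow that convention.

.. code-block:: bash

    # Hypothetical helper (not part of Sequali): derive the R2 file from an
    # R1 file named <sample>_R1.fastq.gz and print the paired-end command.
    paired_command() {
        local r1="$1"
        # Refuse files that do not follow the assumed naming convention.
        if [[ "$r1" != *_R1.fastq.gz ]]; then
            echo "not an R1 file: $r1" >&2
            return 1
        fi
        # Swap the trailing "_R1.fastq.gz" for "_R2.fastq.gz".
        local r2="${r1%_R1.fastq.gz}_R2.fastq.gz"
        printf 'sequali %q %q\n' "$r1" "$r2"
    }

    # Print one command per R1 file found in the directory.
    for r1 in /sequencing_data/*_R1.fastq.gz; do
        [[ -e "$r1" ]] || continue   # skip the literal pattern if nothing matched
        paired_command "$r1"
    done
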
Additionally, Sequali can handle BAM data. Proper pair handling is not yet
supported for BAM data, so this is primarily useful for Oxford Nanopore
Technologies (ONT) datasets.

.. code-block::

    sequali /sequencing_data/sample100_dorado_called_hac_v4.30.bam

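
The changelog for this version notes that secondary and supplementary
alignment records in BAM input are ignored. In the SAM/BAM format these
records are marked by the FLAG bits ``0x100`` (secondary) and ``0x800``
(supplementary); their combination ``0x900`` is also the default exclusion
filter of ``samtools fastq``. The bash sketch below only illustrates that bit
test on a few FLAG values; ``is_counted_record`` is a hypothetical name and not
part of Sequali's code.

.. code-block:: bash

    # SAM FLAG bits: 0x100 = secondary, 0x800 = supplementary (0x900 together).
    skip_mask=$(( 0x100 | 0x800 ))

    # Hypothetical helper (not part of Sequali): exit status 0 when a record
    # with this FLAG would be counted, i.e. neither bit is set.
    is_counted_record() {
        local flag="$1"
        (( (flag & skip_mask) == 0 ))
    }

    for flag in 0 4 16 256 2048 2064; do
        if is_counted_record "$flag"; then
            echo "FLAG $flag: counted"
        else
            echo "FLAG $flag: ignored"
        fi
    done

Unmapped (``4``) and reverse-strand (``16``) records are kept, while ``2064``,
a supplementary alignment on the reverse strand, is skipped.
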
By default, Sequali uses one thread per compressed input file and one thread
for read processing, typically keeping two cores busy. Sequali can also run on
a single core, which is slower but typically more efficient in HPC scenarios
where multiple files can be processed simultaneously. Below is a SLURM example.

.. code-block::

    sbatch -c 1 --time 59 --partition short \
        --wrap 'sequali --threads 1 /cluster-scratch/myusername/my.fastq.gz'

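
With single-core jobs, one submission per input file is the natural unit. A
sketch, assuming bash and the same ``sbatch`` options as the example;
``submit_sequali_jobs`` is hypothetical, and unless ``DRY_RUN=0`` is set it
only prints the ``sbatch`` commands instead of submitting them.

.. code-block:: bash

    # Hypothetical helper (not part of Sequali): one single-core SLURM job per
    # input file, with the sbatch options from the example. Unless
    # DRY_RUN=0, the sbatch commands are printed instead of submitted.
    submit_sequali_jobs() {
        local input cmd
        for input in "$@"; do
            # Quote the path so it survives inside the --wrap string.
            cmd="sequali --threads 1 $(printf '%q' "$input")"
            if [[ "${DRY_RUN:-1}" == 1 ]]; then
                printf 'sbatch -c 1 --time 59 --partition short --wrap %q\n' "$cmd"
            else
                sbatch -c 1 --time 59 --partition short --wrap "$cmd"
            fi
        done
    }

    # Expand the glob to nothing (instead of the literal pattern) if no file matches.
    shopt -s nullglob
    files=(/cluster-scratch/myusername/*.fastq.gz)
    submit_sequali_jobs "${files[@]}"
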
Using a thread count higher than ``2`` has no effect. Because decompression is
the bottleneck, full multithreading would bring Sequali limited benefit at a
disproportionately high cost in code complexity.
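
Since more than two threads brings no gain, a job script can cap the value it
passes to ``--threads``. A minimal sketch, assuming bash inside a SLURM job,
where ``SLURM_CPUS_PER_TASK`` holds the CPU count requested with ``-c``; the
helper name ``sequali_threads`` is hypothetical.

.. code-block:: bash

    # Hypothetical helper (not part of Sequali): pick a --threads value from
    # the CPUs SLURM granted, falling back to 1 outside SLURM. Anything
    # above 2 is capped, since more threads bring no gain.
    sequali_threads() {
        local cpus="${SLURM_CPUS_PER_TASK:-1}"
        if (( cpus >= 2 )); then
            echo 2
        else
            echo 1
        fi
    }

    echo "sequali --threads $(sequali_threads) my.fastq.gz"
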

.. quickstart end

For all command line options, check out the