Merge pull request #212 from rhpvorderman/improvereadme

Add more usage examples and a runtime estimate to the README
rhpvorderman · Nov 29, 2024 · 1967f3f
2 parents bf7ef02 + a9163d5
Showing 2 changed files with 47 additions and 1 deletion.
diff --git a/CHANGELOG.rst b/CHANGELOG.rst
@@ -10,7 +10,7 @@ Changelog
 version 0.13.0-dev
 ------------------
 + Python 3.13 support was added.
-+ Python 3.8 is no longer supported.
++ Python 3.8 and 3.9 are no longer supported.
 + Allow proper judging of aligned BAM files as input data by ignoring any
   secondary or supplementary alignment records. This is equivalent to running
   ``samtools fastq > input.fastq`` on the input data before submitting it to
@@ -20,6 +20,7 @@ version 0.13.0-dev
   false positives from common human genome repeats. The number of base pairs
   that are sampled from the beginning and end is user-settable, with an option
   to sample everything.
++ Extended the README with a few usage examples.
 
 version 0.12.0
 ------------------

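The README hunk below recommends ``--threads 1`` on HPC clusters so that several files can be processed simultaneously, and ``--outdir`` for gathering reports for MultiQC. What follows is a minimal Bash sketch of that pattern. It assumes a SLURM cluster with ``sbatch`` on the ``PATH``. The function name ``submit_sequali_jobs`` and the ``DRY_RUN`` switch are hypothetical, and the ``--time 59 --partition short`` values are copied from the README example and will differ per cluster.

```shell
#!/usr/bin/env bash
# Sketch only: submit one single-core Sequali job per input file to SLURM.
# Assumes sbatch is on PATH. submit_sequali_jobs and DRY_RUN are
# hypothetical names; --time and --partition values are cluster-specific.
set -euo pipefail

submit_sequali_jobs() {
    local outdir="$1"
    shift
    local q_outdir q_input cmd input
    # Shell-quote the directory so paths with spaces survive --wrap.
    printf -v q_outdir '%q' "$outdir"
    for input in "$@"; do
        printf -v q_input '%q' "$input"
        # --threads 1: one core per file, so many files run side by side.
        cmd="sequali --threads 1 --outdir ${q_outdir} ${q_input}"
        if [[ "${DRY_RUN:-0}" == 1 ]]; then
            # Print the submission command instead of running it.
            printf 'sbatch -c 1 --time 59 --partition short --wrap "%s"\n' "$cmd"
        else
            sbatch -c 1 --time 59 --partition short --wrap "$cmd"
        fi
    done
}
```

After sourcing the script, ``DRY_RUN=1 submit_sequali_jobs /cluster-scratch/myusername/reports /cluster-scratch/myusername/*.fastq.gz`` prints the ``sbatch`` commands without submitting them. Running it without ``DRY_RUN`` submits one job per file. Once all jobs have finished, running ``multiqc /cluster-scratch/myusername/reports`` with MultiQC 1.22 or later should collect the reports from that directory.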
diff --git a/README.rst b/README.rst
@@ -41,6 +41,9 @@ Features:
 
 + `MultiQC <https://multiqc.info>`_ support since MultiQC version 1.22.
 + Low memory footprint, small install size and fast execution times.
+
+  + Sequali typically needs less than 2 GB of memory and 3-30 minutes to run
+    on 2 cores (the default).
 + Informative graphs that allow for judging the quality of a sequence at
   a quick glance.
 + Overrepresentation analysis using 21 bp sequence fragments. Overrepresented
@@ -123,6 +126,48 @@ Quickstart
 This will create a report ``my.fastq.gz.html`` and a json ``my.fastq.gz.json``
 in the current working directory.
 
+The ``--outdir`` flag sets the directory where the reports are created. This is
+useful when using `MultiQC <https://github.com/multiqc/multiqc>`__.
+
+.. code-block::
+
+    sequali --outdir /my/dir/all_sequali_reports my.fastq.gz
+
+The HTML and JSON report filenames can be set separately with ``--html`` and ``--json``.
+
+.. code-block::
+
+    sequali --html before_qc.html --json before_qc.json my.fastq.gz
+    sequali --html after_qc.html --json after_qc.json my.cutadapt.fastq.gz
+
+Sequali can handle paired-end data when both files are passed.
+
+.. code-block::
+
+    sequali /sequencing_data/sample100_R1.fastq.gz /sequencing_data/sample100_R2.fastq.gz
+
+Additionally, Sequali can handle BAM data. Paired-end handling is not yet supported for
+BAM input, so this is primarily useful for Oxford Nanopore (ONT) datasets.
+
+.. code-block::
+
+    sequali /sequencing_data/sample100_dorado_called_hac_v4.30.bam
+
+By default, Sequali uses one thread per compressed input file and one thread for
+read processing, typically keeping two cores busy. Sequali can also run on a single
+core, which is slower but typically more efficient in HPC scenarios where
+multiple files can be processed simultaneously, as in this SLURM example:
+
+.. code-block::
+
+    sbatch -c 1 --time 59 --partition short \
+        --wrap 'sequali --threads 1 /cluster-scratch/myusername/my.fastq.gz'
+
+Using a thread count higher than ``2`` has no effect. Because decompression is the
+bottleneck, adding more multithreading to Sequali would bring limited benefit at
+a disproportionately high cost in additional code
+complexity.
+
 .. quickstart end
 
 For all command-line options, check out the