Preprocessing WGS Data


Sam Minot edited this page Jan 16, 2020 · 1 revision

By default, geneshot preprocesses the raw paired-end FASTQ data. Preprocessing consists of the following steps:

Checking with barcodecop that the samples were demultiplexed correctly (optional; this step runs only if index reads are provided in the I1 and I2 columns of the manifest)
Trimming adapters with cutadapt (adapter sequences can be specified manually with the --adapter_F and --adapter_R flags)
Removing reads that align to the human genome (the current human reference genome is used by default; a different index can be supplied with --hg_index_url)
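The conditional barcodecop step can be illustrated with a minimal Python sketch that decides, per manifest row, which preprocessing steps would apply. It assumes the manifest is a CSV whose column names are `specimen`, `R1`, `R2`, `I1` and `I2`, and that barcodecop needs both index columns filled in. The step labels and helper functions are hypothetical and only mirror the list above; this is not geneshot's own code.

```python
import csv
import io

# Hypothetical manifest: column names specimen, R1, R2, I1, I2 are assumed.
EXAMPLE_MANIFEST = """specimen,R1,R2,I1,I2
sampleA,A_R1.fq.gz,A_R2.fq.gz,A_I1.fq.gz,A_I2.fq.gz
sampleB,B_R1.fq.gz,B_R2.fq.gz,,
"""


def has_index_reads(row):
    """Return True when both index-read columns (I1 and I2) are non-empty."""
    i1 = (row.get("I1") or "").strip()
    i2 = (row.get("I2") or "").strip()
    return bool(i1) and bool(i2)


def preprocessing_steps(row):
    """List the preprocessing steps that would apply to one manifest row."""
    steps = []
    if has_index_reads(row):
        steps.append("barcodecop")  # demultiplexing check needs index reads
    steps.append("cutadapt")  # adapter trimming
    steps.append("remove_human")  # drop reads aligning to the human genome
    return steps


def plan(manifest_text):
    """Map each specimen in the manifest to its list of preprocessing steps."""
    reader = csv.DictReader(io.StringIO(manifest_text))
    return {row["specimen"]: preprocessing_steps(row) for row in reader}


if __name__ == "__main__":
    for specimen, steps in plan(EXAMPLE_MANIFEST).items():
        print(specimen, "->", ", ".join(steps))
```

Here `sampleA` would get all three steps, while `sampleB` (no index reads) would skip the barcodecop check.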

All of the preprocessing steps can be skipped with the --nopreprocess flag.
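The preprocessing flags on this page can be combined into a single Nextflow invocation. The Python sketch below builds that argument list. Only --adapter_F, --adapter_R, --hg_index_url and --nopreprocess come from this page; the pipeline name `Golob-Minot/geneshot` and the `--manifest` and `--output` parameters are assumptions made for illustration, so check them against the pipeline's own parameter list before use.

```python
import shlex
from typing import List, Optional

# Assumed pipeline location; not confirmed by this page.
PIPELINE = "Golob-Minot/geneshot"


def build_command(
    manifest: str,
    output: str,
    nopreprocess: bool = False,
    adapter_f: Optional[str] = None,
    adapter_r: Optional[str] = None,
    hg_index_url: Optional[str] = None,
) -> List[str]:
    """Assemble a nextflow command line (--manifest/--output are assumed names)."""
    cmd = ["nextflow", "run", PIPELINE, "--manifest", manifest, "--output", output]
    if nopreprocess:
        # Design choice of this sketch: preprocessing options are pointless when
        # preprocessing is skipped, so treat the combination as a mistake.
        if adapter_f or adapter_r or hg_index_url:
            raise ValueError("preprocessing options given together with --nopreprocess")
        cmd.append("--nopreprocess")
        return cmd
    if adapter_f:
        cmd += ["--adapter_F", adapter_f]  # forward adapter for cutadapt
    if adapter_r:
        cmd += ["--adapter_R", adapter_r]  # reverse adapter for cutadapt
    if hg_index_url:
        cmd += ["--hg_index_url", hg_index_url]  # custom human genome index
    return cmd


if __name__ == "__main__":
    print(shlex.join(build_command("manifest.csv", "results/", nopreprocess=True)))
```

Returning a list rather than a single string keeps each argument intact if it is later passed to `subprocess.run`; `shlex.join` (Python 3.8+) is used only to print a shell-safe version.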