
Preprocessing WGS Data

Sam Minot edited this page Jan 16, 2020 · 1 revision

By default, geneshot preprocesses the raw paired-end FASTQ datasets. Preprocessing consists of:

  • Optionally running barcodecop to ensure that the samples were demultiplexed correctly (if index reads are provided in the I1 and I2 columns of the manifest)
  • Trimming adapters using cutadapt (adapter sequences can be manually specified using the --adapter_F and --adapter_R flags)
  • Removing reads that align to the human genome (the default reference is the current human genome; a different index can be supplied with --hg_index_url)

All of these preprocessing steps can be skipped with the --nopreprocess flag.
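
As a rough illustration of the logic described above, the following Python sketch lists which steps would be expected to run for a given manifest. It is not geneshot code: the function name, the step labels, and the example column names other than I1 and I2 are hypothetical, and it assumes that barcodecop is only run when both index columns are present.

```python
def planned_preprocessing_steps(manifest_columns, nopreprocess=False):
    """Return the preprocessing steps implied by the description above.

    Hypothetical helper for illustration only; not part of geneshot.
    """
    if nopreprocess:
        # --nopreprocess skips every preprocessing step
        return []

    steps = []
    # Assumption: barcodecop needs both index-read columns (I1 and I2)
    if {"I1", "I2"} <= set(manifest_columns):
        steps.append("barcodecop")
    # Adapter trimming and human read removal run by default
    steps.append("cutadapt")
    steps.append("remove_human_reads")
    return steps


# Example column names ("specimen", "R1", "R2") are assumptions
print(planned_preprocessing_steps(["specimen", "R1", "R2"]))
print(planned_preprocessing_steps(["specimen", "R1", "R2", "I1", "I2"]))
print(planned_preprocessing_steps(["specimen", "R1", "R2"], nopreprocess=True))
```

The first call omits barcodecop because no index reads are listed, the second includes all three steps, and the third returns an empty list because preprocessing is skipped entirely.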