Updated docs and removed bugs in version capture for awk and grep
GallVp committed Mar 3, 2024
1 parent eb121bd commit a4c6490
Showing 4 changed files with 32 additions and 19 deletions.
Binary file added docs/images/hic_map.png
45 changes: 29 additions & 16 deletions docs/output.md
@@ -12,21 +12,18 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d

<!-- no toc -->

-- [plant-food-research-open/assemblyqc: Output](#plant-food-research-openassemblyqc-output)
-- [Introduction](#introduction)
-- [Pipeline overview](#pipeline-overview)
-- [FASTA and GFF3 validation](#fasta-and-gff3-validation)
-- [Assemblathon stats](#assemblathon-stats)
-- [Genometools gt stat](#genometools-gt-stat)
-- [NCBI FCS adaptor](#ncbi-fcs-adaptor)
-- [NCBI FCS GX](#ncbi-fcs-gx)
-- [BUSCO](#busco)
-- [TIDK](#tidk)
-- [LAI](#lai)
-- [Kraken2](#kraken2)
-- [HiC contact map](#hic-contact-map)
-- [Synteny](#synteny)
-- [Pipeline information](#pipeline-information)
+- [FASTA and GFF3 validation](#fasta-and-gff3-validation)
+- [Assemblathon stats](#assemblathon-stats)
+- [Genometools gt stat](#genometools-gt-stat)
+- [NCBI FCS adaptor](#ncbi-fcs-adaptor)
+- [NCBI FCS GX](#ncbi-fcs-gx)
+- [BUSCO](#busco)
+- [TIDK](#tidk)
+- [LAI](#lai)
+- [Kraken2](#kraken2)
+- [HiC contact map](#hic-contact-map)
+- [Synteny](#synteny)
+- [Pipeline information](#pipeline-information)

### FASTA and GFF3 validation

@@ -75,6 +72,20 @@ GenomeTools `gt stat` tool calculates a basic set of statistics about features c

### NCBI FCS GX

+<details markdown="1">
+<summary>Output files</summary>
+
+- `ncbi_fcs_gx/`
+- `*.taxonomy.rpt`: [Taxonomy report](https://github.com/ncbi/fcs/wiki/FCS-GX-taxonomy-report#taxonomy-report-output-).
+- `*.fcs_gx_report.txt`: A final report of [recommended actions](https://github.com/ncbi/fcs/wiki/FCS-GX#outputs).
+- `*.inter.tax.rpt.tsv`: [Select columns](../modules/local/ncbi_fcs_gx_krona_plot.nf) from `*.taxonomy.rpt` used for generation of a Krona taxonomy plot.
+- `*.fcs.gx.krona.cut`: Krona taxonomy file [created](../modules/local/ncbi_fcs_gx_krona_plot.nf) from `*.inter.tax.rpt.tsv`.
+- `*.fcs.gx.krona.html`: Krona taxonomy plot.
+
+</details>
+
+[FCS-GX detects](https://github.com/ncbi/fcs/wiki/FCS-GX#outputs) contamination from foreign organisms in genome sequences.
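
The file patterns listed for `ncbi_fcs_gx/` can be checked after a run with a small helper. The following is only a sketch: the function name `check_fcs_gx_outputs` is hypothetical and not part of the pipeline, and the results directory passed to it is whatever output directory you configured.

```shell
# Hypothetical helper (not part of the pipeline): report which of the
# documented ncbi_fcs_gx/ file patterns have at least one match in DIR.
# The body runs in a subshell so its variables do not leak to the caller.
check_fcs_gx_outputs() (
    dir=$1
    missing=0
    for pattern in '*.taxonomy.rpt' '*.fcs_gx_report.txt' \
                   '*.inter.tax.rpt.tsv' '*.fcs.gx.krona.cut' \
                   '*.fcs.gx.krona.html'; do
        # Expand the glob; if nothing matches, the pattern stays literal
        # and the -e test below fails.
        set -- "$dir"/$pattern
        if [ -e "$1" ]; then
            echo "found   $pattern"
        else
            echo "missing $pattern"
            missing=$((missing + 1))
        fi
    done
    # Succeed only when every pattern matched something.
    [ "$missing" -eq 0 ]
)
```

For example, `check_fcs_gx_outputs results/ncbi_fcs_gx` (with `results` standing in for your own output directory) prints one line per pattern and returns a non-zero status if any pattern has no match.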

### BUSCO

<details markdown="1">
@@ -155,6 +166,8 @@ LTR Assembly Index (LAI) is a reference-free genome metric that [evaluates assem

Hi-C contact mapping experiments measure the frequency of physical contact between loci in the genome. The resulting dataset, called a “contact map,” is represented using a [two-dimensional heatmap](https://github.com/igvteam/juicebox.js) where the intensity of each pixel indicates the frequency of contact between a pair of loci.

+<div align="center"><img src="images/hic_map.png" alt="AssemblyQC - HiC interactive contact map" width="50%"><hr><em>AssemblyQC - HiC interactive contact map</em></div>
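
To make the "frequency of contact between a pair of loci" idea concrete, the sketch below bins a handful of made-up contact pairs into a small count matrix with `awk`. It illustrates the concept only and is not how the pipeline builds its contact map; the positions and the `bin_size` value are invented for the example.

```shell
# Toy contact pairs: position of locus A and locus B on one sequence (bp).
# These values are invented for illustration.
pairs='1200 5300
1800 5900
2500 2700
7100 1400'

bin_size=5000   # hypothetical map resolution in bp

# Count contacts per pair of bins. The map is symmetric, so each pair is
# stored with the smaller bin index first (upper triangle only).
contact_counts=$(printf '%s\n' "$pairs" | awk -v bin="$bin_size" '
    {
        i = int($1 / bin); j = int($2 / bin)
        if (i > j) { t = i; i = j; j = t }
        count[i "\t" j]++
    }
    END { for (k in count) print k "\t" count[k] }
' | sort)

# Columns: bin index A, bin index B, number of contacts
printf '%s\n' "$contact_counts"
```

Each output row corresponds to one pixel of the heatmap: the two bin indices give its coordinates and the count gives its intensity.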

### Synteny

<details markdown="1">
@@ -166,7 +179,7 @@ Hi-C contact mapping experiments measure the frequency of physical contact betwe
- `bundled.links.tsv`: Bundled links file generated with MUMMER and `dnadiff.pl`.
- `circos.conf`: CIRCOS configuration file used to generate the synteny plot.
- `karyotype.tsv`: Karyotype TSV file used to generate the synteny plot.
-- `*.on.*.*`: Synteny files corresponding to of a single contig of the target assembly with respect to all contig of the reference assembly.
+- `*.on.*.*`: Synteny files corresponding to a single contig of the target assembly with respect to all contigs of the reference assembly.
</details>

Synteny plots are created with Circos which is a tool [facilitating](https://circos.ca) the identification and analysis of similarities and differences arising from comparisons of genomes. The genome-wide alignments are performed with [MUMMER](https://github.com/mummer4/mummer?tab=readme-ov-file) and bundled with [`dnadiff.pl`](https://github.com/mummer4/mummer/blob/master/scripts/dnadiff.pl).
4 changes: 2 additions & 2 deletions modules/local/generatekaryotype.nf
@@ -20,8 +20,8 @@ process GENERATEKARYOTYPE {
"""
cat <<-END_VERSIONS > versions.yml
"${task.process}":
-awk: \$(awk --version | sed -n 's/awk version //p')
-grep: \$(grep --version | sed -n 's/grep (BSD grep, GNU compatible) //p')
+awk: \$(awk -W version | sed -n 's/mawk //p')
+grep: \$(grep --version | sed -n '/grep (GNU grep) /s/grep //p')
sed: \$(sed --version | sed -n 's/^sed //p')
END_VERSIONS
2 changes: 1 addition & 1 deletion modules/local/splitbundlefile.nf
@@ -36,7 +36,7 @@ process SPLITBUNDLEFILE {
cat <<-END_VERSIONS > versions.yml
"${task.process}":
-awk: \$(awk --version | sed -n 's/awk version //p')
+awk: \$(awk -W version | sed -n 's/mawk //p')
END_VERSIONS
"""
}
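
The updated `sed` filters can be tried outside Nextflow by feeding them canned version banners. This is a sketch under assumptions: the banner strings below are sample first lines of the kind mawk, GNU grep, BSD grep and GNU sed print, and the exact text varies between builds. Inside the `.nf` files the `$` is escaped as `\$`; the escape is dropped here because this is plain shell.

```shell
# Sample banners (assumed typical; real output differs between builds).
mawk_banner='mawk 1.3.4 20200120'
gnu_grep_banner='grep (GNU grep) 3.4'
bsd_grep_banner='grep (BSD grep, GNU compatible) 2.6.0-FreeBSD'
gnu_sed_banner='sed (GNU sed) 4.8'

# Same sed expressions as in the updated modules, applied to the samples.
awk_version=$(printf '%s\n' "$mawk_banner" | sed -n 's/mawk //p')
grep_version=$(printf '%s\n' "$gnu_grep_banner" | sed -n '/grep (GNU grep) /s/grep //p')
sed_version=$(printf '%s\n' "$gnu_sed_banner" | sed -n 's/^sed //p')

# The grep filter only fires on a GNU banner, so a BSD banner yields nothing.
grep_version_bsd=$(printf '%s\n' "$bsd_grep_banner" | sed -n '/grep (GNU grep) /s/grep //p')

printf 'awk: %s\ngrep: %s\nsed: %s\n' "$awk_version" "$grep_version" "$sed_version"
```

Because each filter uses `sed -n` with the `p` flag, a banner that does not match produces an empty value rather than an error, which is worth keeping in mind when the tools in the runtime environment differ from the ones the filters were written for.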
