Task: run

This runs the main ARIBA local assembly pipeline.

Assuming ariba prepareref has been run, with the output directory called ref, run the pipeline with

ariba run ref reads_1.fq reads_2.fq output_dir

where reads_1.fq and reads_2.fq are the names of the forward and reverse paired reads files. The reads files can be in any format that is compatible with minimap (in particular, they can be gzipped).

Important: ARIBA assumes that read N in the file reads_1.fq is the mate of read N in the file reads_2.fq. All output files will be put in a new directory called output_dir.
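Because of the pairing assumption above, it can be worth checking that both reads files contain the same number of reads before starting a run. The following is a minimal sketch, not part of ARIBA: the function names count_fastq_reads and check_read_pairs are hypothetical, and it assumes each FASTQ record is exactly four lines long. Equal counts are necessary but not sufficient for correct pairing, since the reads could still be in a different order.

```shell
# Hypothetical helper (not an ARIBA command): count the records in a
# FASTQ file, plain or gzipped. Assumes exactly 4 lines per record.
count_fastq_reads() {
    if gzip -t "$1" 2>/dev/null; then
        # File is valid gzip: decompress to stdout and count lines.
        gzip -dc "$1" | awk 'END { print int(NR / 4) }'
    else
        awk 'END { print int(NR / 4) }' "$1"
    fi
}

# Hypothetical helper: succeed only if both files hold the same number of reads.
check_read_pairs() {
    n1=$(count_fastq_reads "$1")
    n2=$(count_fastq_reads "$2")
    if [ "$n1" -eq "$n2" ]; then
        echo "OK: $n1 read pairs"
    else
        echo "Mismatch: $1 has $n1 reads, $2 has $n2 reads" >&2
        return 1
    fi
}
```

For example, check_read_pairs reads_1.fq reads_2.fq prints the number of read pairs, or returns a non-zero exit status if the counts differ.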

To see all the options, use --help:

ariba run --help

Report file

The most important file is report.tsv. This is a filtered version of the complete file report.all.tsv, which has at least one row per reference sequence that had reads mapped to it (see the task reportfilter for more details on the filtering).

The meaning of the columns in report.tsv is as follows.

Column	Description
1. ref_name	name of reference sequence chosen from cluster
2. gene	1=gene, 0=non-coding (same as metadata column 2)
3. var_only	1=variant only, 0=presence/absence (same as metadata column 3)
4. flag	cluster flag
5. reads	number of reads in this cluster
6. cluster	name of cluster
7. ref_len	length of reference sequence
8. ref_base_assembled	number of reference nucleotides assembled by this contig
9. pc_ident	%identity between reference sequence and contig
10. ctg	name of contig matching reference
11. ctg_len	length of contig
12. ctg_cov	mean mapped read depth of this contig
13. known_var	is this a known SNP from reference metadata? 1 or 0
14. var_type	The type of variant. Currently only SNP supported
15. var_seq_type	Variant sequence type. if known_var=1, n or p for nucleotide or protein
16. known_var_change	if known_var=1, the wild/variant change, eg I42L
17. has_known_var	if known_var=1, 1 or 0 for whether or not the assembly has the variant
18. ref_ctg_change	amino acid or nucleotide change between reference and contig, eg I42L
19. ref_ctg_effect	effect of change between reference and contig, eg SYN, NONSYN (amino acid changes only)
20. ref_start	start position of variant in reference
21. ref_end	end position of variant in reference
22. ref_nt	nucleotide(s) in reference at variant position
23. ctg_start	start position of variant in contig
24. ctg_end	end position of variant in contig
25. ctg_nt	nucleotide(s) in contig at variant position
26. smtls_total_depth	total read depth at variant start position in contig, reported by mpileup
27. smtls_alt_nt	alt nucleotides on contig, reported by mpileup
28. smtls_alt_depth	alt depth on contig, reported by mpileup
29. var_description	description of variant from reference metadata
30. free_text	other free text about reference sequence, from reference metadata

If a gene is assembled with no variants, there will be one row for that gene, with information only in columns 1-12 (and possibly 30); the remaining columns contain dots. Otherwise, there is one row per variant. If you want a short summary of genes present and the corresponding flags, run:

cut -f1,4 report.tsv | uniq
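The column numbers in the table above can be used in the same way to pull out only the rows that describe a variant. The following is a sketch rather than an ARIBA feature: the function name report_variants is hypothetical, and it assumes that any header line in the report starts with "#" and that rows without a variant have a dot in column 18 (ref_ctg_change), as described above.

```shell
# Hypothetical helper (not an ARIBA command): list variant rows from a report.
# Column numbers follow the table above:
#   1=ref_name, 4=flag, 18=ref_ctg_change, 19=ref_ctg_effect
# Assumption: any header line starts with '#', so such lines are skipped.
report_variants() {
    awk -F '\t' 'BEGIN { OFS = "\t" }
        /^#/ { next }                          # skip header/comment lines
        $18 != "." { print $1, $4, $18, $19 }  # keep rows with a change
    ' "$1"
}
```

Running report_variants report.tsv prints one tab-separated line per variant, giving the reference name, cluster flag, change and effect.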

Other files

The other files written to the output directory are as follows.

assembled_genes.fa.gz. This is a gzipped FASTA file of assembled gene sequences. It contains only genes, not non-coding sequences (those are in assembled_seqs.fa.gz). When comparing a local assembly to a gene, mismatches near the end of the gene can cause the alignment to be too short. ARIBA tries to extend the match by looking for start and stop codons. The extended sequences are in this file. The unextended sequences are in assembled_seqs.fa.gz.
assembled_seqs.fa.gz. This is a gzipped FASTA of the assembled sequences. During assembly, the sequence flanking each reference sequence is assembled, but in this file only the parts of the contigs that match the reference sequences are kept.
assemblies.fa.gz. This is a gzipped FASTA file of the assemblies. It contains the complete, unedited, contigs.
log.clusters.gz. Detailed logging is kept for the progress of each cluster. This is a gzipped file containing all the logging information.
version_info.txt. This contains detailed information on the versions of ARIBA and its dependencies. It is the output of running the task version.
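
Since the FASTA files listed above are gzipped, a quick way to see how many sequences each contains is to decompress to standard output and count the header lines. This is a minimal sketch using standard tools only; the function name count_fasta_seqs is hypothetical.

```shell
# Hypothetical helper (not an ARIBA command): count the records in a
# gzipped FASTA file by counting lines that start with '>'.
count_fasta_seqs() {
    gzip -dc "$1" | grep -c '^>'
}
```

For example, count_fasta_seqs output_dir/assembled_genes.fa.gz prints the number of assembled gene sequences.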
