Task: run

This runs the main ARIBA local assembly pipeline.

Assuming ariba prepareref has been run, with the output directory called ref, run the pipeline with

ariba run ref reads_1.fq reads_2.fq output_dir

where reads_1.fq and reads_2.fq are the names of the forward and reverse paired reads files. The reads files can be in any format that is compatible with bowtie2 (in particular, they can be gzipped).

Important: ARIBA assumes that read N in the file reads_1.fq is the mate of read N in the file reads_2.fq. All output files will be put in a new directory called output_dir.
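Because ARIBA pairs reads purely by their position in the two files, it can be worth checking mate order before a run. The following is a minimal sketch, not part of ARIBA: the function names are hypothetical, and it assumes FASTQ with exactly four lines per record and mate names that differ only by an optional trailing /1 or /2. Files may be plain or gzipped (gzip -cdf passes uncompressed input through unchanged).

```shell
# Hypothetical helper: print the read name of every 4-line FASTQ record,
# dropping the leading '@', anything after the first whitespace, and a trailing /1 or /2.
read_names() {
    gzip -cdf "$1" | awk 'NR % 4 == 1 { name = substr($1, 2); sub(/\/[12]$/, "", name); print name }'
}

# Hypothetical helper: succeed only if both files list the same read names in the same order.
check_mate_order() {
    local tmp1 tmp2 status
    tmp1=$(mktemp)
    tmp2=$(mktemp)
    read_names "$1" > "$tmp1"
    read_names "$2" > "$tmp2"
    if cmp -s "$tmp1" "$tmp2"; then
        echo "ok: $(( $(wc -l < "$tmp1") )) read pairs with matching names"
        status=0
    else
        echo "mismatch: read names or order differ between $1 and $2" >&2
        status=1
    fi
    rm -f "$tmp1" "$tmp2"
    return "$status"
}
```

For example, `check_mate_order reads_1.fq reads_2.fq` prints a pair count on success and returns a non-zero status if the names diverge or one file has extra reads.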

To see all the options, use --help:

ariba run --help

Report file

The most important file is report.tsv. This is a filtered version of the complete file report.all.tsv, which has at least one row per reference sequence that had reads mapped to it (see the task reportfilter for more details on the filtering).
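To see which reference sequences the filtering removed, one option is to compare the first column of the two reports. This is a sketch under stated assumptions: the function name is hypothetical, and it assumes both files start with a single header line and have ref_name in column 1 (as in the table below).

```shell
# Hypothetical helper: list ref_name values present in report.all.tsv ($1)
# but absent from report.tsv ($2). Skips the header line of each file.
filtered_out_refs() {
    local all_refs kept_refs
    all_refs=$(mktemp)
    kept_refs=$(mktemp)
    tail -n +2 "$1" | cut -f1 | sort -u > "$all_refs"
    tail -n +2 "$2" | cut -f1 | sort -u > "$kept_refs"
    # comm -23 keeps lines unique to the first (sorted) file
    comm -23 "$all_refs" "$kept_refs"
    rm -f "$all_refs" "$kept_refs"
}
```

Usage: `filtered_out_refs output_dir/report.all.tsv output_dir/report.tsv`.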

The meaning of the columns in report.tsv is as follows.

Column	Description
1. ref_name	name of reference sequence chosen from cluster
2. ref_type	type of reference sequence (presence/absence, variants only, noncoding)
3. flag	cluster flag (see the task flag)
4. reads	number of reads in this cluster
5. cluster	name of cluster
6. ref_len	length of reference sequence
7. ref_base_assembled	number of reference nucleotides assembled by this contig
8. pc_ident	%identity between reference sequence and contig
9. ctg	name of contig matching reference
10. ctg_len	length of contig
11. ctg_cov	mean mapped read depth of this contig
12. known_var	is this a known SNP from reference metadata? 1 or 0
13. var_type	type of variant (currently only SNP is supported)
14. var_seq_type	variant sequence type: if known_var=1, n or p for nucleotide or protein
15. known_var_change	if known_var=1, the wild/variant change, eg I42L
16. has_known_var	if known_var=1, 1 or 0 for whether or not the assembly has the variant
17. ref_ctg_change	amino acid or nucleotide change between reference and contig, eg I42L
18. ref_ctg_effect	effect of change between reference and contig, eg SYN, NONSYN (amino acid changes only)
19. ref_start	start position of variant in reference
20. ref_end	end position of variant in reference
21. ref_nt	nucleotide(s) in reference at variant position
22. ctg_start	start position of variant in contig
23. ctg_end	end position of variant in contig
24. ctg_nt	nucleotide(s) in contig at variant position
25. smtls_total_depth	total read depth at variant start position in contig, reported by mpileup
26. smtls_alt_nt	alt nucleotides on contig, reported by mpileup
27. smtls_alt_depth	alt depth on contig, reported by mpileup
28. var_description	description of variant from reference metadata
29. free_text	other free text about reference sequence, from reference metadata
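Since column positions can be easy to miscount, a sketch like the following selects columns by their header name instead. It is an illustration only: the function name is hypothetical, and it assumes the first line of report.tsv is a tab-separated header using the names in the table above, possibly with a leading '#' on the first name.

```shell
# Hypothetical helper: print the named columns (in the order given) from an ARIBA report.
# usage: report_columns report.tsv col_name [col_name ...]
report_columns() {
    local report=$1
    shift
    awk -F '\t' -v OFS='\t' -v wanted="$*" '
        NR == 1 {
            sub(/^#/, "", $1)                 # drop a leading comment marker from the header
            for (i = 1; i <= NF; i++) idx[$i] = i
            n = split(wanted, names, " ")
            for (j = 1; j <= n; j++)
                if (!(names[j] in idx)) { print "no column named " names[j] > "/dev/stderr"; exit 1 }
            next
        }
        {
            line = ""
            for (j = 1; j <= n; j++) line = line (j > 1 ? OFS : "") $(idx[names[j]])
            print line
        }' "$report"
}
```

For example, to list only non-synonymous changes: `report_columns output_dir/report.tsv ref_name ref_ctg_change ref_ctg_effect | awk -F '\t' '$3 == "NONSYN"'`.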

If a gene is assembled with no variants, then there will be one row for that gene, with information only in columns 1-11 (and possibly 29); the remaining columns contain dots. Otherwise, there is one row per variant. If you want a short summary of genes present and the corresponding flags, run:

cut -f1,3 report.tsv | uniq
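The one-row-per-variant layout can also be used to separate variant-free genes from genes with variants. A minimal sketch, assuming report.tsv has one header line and the 29-column layout in the table above; both function names are hypothetical.

```shell
# Hypothetical helper: print ref_name (column 1) for rows where columns 12-28 are all '.',
# i.e. genes assembled without any reported variant.
genes_without_variants() {
    awk -F '\t' 'NR > 1 {
        novar = 1
        for (i = 12; i <= 28; i++) if ($i != ".") { novar = 0; break }
        if (novar) print $1
    }' "$1"
}

# Hypothetical helper: like the cut | uniq summary, but drops the header line and
# prefixes each ref_name/flag pair with the number of report rows it has.
gene_flag_counts() {
    tail -n +2 "$1" | cut -f1,3 | sort | uniq -c
}
```

Note that a count of 1 from gene_flag_counts can mean either a single variant or a variant-free gene; genes_without_variants distinguishes the two.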

Other files

The other files written to the output directory are as follows.

assembled_seqs.fa.gz. This is a gzipped FASTA of the assembled sequences. During assembly, the sequence flanking each reference sequence is assembled, but in this file only the parts of the contigs that match the reference sequences are kept.
log.clusters.gz. Detailed logging is kept for the progress of each cluster. This is a gzipped file containing all the logging information.
version_info.txt. This contains detailed information on the versions of ARIBA and its dependencies. It is the output of running the task version.
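For a quick look at what ended up in assembled_seqs.fa.gz, a sketch that prints each sequence name with its length. The function name is hypothetical, and it assumes standard FASTA (a '>' header line followed by one or more sequence lines).

```shell
# Hypothetical helper: print "name<TAB>length" for each record in a gzipped FASTA file.
fasta_lengths() {
    gzip -cd "$1" | awk '
        /^>/ {
            if (name != "") print name "\t" len   # emit the previous record
            name = substr($1, 2)                  # header up to first whitespace, minus ">"
            len = 0
            next
        }
        { len += length($0) }
        END { if (name != "") print name "\t" len }'
}
```

Usage: `fasta_lengths output_dir/assembled_seqs.fa.gz`.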
