Usage
Basic usage
usage: CIRI-long [-h] [-v] {call,collapse} ...
positional arguments:
{call,collapse} commands
optional arguments:
-h, --help show this help message and exit
-v, --version show program's version number and exit
CIRI-long has two main functions: (1) candidate circRNA identification and (2) isoform collapsing.
usage: CIRI-long call [-h] [-i READS] [-o DIR] [-r REF] [-p PREFIX] [-a GTF] [--canonical] [-t INT] [--debug]
optional arguments:
-h, --help show this help message and exit
-i READS, --in READS Input reads.fq.gz
-o DIR, --out DIR Output directory, default: ./
-r REF, --ref REF Reference genome FASTA file
-p PREFIX, --prefix PREFIX
Output sample prefix, (default: CIRI-long)
-a GTF, --anno GTF Genome reference gtf
--canonical Use canonical splice signal (GT/AG) only, default: True)
-t INT, --threads INT
Number of threads
--debug Run in debugging mode, (default: False)
NOTE:
- A bwa index for the reference genome is required. Please run the `bwa index` command to generate it before running CIRI-long.
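The index requirement in the note above can be checked before a long run starts. The following is a minimal sketch, not part of CIRI-long: the helper name `missing_bwa_index_files` is hypothetical, and it assumes the index was built with the default prefix, so that the files written by `bwa index` sit next to the FASTA and share its name.

```python
from pathlib import Path

# File suffixes that `bwa index` writes for a reference FASTA.
BWA_INDEX_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")


def missing_bwa_index_files(ref_fasta):
    """Return the expected bwa index files that do not exist yet.

    Hypothetical helper: assumes the default index prefix, i.e. the
    index files are named <ref_fasta>.amb, <ref_fasta>.bwt, and so on.
    """
    ref = str(Path(ref_fasta))
    expected = [Path(ref + suffix) for suffix in BWA_INDEX_SUFFIXES]
    return [path for path in expected if not path.is_file()]


if __name__ == "__main__":
    missing = missing_bwa_index_files("mm10_chr12.fa")
    if missing:
        print("Missing index files:", ", ".join(str(p) for p in missing))
    else:
        print("bwa index looks complete")
```

An empty list means all five index files were found; anything else suggests that `bwa index` still needs to be run on the reference.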
A demo dataset can be downloaded from the GitHub release:
# Download demo dataset
wget https://github.com/bioinfo-biols/CIRI-long/releases/download/v0.6-alpha/CIRI-long_test_data.tar.gz
# Decompress demo dataset
tar zxvf CIRI-long_test_data.tar.gz
cd test_data
# Build bwa index before running CIRI-long
bwa index -a bwtsw mm10_chr12.fa
# Run CIRI-long to identify circular reads from sequencing reads
CIRI-long call -i test_reads.fa \
-o ./test_call \
-r mm10_chr12.fa \
-p test \
-a mm10_chr12.gtf \
-t 8
The output directory should have the following structure:
test_call
├── test.cand_circ.fa
├── test.json
├── test.log
├── test.low_confidence.fa
└── tmp
    ├── ss.idx
    ├── test.ccs.fa
    └── test.raw.fa
1 directory, 7 files
usage: CIRI-long collapse [-h] [-i LIST] [-o DIR] [-p PREFIX] [-r REF] [-a GTF] [--canonical] [-t INT] [--debug]
optional arguments:
-h, --help show this help message and exit
-i LIST, --in LIST Input list of CIRI-long results
-o DIR, --out DIR Output directory, default: ./
-p PREFIX, --prefix PREFIX
Output sample prefix, (default: CIRI-long)
-r REF, --ref REF Reference genome FASTA file
-a GTF, --anno GTF Genome reference gtf
--canonical Use canonical splice signal (GT/AG) only, default: True)
-t INT, --threads INT
Number of threads
--debug Run in debugging mode, (default: False)
Provide a text file that lists, one sample per line, the sample name and the path to its CIRI-long output file (`*.cand_circ.fa`), separated by a space.
sample1_name /path/to/sample1/cand_circ.fa
sample2_name /path/to/sample2/cand_circ.fa
For example, you can create a file named `test.lst` with the following content:
test ./test_call/test.cand_circ.fa
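With many samples, the list file can be generated rather than typed by hand. The sketch below is only an illustration: `write_sample_list` is a hypothetical helper, and it assumes that the sample name should be the prefix of each `*.cand_circ.fa` file (the value passed to `-p` during `CIRI-long call`).

```python
from pathlib import Path

SUFFIX = ".cand_circ.fa"


def write_sample_list(call_dirs, list_path):
    """Write one 'sample_name path' line per *.cand_circ.fa file found.

    call_dirs: output directories of earlier `CIRI-long call` runs.
    Assumption: the sample name is the file name without the suffix.
    """
    lines = []
    for call_dir in call_dirs:
        for fasta in sorted(Path(call_dir).glob("*" + SUFFIX)):
            sample = fasta.name[: -len(SUFFIX)]
            # Sample name and path are separated by a single space.
            lines.append(f"{sample} {fasta}")
    Path(list_path).write_text("".join(line + "\n" for line in lines))
    return lines


if __name__ == "__main__":
    for line in write_sample_list(["./test_call"], "test.lst"):
        print(line)
```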
Then run `CIRI-long collapse` to aggregate results from one or more samples:
CIRI-long collapse -i ./test.lst \
-o ./test_collpase \
-p test \
-r ./mm10_chr12.fa \
-a ./mm10_chr12.gtf \
-t 8
The output directory should have the following structure:
test_collpase
├── test_collpase.expression
├── test_collpase.info
├── test_collpase.log
├── test_collpase.reads
└── tmp
    ├── ss.idx
    └── test_collpase.corrected.pkl
1 directory, 6 files
The main output
The main output of CIRI-long is a GTF file (e.g. `test_collpase.info`) that contains detailed information on each circRNA, with annotation of the circRNA back-spliced regions in the attributes column.
Description of each column's value:
column | name | description |
---|---|---|
1 | chrom | chromosome / contig name |
2 | source | CIRI-long |
3 | type | circRNA |
4 | start | 5' back-spliced junction site |
5 | end | 3' back-spliced junction site |
6 | score | Number of total supported reads |
7 | strand | strand information |
8 | . | . |
9 | attributes | attributes separated by semicolons |
The attributes column contains several pre-defined keys and values:
key | description |
---|---|
circ_id | name of circRNA |
splice_site | splicing signal of the candidate circRNA, followed by numbers giving the base shift between the aligned and annotated splice sites (e.g. AG-GT\|0-5) |
equivalent_seq | equivalent sequence of splice site |
circ_type | circRNA types: exon/intron/intergenic |
circ_len | length of the major isoform of circRNA |
isoform | structure of isoforms; isoforms are separated by "\|" and circular exons are separated by "," (e.g. 11627815-111627914,111628190-111628302\|11627815-111628302) |
gene_id | Ensembl ID of host gene |
gene_name | HGNC symbol of host gene |
gene_type | type of host gene in the annotation gtf file |
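The two tables above can be turned into a small parser. This is a minimal sketch under stated assumptions, not CIRI-long code: the function names are hypothetical, columns are assumed to be tab-separated, attributes are assumed to follow the usual GTF layout `key "value";`, and the score is assumed to be an integer read count. The record in the usage part is invented for illustration only.

```python
def parse_info_line(line):
    """Split one record of the *.info GTF into a dict of its nine columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    chrom, source, feature, start, end, score, strand, _frame, attr_text = fields

    attributes = {}
    for item in attr_text.split(";"):
        item = item.strip()
        if not item:
            continue
        # Assumed GTF-style layout: key "value"
        key, _sep, value = item.partition(" ")
        attributes[key] = value.strip().strip('"')

    return {
        "chrom": chrom,
        "source": source,
        "type": feature,
        "start": int(start),   # 5' back-spliced junction site
        "end": int(end),       # 3' back-spliced junction site
        "score": int(score),   # number of supporting reads (assumed integer)
        "strand": strand,
        "attributes": attributes,
    }


def parse_isoforms(isoform_text):
    """Turn 'a-b,c-d|e-f' into [[(a, b), (c, d)], [(e, f)]]."""
    isoforms = []
    for isoform in isoform_text.split("|"):
        exons = []
        for exon in isoform.split(","):
            start, end = exon.split("-")
            exons.append((int(start), int(end)))
        isoforms.append(exons)
    return isoforms


if __name__ == "__main__":
    # Hypothetical record, made up for this example.
    demo = ('chr12\tCIRI-long\tcircRNA\t100\t400\t7\t+\t.\t'
            'circ_id "chr12:100-400"; isoform "100-200,300-400|100-400";')
    record = parse_info_line(demo)
    print(record["attributes"]["circ_id"])
    print(parse_isoforms(record["attributes"]["isoform"]))
```

If the real attribute layout differs (for example `key=value`), only the `partition` step needs to change.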
Expression matrix
`test_collpase.expression` contains the summarized expression levels of circRNAs across all samples, in TSV format.
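Because the expression matrix is a TSV file, it can be loaded with Python's standard `csv` module. The sketch below makes one assumption that is not stated above: that the first line of the file is a header row. It does not assume any particular column names, and `read_expression` is a hypothetical helper name.

```python
import csv


def read_expression(path):
    """Read a tab-separated table and return (header, rows).

    Assumption: the first line is a header row. Values are returned
    as strings; convert them as needed for downstream analysis.
    """
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader, [])
        rows = [row for row in reader if row]
    return header, rows


if __name__ == "__main__":
    header, rows = read_expression("./test_collpase/test_collpase.expression")
    print(header)
    print(len(rows), "rows")
```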
If you would like to use other splice signals, please modify the dict `SPLICE_SIGNAL` in `align.py`, using the format `{(5'SS, 3'SS): Priority}`.
Default configuration:
SPLICE_SIGNAL = {
    ('GT', 'AG'): 0,  # U2-type
    ('GC', 'AG'): 1,  # U2-type
    ('AT', 'AC'): 2,  # U12-type
    ('GT', 'AC'): 2,  # U12-type
    ('AT', 'AG'): 2,  # U12-type
}
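To illustrate how a priority dictionary of this shape might be consulted, here is a minimal sketch. It is not taken from `align.py`: the function `best_signal` is hypothetical, and it assumes that a lower number means a higher priority, which the default values suggest (the canonical GT/AG pair has priority 0). The extra `('GA', 'AG')` entry is purely illustrative.

```python
SPLICE_SIGNAL = {
    ("GT", "AG"): 0,  # U2-type
    ("GC", "AG"): 1,  # U2-type
    ("AT", "AC"): 2,  # U12-type
    ("GT", "AC"): 2,  # U12-type
    ("AT", "AG"): 2,  # U12-type
}


def best_signal(candidates, signals=SPLICE_SIGNAL):
    """Return the known (5'SS, 3'SS) pair with the lowest priority value.

    Assumption: lower value means preferred. Pairs missing from the
    dict are ignored; None is returned when no candidate is known.
    Ties keep the first candidate in input order.
    """
    known = [pair for pair in candidates if pair in signals]
    if not known:
        return None
    return min(known, key=lambda pair: signals[pair])


if __name__ == "__main__":
    # Extending a copy of the defaults with an illustrative extra signal.
    custom = dict(SPLICE_SIGNAL)
    custom[("GA", "AG")] = 3  # hypothetical entry, lowest priority

    print(best_signal([("AT", "AC"), ("GC", "AG")]))
    print(best_signal([("GA", "AG")]))
    print(best_signal([("GA", "AG")], signals=custom))
```

Keeping the custom entries in a copy makes it easy to compare results against the default configuration before editing `align.py` itself.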