Ageseq2-CLI is a Python program that identifies target editing events from BLAT output by aligning thousands of reads to target sequences.
By default, the program summarizes the target editing events found for each target sequence in the provided file, along with the number of reads supporting each event (see Usage).
Ageseq2 can be run on the command line or via Docker. This document provides instructions for both.
For advanced users: you need Python 3, Biopython, pandas, and a BLAT binary to make Ageseq2 work. You can find Python3 here: Once Python is installed on your computer, you can install the Biopython and pandas modules through pip:
python3 -m pip install Biopython
python3 -m pip install pandas
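To confirm that both modules were installed into the interpreter you intend to use, a quick check along the following lines may help. This is a minimal sketch and not part of Ageseq2; the helper name `missing_modules` is made up for illustration. Note that Biopython is imported under the name `Bio`.

```python
import importlib.util

# Import names of the two required packages (Biopython is imported as "Bio").
REQUIRED_MODULES = ["Bio", "pandas"]


def missing_modules(names):
    """Return the module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("Biopython and pandas are importable.")
```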
BLAT binaries are included in the blat_binaries folder of the package, but cygwin1.dll might be required if you run BLAT on Windows. You should be able to find it online with a search engine. Once you have it, place it in the same folder as the Python scripts or in the entry folder.
On macOS, you might need to manually unlock blat_macos in your system preferences/settings.
On Linux, you might need to manually allow blat_linux to be executed with chmod +x.
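If you prefer to set the execute permission from Python instead of the shell, the sketch below does roughly what chmod +x does. The helper name `make_executable` is hypothetical, and the path is assumed from the folder and file names mentioned above.

```python
import os
import stat


def make_executable(path):
    """Add execute permission for user, group, and others (similar to chmod +x)."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


if __name__ == "__main__":
    # Assumed location: the blat_binaries folder inside the package.
    blat_path = os.path.join("blat_binaries", "blat_linux")
    if os.path.exists(blat_path):
        make_executable(blat_path)
```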
Once the above is installed, clone this repository.
git clone
- If cloning returns fatal: Authentication failed, try again using a GitHub personal access token. Instructions for doing so can be found here:
Once cloned, change directories into AGEseq2, and it should be ready to run.
Install Docker
Get the AGEseq image. Go to the docker image on a web browser:
Copy the 'Docker Pull Command' and execute it in the Terminal:
docker pull bendjamin101001/ageseq2
You should now see the 'bendjamin101001/ageseq2' image in the Docker dashboard, and it should be ready to run.
The following documentation instructs how to run Ageseq2 on the command line or via Docker.
Two inputs are required for Ageseq2:
python -t [target_file] -r [reads_path] -sa [0|1]
A target file and the relative path to the folder where the reads files are stored are required for Ageseq. The default values are as follows:
target_file: targets.txt
reads_path: ./reads
Finally, you need to make sure the configuration file
is in the same folder as targets.txt. Adjust its parameters to achieve the desired result.
If the target sequences are in
, and the reads files are in a folder named reads
, and these two elements are in the same folder, then Ageseq can run with the simple command: python
Additionally, -sa is set to 1 by default so that alignments are not shown in the log file. If you want to inspect the alignment of each read, change this to -sa 0.
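Before launching a run, it can help to confirm that the expected layout is in place. The following is a minimal sketch, not part of Ageseq2: the function name `check_inputs` is hypothetical, and it only checks the default file and folder names described above.

```python
from pathlib import Path


def check_inputs(work_dir=".", target_file="targets.txt", reads_path="reads"):
    """Return a list of problems found with the expected input layout."""
    work_dir = Path(work_dir)
    problems = []
    if not (work_dir / target_file).is_file():
        problems.append(f"target file not found: {target_file}")
    reads_dir = work_dir / reads_path
    if not reads_dir.is_dir():
        problems.append(f"reads folder not found: {reads_path}")
    elif not any(reads_dir.glob("*.fastq")):
        problems.append(f"no .fastq files in: {reads_path}")
    return problems


if __name__ == "__main__":
    for problem in check_inputs():
        print("Problem:", problem)
```

An empty list means the target file and a reads folder containing at least one .fastq file were found.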
You will need to load pandas and Biopython with the following commands before you can run Ageseq2. If Python3 is not loaded automatically, you can load it manually in a similar fashion:
ml pandas/0.25.3-intel-2019b-Python-3.7.4
ml Biopython/1.75-intel-2019b-Python-3.7.4
Open Docker and go to 'bendjamin101001/ageseq2'
In the Docker dashboard, click the blue ‘RUN ▶️’ button.
Expand the optional setting.
Click the ‘…’ under Host Path and select the parent directory of the ‘reads’ directory and ‘targets.txt’.
Enter ‘/data/’ as ‘Container Path’. You can enter a name for this run as ‘Container Name’.
Hit ‘Run’. Go to the ‘Containers / Apps’ tab on the left panel and find the run you just launched. If the run succeeds, a CSV file containing a summary should appear in the same directory as ‘reads’ and ‘targets.txt’. If it fails, check the ‘reads’ folder to make sure it contains only .fastq files.
- If you did not specify a Container Name, a random two-word phrase will be assigned. Don’t panic, just sort the containers by ‘Status’ or ‘Started Time’ to find it. The green logo means it is still running.
- Click on the container to see the log. If you can see the progress running on the right side of the panel, everything is working.
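For readers who prefer the terminal, the dashboard steps above correspond roughly to a docker run command with a volume mount. The sketch below only builds the command as a list and does not run it. It assumes the image behaves the same way when started from the command line as from the dashboard, which this document does not confirm; the helper name `docker_run_command` is hypothetical.

```python
from pathlib import Path

# Image name taken from the pull command earlier in this document.
IMAGE = "bendjamin101001/ageseq2"


def docker_run_command(host_dir, container_name=None):
    """Build (but do not run) a docker run command mounting host_dir at /data/."""
    cmd = ["docker", "run"]
    if container_name:
        # Equivalent of filling in 'Container Name' in the dashboard.
        cmd += ["--name", container_name]
    # Equivalent of setting Host Path and Container Path in the dashboard.
    cmd += ["-v", f"{Path(host_dir).resolve()}:/data/", IMAGE]
    return cmd


if __name__ == "__main__":
    print(" ".join(docker_run_command(".", "my_ageseq_run")))
```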
The target file is a plain text file with two columns: the first column is the name of the target sequence, and the second column is the sequence itself. A FASTA-style sequence file is now also accepted as a valid target file.
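To illustrate the two accepted layouts, here is a minimal sketch of a reader that handles both. It is not the parser Ageseq2 uses; the function name `parse_targets` is hypothetical, and the sketch assumes the two columns are separated by whitespace (a tab or spaces), which this document does not specify.

```python
def parse_targets(text):
    """Parse target sequences from two-column or FASTA-style text.

    Returns a dict mapping target name to sequence.
    """
    targets = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # FASTA header: the name is the first word after ">".
            fields = line[1:].split()
            current = fields[0] if fields else ""
            targets[current] = ""
        elif current is not None:
            # Sequence line belonging to the most recent FASTA header.
            targets[current] += line
        else:
            # Two-column line: name, whitespace, sequence (assumed separator).
            name, seq = line.split(None, 1)
            targets[name] = seq.strip()
    return targets
```

A two-column line with only one field raises a ValueError in this sketch, which makes malformed target files easy to spot.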
Currently only fastq files are accepted. For example, merged amplicon sequences generated by PANDAseq in fastq can be used as reads files directly.
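Because only fastq files are accepted, stray files in the reads folder are a common cause of failed runs. The sketch below lists anything in a folder that is not a .fastq file. It is an illustrative helper, not part of Ageseq2, and the name `non_fastq_files` is made up.

```python
from pathlib import Path


def non_fastq_files(reads_dir):
    """Return sorted names of entries in reads_dir that are not .fastq files."""
    return sorted(
        p.name
        for p in Path(reads_dir).iterdir()
        if not (p.is_file() and p.suffix == ".fastq")
    )


if __name__ == "__main__":
    if Path("reads").is_dir():
        for name in non_fastq_files("reads"):
            print("Unexpected entry in reads folder:", name)
```

Hidden files and subfolders are reported as well, since they are not .fastq files either.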
Parameters set by AGEseq.conf
remove_files = 1 ; # keep (0) or delete (1) intermediate files, default = 1
WOBBLE_BASE = True ; #Treat wobble base as one allele?
WOBBLE_FREQ_LOW = 0.35 ; #Minimal frequency to call a wobble base
WOBBLE_FREQ_HIGH = 0.75 ; #Maximum frequency to call a wobble base
#Below are parameters for BLAT
blat_tileSize = 7 ;
blat_oneOff = 1 ;
blat_maxGap = 20 ;
blat_minIdentity = 70 ;
blat_minScore = 20 ;
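The configuration lines above follow a simple "key = value ; # comment" pattern. To inspect or sanity-check your settings, a small reader like the following sketch may be useful. It is not the parser Ageseq2 uses; the function name `parse_conf` is hypothetical, and all values are returned as strings.

```python
def parse_conf(text):
    """Parse 'key = value ; # comment' lines into a dict of strings."""
    settings = {}
    for raw in text.splitlines():
        # Drop everything after '#', then the trailing ';'.
        line = raw.split("#", 1)[0].strip().rstrip(";").strip()
        if "=" not in line:
            continue  # blank line or comment-only line
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


if __name__ == "__main__":
    example = "WOBBLE_FREQ_LOW = 0.35 ; #Minimal frequency to call a wobble base"
    print(parse_conf(example))
```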
The summary table file includes the following columns:
input_file targetID aligned_target aligned_consensus sub_hits editing_pattern_hits editing_pattern
This summary file provides basic statistics on the targets and on the editing patterns identified from the provided reads, per input file. Each identified editing pattern also has a consensus sequence, built to show the location of the edit.
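As an example of working with the summary table, the sketch below totals editing_pattern_hits for each input file and target. It rests on several assumptions that this document does not confirm: that the file is comma-separated, that the header row uses the column names listed above, and that editing_pattern_hits holds integer counts. The function name `hits_per_target` is hypothetical.

```python
import csv
from collections import defaultdict


def hits_per_target(handle, delimiter=","):
    """Sum editing_pattern_hits for each (input_file, targetID) pair."""
    totals = defaultdict(int)
    for row in csv.DictReader(handle, delimiter=delimiter):
        key = (row["input_file"], row["targetID"])
        # Assumes the hits column contains integer counts.
        totals[key] += int(row["editing_pattern_hits"])
    return dict(totals)
```

Pass an open file handle for your summary file; if the file turns out to be tab-separated, call the function with delimiter="\t".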