NLR-Parser is a tool to rapidly annotate the NLR complement from sequenced plant genomes.
The NLR-Parser refines the output of MAST and reliably annotates disease resistance genes encoding nucleotide-binding leucine-rich repeat (NLR) proteins.
The MEME suite is available at http://meme-suite.org/index.html
Please note that the latest version of MEME is not compatible with NLR-Parser. Use MEME 4.9.1.
Don't worry about setting up the Apache webserver. You just need MAST, so the quick install is sufficient.
Make sure you have the Java Runtime Environment 1.6 or higher. Download it from http://java.com
Download the meme.xml that contains the definitions from here. The motifs were published by Jupe et al. (2012). The downloaded meme.xml is an input argument for MAST.
If you intend to screen nucleotide sequences for NLRs, it might make sense to translate your sequence in all 6 reading frames. To ensure the full functionality of the NLR-Parser, please make sure the 6 aa-sequences only differ by a suffix and end with:
- _frame+0
- _frame+1
- _frame+2
- _frame-0
- _frame-1
- _frame-2
For this you can use the TranslateSequence.jar, which is part of this software.
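The naming convention above can be sketched in a few lines of Python. This is only an illustration of the suffix scheme, not the implementation of TranslateSequence.jar: the function names are hypothetical, the standard genetic code is assumed, and writing stop codons as `*` and unknown codons as `X` is an assumption of this sketch.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a nucleotide sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons; unknown codons become 'X' (assumption)."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame(name, seq):
    """Return (header, protein) pairs named <name>_frame+0 ... <name>_frame-2."""
    records = []
    for sign, strand in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            header = f"{name}_frame{sign}{offset}"
            records.append((header, translate(strand[offset:])))
    return records


if __name__ == "__main__":
    # Hypothetical input sequence, printed as FASTA.
    for header, protein in six_frame("seq1", "ATGGCCATTGTAATGGGCCGC"):
        print(f">{header}\n{protein}")
```

All six headers share the same name and differ only by the suffix, which is what the NLR-Parser relies on when combining frames.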
Just download NLR-Parser.jar from the latest release. Run it from the command line.
java -jar NLR-Parser.jar -i <mast.xml> -o <output.mast.txt> [-s <splitpattern>] [-p <pvalue>] [-b <blastfile>] [-gh] [-a <sequence>]
If you want to build it from source, you will need the Apache Commons CLI library.
parameter | argument | description |
---|---|---|
-i | STR | The location of the xml output of MAST |
-o | STR | Location and name of the output file that will be generated by the NLR-Parser. Note that an existing file will be overwritten |
-s | STR | The splitpattern to combine 6-frame-translated nucleotide sequences to one output. default: "_frame" |
-p | float | P-value threshold. Motifs with a p-value above will be ignored by the NLR-Parser. default: 1E-5 |
-a | STR | Location of an optional amino acid sequence file. This file should be the same as the one subjected to MAST. Providing this file allows extraction of the NB-ARC domain of the NLR, e.g. for phylogenetic studies. File has to be fasta format. |
-g | | Output gff format instead of a tsv. |
-h | | Print help |
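If you call the NLR-Parser from a script, the options in the table can be assembled programmatically. The following is a minimal sketch, assuming `java` is on your PATH and NLR-Parser.jar is in the working directory; the helper name and the file names in the usage note are hypothetical, and only flags documented in the table are used.

```python
def nlr_parser_command(mast_xml, output, split_pattern=None, p_value=None,
                       fasta=None, gff=False, jar="NLR-Parser.jar"):
    """Build the argument list for an NLR-Parser call (hypothetical helper)."""
    cmd = ["java", "-jar", jar, "-i", mast_xml, "-o", output]
    if split_pattern is not None:
        cmd += ["-s", split_pattern]   # pattern used to combine 6-frame names
    if p_value is not None:
        cmd += ["-p", str(p_value)]    # motif p-value threshold
    if fasta is not None:
        cmd += ["-a", fasta]           # amino acid FASTA that was given to MAST
    if gff:
        cmd.append("-g")               # gff output instead of tsv
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, for example with `nlr_parser_command("mast.xml", "output.mast.txt", p_value="1E-5")`.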
If a nucleotide sequence is to be annotated, it should be translated into its 6 reading frames. The NLR-Parser can assume that the names of the 6 amino acid sequences consist of a shared sequence name followed by the split pattern and the frame (for example, a name ending in _frame+0). In that case it reports the combined result on one line, with the shared sequence name in the first column. It is highly unlikely that a sequence has motifs on the forward strand and on the reverse strand at the same time. Combining the frames makes sense when you annotate genomic sequence and introns cause a "frameshift".
This is of course a pitfall if your sequence of interest contains two NLRs on different strands. In those cases, please use the workaround -s
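To illustrate what combining by split pattern means, here is a small Python sketch that groups sequence names on the default pattern "_frame". It is not the NLR-Parser's own code; the function name is hypothetical, and splitting on the last occurrence of the pattern is an assumption.

```python
from collections import defaultdict


def group_by_split_pattern(names, split_pattern="_frame"):
    """Group sequence names by the part before the split pattern.

    Names that do not contain the pattern form their own group with an
    empty frame label (an assumption of this sketch).
    """
    groups = defaultdict(list)
    for name in names:
        base, sep, frame = name.rpartition(split_pattern)
        if sep:
            groups[base].append(frame)
        else:
            groups[name].append("")
    return dict(groups)
```

With the names `seq1_frame+0` and `seq1_frame-2`, both frames end up under the single key `seq1`, which corresponds to one combined output line.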
Generate a gff file rather than a tsv table with the NLR-Parser results. This option is under development. Feel free to try and send us comments.
One column of the NLR-Parser output is the amino acid sequence of the NB-ARC domain. This is usually the most conserved part of the NLR and can be used for phylogenetic studies. If you do not provide the complete amino acid sequence of the genes, this column is empty.
This is the threshold of the p-values of the individual motifs. Motifs with a p-value above this threshold are ignored by the NLR-Parser. The default is 1E-5.
- MAST has an E-value threshold. Sequences with an E-value above that threshold are not displayed. This E-value depends on the number of input sequences. If you run MAST on a really large file, add the parameter
  -ev 10000000
  to your call.
- If you want to annotate large files like genomes, it makes sense to chop them into overlapping fragments.
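Chopping a genome into overlapping fragments could look like the following Python sketch. The fragment size, the overlap and the `<name>_<start>_<end>` naming scheme are hypothetical choices for illustration, not values recommended by this README; the overlap should be large enough that an NLR is not split at every fragment boundary.

```python
def overlapping_fragments(name, seq, size=20000, overlap=5000):
    """Yield (fragment_name, fragment_sequence) pairs covering seq.

    Fragment names carry 1-based start and end coordinates so that hits
    can be mapped back to the original sequence (hypothetical scheme).
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    if not seq:
        return
    step = size - overlap
    start = 0
    while True:
        end = min(start + size, len(seq))
        yield f"{name}_{start + 1}_{end}", seq[start:end]
        if end == len(seq):
            break
        start += step
```

Each fragment can then be translated in 6 frames and passed to MAST like any other nucleotide sequence.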
- For using the NLR motifs, please cite Jupe et al. (2012)
- For NLR-Parser, please cite Steuernagel et al. (2015).
If there are any issues with the tool or if you would like to collaborate with us, please don't hesitate to contact us.