Command line options

swarris edited this page Apr 21, 2020 · 8 revisions

Usage: pypaswasall [options] FILE_1 FILE_2

This program performs a Smith-Waterman alignment of all sequences in FILE_1 against all sequences in FILE_2. Both files should be in the fasta format.
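
As a minimal sketch of how the usage line above can be assembled from a script, the Python snippet below builds the argument list for `pypaswasall`. The helper `build_command` and the file names are hypothetical and only illustrate the `[options] FILE_1 FILE_2` ordering; the option names are taken from the tables on this page.

```python
import shlex


def build_command(file_1, file_2, **options):
    """Build an argument list for pypaswasall (hypothetical helper).

    Keyword arguments are turned into --name=value long options,
    followed by the two input files, matching the documented usage:
    pypaswasall [options] FILE_1 FILE_2
    """
    args = ["pypaswasall"]
    for name, value in options.items():
        args.append(f"--{name}={value}")
    args.extend([file_1, file_2])
    return args


# File names are placeholders for this example.
cmd = build_command(
    "reads.fasta",
    "reference.fasta",
    out="hits.txt",
    outputformat="TXT",
    loglevel="INFO",
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a system where pyPaSWAS is installed.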

Options:

-h, --help show this help message and exit

Options that affect the general operation of the program:

| Short option | Long option | Description |
| --- | --- | --- |
| -L FILE | --logfile=FILE | Log events to FILE |
| | --loglevel=LOGLEVEL | Log level. Valid options are DEBUG, INFO, WARNING, ERROR and CRITICAL |
| -o OUT_FILE | --out=OUT_FILE | The file in which the program stores the generated output. Defaults to ./output |
| | --outputformat=OUT_FORMAT | The format of the file in which the program stores the generated output. Available options are TXT and SAM. Defaults to TXT |
| -p PROGRAM | --program=PROGRAM | The program to be executed. Valid options are "aligner", "trimmer", "indexed" and "mapper" (the last two are experimental) |
| -1 FILETYPE1 | --filetype1=FILETYPE1 | File type of the first file. See Biopython SeqIO for available options |
| -2 FILETYPE2 | --filetype2=FILETYPE2 | File type of the second file. See Biopython SeqIO for available options |
| -O OVERRIDE_OUTPUT | --override_output=OVERRIDE_OUTPUT | Overwrite the output file if it already exists (T/F) |
| -c CONFIG_FILE | --configfile=CONFIG_FILE | Read settings from a configuration file |

Options that affect the alignment

These options apply to the aligning programs, "aligner" and "mapper".

| Short option | Long option | Description |
| --- | --- | --- |
| -G GAP_SCORE | | Float. Penalty for a gap |
| -M MATRIX_NAME | --matrixname=MATRIX_NAME | The scoring matrix to be used. Valid options are "DNA-RNA", "BASIC" and "CUSTOM" |
| -q MISMATCH_SCORE | --mismatch_score=MISMATCH_SCORE | Float. Penalty for a mismatch |
| -r MATCH_SCORE | --match_score=MATCH_SCORE | Float. Reward for a match |
| | --any=ANY_SCORE | Float. Score for a character which is neither in the nucleotide list ("ACGTU"), nor equal to the anyNucleotide character ("N"). Only relevant for use with the DNA-RNA scoring type. |
| | --other=OTHER_SCORE | Float. Score if the anyNucleotide character ("N") is present in either query or subject. Only relevant for use with the DNA-RNA scoring type. |
| | --minimum=MINIMUM_SCORE | Float. Sets the minimum score that initiates a backtrace. Do not set this very low: the output may be flooded with hits. |
| | --llimit=LOWER_LIMIT_SCORE | Float. Sets the lower limit, as a fraction of the highest hit score, for reporting a hit: pyPaSWAS will also report hits with a score of at least lowerLimitScore * highest hit score. Set to <= 1.0. |
| | --customMatrix=CUSTOM_MATRIX | The custom matrix that should be used |
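
The effect of `--llimit` described above can be sketched in a few lines of Python. This is an illustration of the rule as stated on this page, not the pyPaSWAS implementation: the function name `hits_above_lower_limit` is hypothetical, and the inclusive comparison (`>=`) is an assumption.

```python
def hits_above_lower_limit(scores, lower_limit_score):
    """Keep scores of at least lower_limit_score * highest score.

    Hypothetical helper; the inclusive comparison is an assumption.
    """
    if not 0.0 <= lower_limit_score <= 1.0:
        raise ValueError("lower_limit_score should be between 0.0 and 1.0")
    if not scores:
        return []
    threshold = lower_limit_score * max(scores)
    return [score for score in scores if score >= threshold]


# Made-up alignment scores for illustration only.
example_scores = [100.0, 80.0, 50.0, 40.0]
reported = hits_above_lower_limit(example_scores, 0.5)
print(reported)
```

With a lower limit of 1.0 only the highest-scoring hit (and any ties) would remain; lower values let progressively weaker hits through.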

Options for filtering the output

| Short option | Long option | Description |
| --- | --- | --- |
| | --filter_factor=FILTER_FACTOR | The filter factor to be used. Reports only hits within filterFactor * highest possible score * length of the shortest sequence (or: defines the lowest value of the reported relative score). Set to <= 1.0 |
| | --query_coverage=QUERY_COVERAGE | Minimum query coverage. Set to <= 1.0 |
| | --query_identity=QUERY_IDENTITY | Minimum query identity. Set to <= 1.0 |
| | --relative_score=RELATIVE_SCORE | Minimum relative score, defined as the alignment score divided by the length of the shorter of the two sequences. Set to <= highest possible score, for example 5.0 in case of DNA |
| | --base_score=BASE_SCORE | Minimum base score, defined as the alignment score divided by the length of the alignment (including gaps). Set to <= highest possible score, for example 5.0 in case of DNA |
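
The relative score and base score defined above are simple ratios, and a short Python sketch may make the difference between them concrete. The function names and the example numbers are hypothetical, and the inclusive threshold comparison in `passes_filters` is an assumption rather than documented behaviour.

```python
def relative_score(alignment_score, query_length, target_length):
    """Alignment score divided by the length of the shorter sequence."""
    return alignment_score / min(query_length, target_length)


def base_score(alignment_score, alignment_length):
    """Alignment score divided by the alignment length (including gaps)."""
    return alignment_score / alignment_length


def passes_filters(rel, base, min_relative_score, min_base_score):
    """Assumed inclusive check against the two minimum thresholds."""
    return rel >= min_relative_score and base >= min_base_score


# Made-up hit: score 400 for a 100-character query against a
# 250-character target, with an alignment spanning 125 columns.
rel = relative_score(400.0, 100, 250)
base = base_score(400.0, 125)
print(rel, base, passes_filters(rel, base, 2.0, 2.0))
```

Because the alignment (with gaps) can be longer than the shorter sequence, the base score of a hit can be lower than its relative score, as in this example.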

Options that affect the usage and settings of the parallel devices

| Short option | Long option | Description |
| --- | --- | --- |
| | --device_type=[CPU,GPU,ACCELERATOR] | Selects the basic device type. The GPU version is more fine-grained than the CPU version in the OpenCL implementation |
| | --platform_name=[Intel,NVIDIA] | Selects the platform. |
| | --framework=[opencl,CUDA] | Selects either OpenCL or CUDA support. CUDA is only available for NVIDIA GPUs |
| | --device=DEVICE_NUMBER | The device on which the computations will be performed. This should be an integer. |
| | --maximum_memory_usage=MEM_USAGE | Fraction (<= 1.0) of available device memory to use. Useful when several pyPaSWAS applications are running. |
| | --number_of_compute_units | Number of compute units to use (OpenCL only). Will not work on every device; recommended for CPU only. For example, set this to 1 to use a single core on the device. This should be an integer. |
| | --sub_device=DEVICE_NUMBER | The sub_device on which the computations will be performed (OpenCL only). Only used when number_of_compute_units > 0. This should be an integer. |
| | --limit_length=LIMIT_LENGTH | Length of the longest sequence, in characters, to be read from file. Lower this when the GPU has little memory. |
| | --max_genome_length=MAX_GENOME_LENGTH | Deprecated. Defaults to 200000 |
| | --recompile=RECOMPILE | Recompile CUDA code? Set to F(alse) when sequences are of similar length: much faster. |
| | --sequence_step=SEQUENCE_STEP | Number of sequences read from file 2 before processing. Handy when processing NGS files. |
| | --query_step=QUERY_STEP | Number of sequences read from file 1 before processing. Handy when processing NGS files. |
| | --short_sequences=SHORT_SEQUENCES | Set to T(rue) when aligning short sequences (trimming?) to maximize memory usage. |
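
The idea behind `--query_step` and `--sequence_step`, reading a fixed number of sequences before processing them, can be illustrated with a small Python generator. This is only a sketch of the batching concept under the assumption that each step is processed as one batch; `read_in_steps` is a hypothetical name and this is not the code pyPaSWAS uses.

```python
from itertools import islice


def read_in_steps(records, step):
    """Yield lists of at most `step` records from any iterable.

    Hypothetical helper illustrating step-wise reading of sequences.
    """
    if step < 1:
        raise ValueError("step should be a positive integer")
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, step))
        if not batch:
            return
        yield batch


# Placeholder sequence identifiers standing in for parsed records.
batches = list(read_in_steps(["s1", "s2", "s3", "s4", "s5"], 2))
print(batches)
```

Smaller steps keep fewer sequences in memory at once, which is why such settings are handy for large NGS files.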