samplesheet-utils (or samplesheetutils) is a collection of scripts and utilities for working with samplesheets and FASTA files at the command line. It is primarily designed for use within pipelines.
pip3 install samplesheetutils
git clone https://github.com/nbtm-sh/create-samplesheet
cd create-samplesheet
pip3 install .
This command is used to read the sample name(s) from a FASTA file. This is useful for dynamically creating directories based on the actual sample name.
sample-name [ARGS] [FASTA(s)]
-i --index
: Index of the sample you wish to read the name from. This can be an integer, -1 for the last sample, or a range (e.g. 1:5).
--sanitize --sanitise
: Replaces any problematic characters in the sample name(s) with an underscore.
-d --delim
: Change the delimiter between each sample name. By default this is a newline character.
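To make the options above concrete, here is a minimal Python sketch of reading sample names from FASTA text and sanitising them. It is not the tool's implementation: the helper names `read_sample_names` and `sanitise` are hypothetical, the sketch assumes the sample name is the first whitespace-separated token of each header line, and the set of "problematic" characters (anything outside letters, digits, `.`, `_` and `-`) is an assumption.

```python
import re


def read_sample_names(fasta_text):
    """Return one name per record: the first token after '>' (assumed rule)."""
    names = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            header = line[1:].strip()
            names.append(header.split()[0] if header else "")
    return names


def sanitise(name):
    """Replace characters outside [A-Za-z0-9._-] with '_' (assumed rule)."""
    return re.sub(r"[^A-Za-z0-9._-]", "_", name)


fasta_text = ">A1|chain/1 first sample\nMKTAYIAK\n>sample_B\nGSHMLE\n"

names = read_sample_names(fasta_text)
last_name = names[-1]                      # analogous to an index of -1
clean = [sanitise(n) for n in names]       # analogous to --sanitise
joined = ",".join(clean)                   # analogous to a "," delimiter

print(joined)
```

The same pattern (read the name, sanitise it, then use it as a directory name) is what makes the output safe to pass to something like `mkdir` inside a pipeline step.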
This command is used to create a samplesheet from different inputs, including a sequence string or a directory containing FASTA files.
create-samplesheet [ARGS]
-a --aa-string
: Input a single amino acid sequence.
-d --directory
: Input a directory containing FASTA files.
-o --output-file
: Samplesheet filename. Default is samplesheet.[ext], where [ext] depends on the output mode.
-j --json
: Output a JSON formatted samplesheet.
-y --yaml
: Output a YAML formatted samplesheet.
-m --msa-dir
: Directory to search for corresponding MSA files in (only available in YAML output).
--yaml-rfaa
: Output YAML formatted sequence files with a samplesheet.csv file.
When using the YAML output mode (-y, --yaml), you can provide a path to a directory containing the samples' pre-computed multiple sequence alignment files (.a3m files) with -m, --msa-dir. For each file to be automatically associated with its corresponding sample, the filename must follow this format:
[SAMPLE NAME].a3m
Example Usage:
create-samplesheet --directory /home/nathan/experiment/fastas --msa-dir /home/nathan/experiment/fastas/msas --yaml
Directory Structure
/home/nathan/experiment/fastas
├── A1.fasta
├── A2.fasta
└── msas
    ├── A1.a3m
    └── A2.a3m
NOTE: This example assumes that each FASTA file contains a sample with the same name as the file itself. create-samplesheet will search for a3m files based on the sample name in the FASTA file, not the FASTA filename itself.
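The naming rule above can be illustrated with a short Python sketch. This is only an illustration of the `[SAMPLE NAME].a3m` lookup, not the code create-samplesheet actually runs; `find_msa` and `match_msas` are hypothetical helper names, and the sketch assumes a sample with no matching file is simply left without an MSA.

```python
import tempfile
from pathlib import Path


def find_msa(sample_name, msa_dir):
    """Return the path to '<sample_name>.a3m' inside msa_dir, or None if absent."""
    candidate = Path(msa_dir) / f"{sample_name}.a3m"
    return candidate if candidate.is_file() else None


def match_msas(sample_names, msa_dir):
    """Map each sample name (taken from the FASTA header) to its a3m file, if any."""
    return {name: find_msa(name, msa_dir) for name in sample_names}


# Demo with a throwaway directory that holds an MSA for A1 only.
with tempfile.TemporaryDirectory() as tmp:
    msa_dir = Path(tmp)
    (msa_dir / "A1.a3m").write_text(">A1\nMKTAYIAK\n")
    matches = match_msas(["A1", "A2"], msa_dir)
    found = {name: (path.name if path else None) for name, path in matches.items()}

print(found)  # {'A1': 'A1.a3m', 'A2': None}
```

Because the lookup key is the sample name rather than the FASTA filename, a file named `batch1.fasta` containing a sample called `A1` would still be paired with `A1.a3m`.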
This command can be used to edit an a3m file to target a specific region of the alignment for special use cases. The module will preserve the first sample of the file, and truncate the remaining entries.
truncate-msa [ARGS] [INPUT_FILE] [REGION_START] [REGION_END]
-i --in-place
: Modify the input file directly, instead of making a new file.
-r --inverse
: Invert the output, removing the target region instead.
--version
: Output version information.
--debug
: Display debug information.
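As a rough illustration of the behaviour described for truncate-msa, the Python sketch below keeps the first entry of an a3m file whole and cuts every other entry down to a region, or removes that region when inverted. It is a simplified sketch under stated assumptions, not the tool's implementation: `parse_a3m` and `truncate_entries` are hypothetical names, coordinates are treated as 0-based with an exclusive end, and a3m lowercase insertion characters are not handled specially. How the real command treats coordinates and insertions is not documented here.

```python
def parse_a3m(text):
    """Split a3m/FASTA-style text into (header, sequence) pairs."""
    entries = []
    header, chunks = None, []
    for line in text.splitlines():
        if line.startswith(">"):
            if header is not None:
                entries.append((header, "".join(chunks)))
            header, chunks = line, []
        elif line.strip() and header is not None:
            chunks.append(line.strip())
    if header is not None:
        entries.append((header, "".join(chunks)))
    return entries


def truncate_entries(entries, start, end, inverse=False):
    """Keep the first entry whole; slice the rest to [start, end) or drop that span.

    Assumption: 0-based start, exclusive end, plain character slicing.
    """
    if not entries:
        return []
    result = [entries[0]]
    for header, seq in entries[1:]:
        if inverse:
            seq = seq[:start] + seq[end:]   # remove the target region
        else:
            seq = seq[start:end]            # keep only the target region
        result.append((header, seq))
    return result


a3m_text = ">query\nMKTAYIAKQR\n>hit1\nMKT-YIAKQR\n>hit2\n--TAYLAKQ-\n"
entries = parse_a3m(a3m_text)
kept = truncate_entries(entries, 2, 6)
removed = truncate_entries(entries, 2, 6, inverse=True)

for header, seq in kept:
    print(header, seq)
```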
- Finish documentation