Skip to content

General functions for analyzing combinatorial indexing sequence data

Notifications You must be signed in to change notification settings

adeylab/scitools

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

22 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

scitools Version: 0.2.0
Adey Lab (www.adeylab.org, www.github.com/adeylab/scitools)

WARNING: scitools is still in a development stage - a number of
commands are in active development.

v0.2.0 update: This version contains a number of tools that were
internally used and moved here for public release. Note: many tools
require addiitonal dependencies not listed in the dependencies
function. This is being developed now and in progress. This only
applies to new functions, most of which have dependencies that are
self-explanatory (e.g. cistopic requires the R cistopic library),
and others have the dependencies in the individual function's
help text when called with no argumants.

v0.2.1 will have an updated dependencies and new document for the
recommended workflow of analysis.

SETUP: To set up scitools, download the repository to the
location you want the executable to be. Keep all index files
and folders within the scitools directory. We recommend adding
the scitools execulable to your path. We also recommend editing the
scitools.cfg file to add in command-line callable executables used
by scitools, defaults assume they are all in your path. Once it is
set up, run 'scitools dependencies' which will check for external
software that scitools relies on.

DESCRIPTION: scitools is a set of scripts designed for working with
single-cell combinatorial indexing data. It includes tools to go from
fastq files (after bcl2fastq) to a processed dataset. It includes a
number of external tools and R packages that are called by scitools.
If scitools is used, be sure to cite those tools!

scitools commands are in the form of [class]-[operation]. Most can
be specified in reverse order, or if the operation is unique to the
class of files or the class type can be determined by the files in
the arguments then just the operation name can be specified.
Run 'scitools list' for a list of commands, or
'scitools list [keyword]' to list commands that include the keyword,
e.g. 'scitools list fastq'

DEPENDENCIES: (command-line callable, can be specified in options)
Run scitools dependencies to check software and R requirements.
The default executables can be altered in the scitools.cfg file.
A user may also have their own personal scitools.cfg file in their
home directory to override the use of the config file that is present
with the scitools executable.

Executables:
   gzip         For gzipped fastq files. Default: gzip & zcat
                (these are hardoded at the beginning of the scitools
                 code and not available as options)
   samtools     Bam-related commands. Default: samtools
   bedtools     Bed-related commands. Default: bedtools
   Rscript      Numerous R operations. Default: Rscript
   bwa          For alignment only. Default: bwa
   macs2        For atac-callpeak only. Default: macs2
   scitools     Can call itself. Default: scitools
   (bismark)    For sci-MET
   (bowtie2)    For sci-MET / used by bismark

R packages:
   ggplot2         For plotting commands
   svd             For Latent Semantic Indexing (LSI)
   Rtsne           For tSNE visualization
   methods         For PCA
   dbscan          For density-based clustering
   irlba           For fast SVD (recommended)

File Types used by scitools:

   fastq    Standard fastq files, input can be gzipped or not, output fastq
               files will always be gzipped.
   
   bam      Bam files. For scitools, it is expected the barcode sequence
               or cellID is in the read name ([barcode]:[read_number])
               Sam files are not supported to encourage (force) space saving
			   
   annot    Annotation file: tab delimited, 2 columns
               Col 1 = cellID (barcode), could also be feature name
               Col 2 = annotation information (e.g. experimental condition)
            values file A special case of an annotaiton file where the
               annotation is a continuous variable and not discrete.
			   
   matrix   Matrix file, tab delimited
               Row 1 = CellIDs, has 1 less column than all other rows
               Rows = field 1 is name, then 1 field for each cellID
			   
   sparseMatrix   Sparse formatting for a matrix - triplet format with an
               associated cols and rows file that go with a values file.
			   
   dims     Dimensions file, tab delimited
               Col 1 = cellID
               Col 2-N = dimensions for each cell
            range    Range format is not a file type but a way to input a
               range of values (e.g. which dimensions to use from a dims file)
               This is a comma separated list and can include dashes for a
               range of values (e.g. 1,3-6,8-10,13)
			   
   colors   A file that has the annotaiton then a tab and an adssociated
               color for convenience during plotting.

About

General functions for analyzing combinatorial indexing sequence data

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages