The reference dataset we are going to use has Gene IDs as the feature names. If your dataset has another identifier (e.g. ENSG IDs), make sure to replace this with the Gene IDs before proceeding to the next steps.
All core scripts and data necessary to perform cell-type classification are available in a singularity container from Dropbox:
https://www.dropbox.com/sh/ekvdocei8r45jxq/AACA17z7PFNbVkeuSavFvjPNa?dl=0
The algorithm to perform cell type classification follows a split-map-reduce approach. If multiple experimental units are contained in the same Seurat object (e.g. pool, batch, dataset), the data is first split into multiple objects. Cell type classification is performed individually in each batch and finally results are aggregated into a single Seurat object. The figure below shows a general description of the process:
The description of each R script included in the previous figure is included in the next section.
The singularity image cell_classification.sif
contains the following scripts:
split_object.R
: Splits aSeurat
data object into multiple objects according to the levels in a column from the metadata,azimuth.R
: Performs cell type classification using the azimuth approachhier_scpred.R
: Applied hierarchicalscPred
to classify cellsreduce.R
: If the initial data was split, this script aggregates all results into a single Seurat objectcompare_results.R
: If both azimuth and hierarchical scPred were run, this script creates a heatmap and a long-format contingency matrix containing the classification concordance between the two methods.
All previous scripts share the following parameters:
--file
: Input RDS object or directory--out
: Base filename for output files--path
: Output directory to store results. If not specified, the current directory will be used to store the output(s)
The image contains the following reference files:
pbmc_multimodal.h5seurat
: used for azimuth classificationhier_scpred.RDS
: hierarchical trained model for scPred classification
To call components within the container, we commonly use the following syntax
# do not run
singularity exec cell_classification.sif [COMMAND]
However, to make make a given directory visible to the container, we have to use
the -B
flag
# do not run
singularity exec -B $PWD cell_classification.sif [COMMAND]
The previous command will allow the current directory to be visible.
If multiple paths are needed to be visible (e.g. input and output directories not contained in the currect directory), please see
https://sylabs.io/guides/3.0/user-guide/bind_paths_and_mounts.html