This repository contains the code for processing ER+ breast cancer scRNA-seq data, starting from 10x Genomics FASTQ files, and for generating the figures.
git clone https://github.com/hyunsoo77/BC_tamoxifen_response.git
Step 1: Align the sequences in the scRNA-seq FASTQ files to the GRCh38 reference transcriptome with 10x Genomics cellranger count to obtain one filtered_feature_bc_matrix.h5 file for each of the two samples.
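As a rough sketch of Step 1, the loop below prints one cellranger count command per sample instead of running it. The reference and FASTQ paths are placeholders, and using the sample name as both the run id and the FASTQ sample prefix is an assumption; adjust both to your data. Recent Cell Ranger releases may also require an explicit --create-bam value, so check cellranger count --help for your version.

```shell
# Sketch: print one `cellranger count` command per sample (Step 1).
# TRANSCRIPTOME and FASTQ_ROOT are placeholder paths assumed for this example.
TRANSCRIPTOME="${TRANSCRIPTOME:-/path/to/GRCh38_reference}"
FASTQ_ROOT="${FASTQ_ROOT:-/path/to/fastqs}"

cellranger_cmd() {
    # $1: sample name, used as the run id and as the FASTQ sample prefix (assumption)
    echo cellranger count \
        --id="$1" \
        --transcriptome="$TRANSCRIPTOME" \
        --fastqs="$FASTQ_ROOT/$1" \
        --sample="$1"
}

for s in Tumor5 Tumor5_TAM; do
    cellranger_cmd "$s"
done
```

Each run writes its matrix to <id>/outs/filtered_feature_bc_matrix.h5, which is the file the later steps read.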
Step 2: Create the following directory structure by copying or linking the files.
../count_er+bc-pairs
├── Tumor5
│   ├── outs
│   │   └── filtered_feature_bc_matrix.h5
└── Tumor5_TAM
    └── outs
        └── filtered_feature_bc_matrix.h5
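One way to build this layout with symbolic links is sketched below. link_counts is a hypothetical helper, not part of this repository, and it assumes that each cellranger run directory is named after its sample and sits under a common root directory.

```shell
# Sketch: link each sample's filtered matrix into the layout shown above.
# link_counts is a hypothetical helper; it is not part of this repository.
link_counts() {
    # $1: root directory holding the cellranger run directories (assumed layout)
    # $2: destination root, e.g. ../count_er+bc-pairs
    # remaining arguments: sample names
    src_root=$(cd "$1" >/dev/null && pwd) || return 1   # absolute path keeps the links valid
    dest_root=$2
    shift 2
    for sample in "$@"; do
        mkdir -p "$dest_root/$sample/outs" || return 1
        # -s: create a symbolic link, -f: replace a link that already exists
        ln -sf "$src_root/$sample/outs/filtered_feature_bc_matrix.h5" \
            "$dest_root/$sample/outs/filtered_feature_bc_matrix.h5" || return 1
    done
}

# Example call (the first path is a placeholder):
# link_counts /path/to/cellranger_runs ../count_er+bc-pairs Tumor5 Tumor5_TAM
```

Linking avoids duplicating the matrices; replace ln -sf with cp if you prefer independent copies.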
Step 3: Create a Seurat object for each sample with the following command:
./make_sc-rna-seq_seurat_obj.R --dir_count ../count_er+bc-pairs --dir_output ./output_er+bc-pairs --dir_seurat_obj ./output_er+bc-pairs/rds_er+bc-pairs --type_qc arguments --min_ncount_rna 5000 --min_nfeature_rna 2000 --th_percent.mt 25 --max_dimstouse 30 --seurat_resolution 0.8 --method_to_update_cell_types epithelial_cell_types --method_to_identify_subtypes none --type_infercnv_argset vignettes --infercnv_pos_notpos er+bc-pairs Tumor5
The example above processes only Tumor5; to create the Seurat object for Tumor5_TAM, change the last argument to Tumor5_TAM. After both runs, the output directory ./output_er+bc-pairs contains:
output_er+bc-pairs/
├── infercnv
│   ├── er+bc-pairs_Tumor5_cnv_postdoublet
│   └── er+bc-pairs_Tumor5_TAM_cnv_postdoublet
├── log
├── rds_er+bc-pairs
│   ├── er+bc-pairs_Tumor5_sc-rna-seq_sample_seurat_obj.rds
│   ├── er+bc-pairs_Tumor5_TAM_sc-rna-seq_sample_seurat_obj.rds
│   └── wilcox_degs
├── tsv
│   ├── infercnv_input_barcode_group_er+bc-pairs_Tumor5.tsv
│   └── infercnv_input_barcode_group_er+bc-pairs_Tumor5_TAM.tsv
└── xlsx
    ├── er+bc-pairs_Tumor5_sc-rna-seq_pipeline_summary.xlsx
    └── er+bc-pairs_Tumor5_TAM_sc-rna-seq_pipeline_summary.xlsx
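To build both per-sample objects in one go, the Step 3 command can be wrapped in a loop over the sample names. The sketch below reuses the arguments from the example unchanged; DRY_RUN is a hypothetical switch introduced here so that the commands are only printed by default.

```shell
# Sketch: run the Step 3 command once per sample.
# DRY_RUN is a hypothetical switch for this example: by default the commands
# are only printed; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"

make_sample_obj() {
    # $1: sample name, passed as the last argument of the pipeline script
    sample_name=$1
    set -- ./make_sc-rna-seq_seurat_obj.R \
        --dir_count ../count_er+bc-pairs \
        --dir_output ./output_er+bc-pairs \
        --dir_seurat_obj ./output_er+bc-pairs/rds_er+bc-pairs \
        --type_qc arguments \
        --min_ncount_rna 5000 --min_nfeature_rna 2000 --th_percent.mt 25 \
        --max_dimstouse 30 --seurat_resolution 0.8 \
        --method_to_update_cell_types epithelial_cell_types \
        --method_to_identify_subtypes none \
        --type_infercnv_argset vignettes \
        --infercnv_pos_notpos er+bc-pairs "$sample_name"
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"    # print the command instead of running it
    else
        "$@"         # run the pipeline script with the assembled arguments
    fi
}

for s in Tumor5 Tumor5_TAM; do
    make_sample_obj "$s" || break   # stop at the first failing sample
done
```

Run it from the repository root, since the script path and the output directories are relative.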
Step 4: Merge the per-sample Seurat objects into a single merged Seurat object with the following command:
./make_sc-rna-seq_merged_seurat_obj.R --dir_output ./output_er+bc-pairs --dir_seurat_obj ./output_er+bc-pairs/rds_er+bc-pairs --k.anchor 5 --max_dimstouse 30 --seurat_resolution 0.8 --cancer_type_for_parsing_rds_filename er+bc-pairs --type_parsing_rds_filename_for_donor 2nd_item_after_parsing_with_underbar --harmony_theta 0 er+bc-pairs
The merged object is written to ./output_er+bc-pairs/rds_er+bc-pairs, the directory given by the --dir_seurat_obj argument.
output_er+bc-pairs/
│   ...
├── rds_er+bc-pairs
│   ├── er+bc-pairs_Tumor5_sc-rna-seq_sample_seurat_obj.rds
│   ├── er+bc-pairs_Tumor5_TAM_sc-rna-seq_sample_seurat_obj.rds
│   ├── er+bc-pairs_sc-rna-seq_merged_seurat_obj.rds
│   └── wilcox_degs
...
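A quick way to confirm that the expected .rds files are present is sketched below. check_rds is a hypothetical helper, not part of this repository; the file names follow the listing above.

```shell
# Sketch: confirm that the per-sample and merged Seurat objects exist.
# check_rds is a hypothetical helper; it is not part of this repository.
check_rds() {
    # $1: directory passed as --dir_seurat_obj; remaining arguments: sample names
    rds_dir=$1
    shift
    missing=0
    for name in "$@"; do
        f="$rds_dir/er+bc-pairs_${name}_sc-rna-seq_sample_seurat_obj.rds"
        [ -f "$f" ] || { echo "missing: $f" >&2; missing=1; }
    done
    f="$rds_dir/er+bc-pairs_sc-rna-seq_merged_seurat_obj.rds"
    [ -f "$f" ] || { echo "missing: $f" >&2; missing=1; }
    return "$missing"   # 0 if every file exists, 1 otherwise
}
```

For example, check_rds ./output_er+bc-pairs/rds_er+bc-pairs Tumor5 Tumor5_TAM reports each missing file on stderr and returns a non-zero status if any file is absent.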
The figures were generated with Jupyter notebooks. To install Jupyter Notebook or JupyterLab, see jupyter.org. Before running a notebook, set dir_rna and/or dir_atac to the location of the merged Seurat object or the final ArchRProject object you generated. The outputs include PDF files, which are written to the pdf directory.
./
├── figure1_01_umap.ipynb
├── figure1_02_barplot.ipynb
├── figure2_01_umap.ipynb
├── figure2_02_dge.ipynb
├── figure3_01_umap.ipynb
├── figure3_02_boxplot.ipynb
├── figure3_03_dge_pairs.ipynb
├── figure4_01_umap.ipynb
├── figure4_02_dge.ipynb
├── figure5_01_umap.ipynb
├── figure5_02_barplot.ipynb
├── figure5_03_drug_effect.ipynb
├── figure_s1_01_dge.ipynb
├── figure_s2_01_barplot.ipynb
├── figure_s2_02_dge.ipynb
├── log
├── pdf
│   ├── ...
│   ├── barplot_er+bc-pairs_cluster_type_prop_rna.pdf
│   ├── ...
│   ├── heatmap_er+bc-pairs_control_vs_tamoxifen_Tumor_cells_zscore.pdf
│   ├── ...
│   ├── umap_er+bc-pairs_cluster_labels_rna.pdf
│   ├── umap_er+bc-pairs_cluster_types_rna.pdf
│   ├── umap_er+bc-pairs_log2fc_t47d_down_genes_rna.pdf
│   └── umap_er+bc-pairs_samples_rna.pdf
├── r
├── reference
├── txt
│   └── sessionInfo.txt
└── xlsx
    ├── ...
    └── er+bc-pairs_control_vs_tamoxifen_Tumor_cells.xlsx
Let's check the cell numbers for each cell type.
The scRNA-seq pipeline is under active development. Other single-cell data analysis projects will use either the current version with different parameters or an upgraded version of the pipeline.