Network diffusion-based approach for survival prediction and identification of biomarkers using multi-omics data of Papillary Renal Cell Carcinoma
This repository contains the codebase and the datasets for the paper "Network diffusion-based approach for survival prediction and identification of biomarkers using multi-omics data of Papillary Renal Cell Carcinoma" (2022). Keerthi S. Shetty, Aswin Jose, Mihir Bani
The repo is divided into the following sections:
- PyNBS
- Scripts and Runner files
- Analysis
- Gene Network Creation
- Data
- Results
PyNBS is a Python package that replicates the network-based stratification (NBS) algorithm described in Hofree et al., Nature Methods, 2013. The code is ported from NBS Matlab 0.2.0. The companion Application Note for PyNBS is available online at Oxford Bioinformatics. For this project, the package has been further modified to support Python 3.
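To make the network diffusion idea concrete, the following is a minimal sketch of random-walk-with-restart propagation on a toy network. It is not the PyNBS API: the function name `propagate`, the row normalisation, and the value `alpha=0.7` are assumptions chosen for illustration.

```python
import numpy as np

def propagate(F0, A, alpha=0.7, tol=1e-6, max_iter=100):
    """Iterate F <- alpha * F @ A_norm + (1 - alpha) * F0 until it stabilises.

    F0 : (patients x genes) binary alteration matrix (hypothetical input)
    A  : (genes x genes) symmetric adjacency matrix of the gene network
    """
    A = np.asarray(A, dtype=float)
    F0 = np.asarray(F0, dtype=float)
    deg = A.sum(axis=1)
    deg[deg == 0] = 1.0            # avoid division by zero for isolated genes
    A_norm = A / deg[:, None]      # row-normalise (an assumption of this sketch)
    F = F0.copy()
    for _ in range(max_iter):
        F_next = alpha * F @ A_norm + (1 - alpha) * F0
        if np.linalg.norm(F_next - F) < tol:
            return F_next
        F = F_next
    return F

# Toy path network: gene0 - gene1 - gene2; one patient altered in gene0 only.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
F0 = np.array([[1.0, 0.0, 0.0]])
F = propagate(F0, A)
print(F)
```

After propagation the patient profile is no longer sparse: the alteration in one gene spreads to its network neighbours, which is what lets patients with different but nearby alterations be clustered together.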
Use the package manager pip to install PyNBS.
cd NBS-KIRP-CNV
pip install .
There are two scripts that are used:
- script_CNV.py : to perform NBS on the given inputs
- script_CNV_without_network.py : to perform NMF on the given inputs
The runner script, along with its arguments, is given in run_cnv_script.sh
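For readers unfamiliar with the NMF step that the no-network script relies on, here is a minimal sketch of non-negative matrix factorization using the classic multiplicative update rules. It is not the code in script_CNV_without_network.py: the function name `nmf`, the toy matrix, and all parameter values are assumptions for illustration.

```python
import numpy as np

def nmf(X, k, n_iter=200, seed=0, eps=1e-9):
    """Factorise a non-negative matrix X (patients x genes) as W @ H."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k)) + eps   # patient loadings
    H = rng.random((k, m)) + eps   # gene loadings per component
    for _ in range(n_iter):
        # Multiplicative updates keep W and H non-negative.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy alteration matrix with two obvious patient groups.
X = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 1]], dtype=float)
W, H = nmf(X, k=2)
labels = W.argmax(axis=1)          # assign each patient to its strongest component
print(labels)
```

In NBS the same kind of factorization is applied to network-propagated profiles rather than to the raw matrix, which is the difference between the two scripts.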
This section contains the results of the post-clustering analysis, namely the DEG analysis and the SVD performed on the clusters. It also contains the code used for these analyses, along with the code for computing the silhouette coefficient, the cophenetic correlation coefficient, and PAC values.
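As an illustration of one of these stability measures, the sketch below computes a consensus matrix from repeated clustering runs and a PAC value from it, assuming PAC refers to the proportion of ambiguous clustering. The function names, the thresholds 0.1 and 0.9, and the simplification that every sample appears in every run are assumptions of this sketch, not the analysis code in this repo.

```python
import numpy as np

def consensus_matrix(label_runs):
    """Fraction of runs in which each pair of samples shares a cluster.

    label_runs : (runs x samples) array of cluster labels.
    Assumes every sample is clustered in every run (no subsampling).
    """
    runs = np.asarray(label_runs)
    same = runs[:, :, None] == runs[:, None, :]
    return same.mean(axis=0)

def pac(consensus, lower=0.1, upper=0.9):
    """Share of sample pairs whose consensus value lies in (lower, upper]."""
    rows, cols = np.triu_indices_from(consensus, k=1)
    vals = consensus[rows, cols]
    return float(np.mean((vals > lower) & (vals <= upper)))

stable = [[0, 0, 1, 1]] * 3
unstable = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1]]
print(pac(consensus_matrix(stable)), pac(consensus_matrix(unstable)))
```

A lower PAC means fewer sample pairs with an ambiguous co-clustering frequency, so it indicates a more stable choice of cluster number.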
This section contains the code and files used for the creation of the Gene Network. It contains the following files:
- KIRP_geneExp.ipynb : contains the code for downloading the KIRP gene-expression dataset and performing DEG analysis on the data. It creates two files, one for the upregulated genes and one for the downregulated genes.
- GeneNet_setup.ipynb : contains the code for creating the Gene Network using an already known network and the lists of upregulated and downregulated genes obtained from the previous step.
The downloaded data and the intermediate data produced during these steps in our experiment are also included in the folder.
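One plausible way to combine a known network with the DEG lists is sketched below. It assumes the Gene Network keeps only edges whose two endpoints are both differentially expressed; the actual rule used in GeneNet_setup.ipynb may differ, and the function name and gene names are hypothetical.

```python
def filter_edges(edges, up_genes, down_genes):
    """Keep edges of a known network whose endpoints are both DEGs.

    This filtering rule is an assumption of the sketch, not the notebook's logic.
    """
    keep = set(up_genes) | set(down_genes)
    return [(a, b) for a, b in edges if a in keep and b in keep]

# Hypothetical edge list and DEG lists.
known_network = [("GENE_A", "GENE_B"), ("GENE_B", "GENE_C"), ("GENE_C", "GENE_D")]
up = ["GENE_A", "GENE_B"]
down = ["GENE_C"]
gene_net = filter_edges(known_network, up, down)
print(gene_net)
```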
This section contains all the data used for the analyses in the paper: the omics data, the networks, and the corresponding clinical data. It has four folders:
- Clinical Files : contains the clinical data of the tumor in CSV format.
- Network files : contains the network information in edgelist format.
- Sm Files : contains the preprocessed omics files used for the analysis (CNV).
- Survival files : contains the survival data processed from the clinical data, in CSV format.
The directories inside the Results section contain the cluster assignments, related plots, coefficients, and images for the respective runs.
Name Format: Data_Network_Cluster
- CNV_CRN_Cluster2
- CNV_GeneNet_Cluster2
- CNV_PCNet_Cluster2
- CNV_NoNetwork_Cluster3
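When processing the result directories in bulk, the naming format above can be split into its parts. The helper below is a small sketch written for this README, not part of the repo; the name `parse_run_name` and the returned keys are assumptions.

```python
import re

# Matches names such as CNV_GeneNet_Cluster2 (Data_Network_Cluster<k>).
RUN_NAME = re.compile(r"(?P<data>[^_]+)_(?P<network>[^_]+)_Cluster(?P<k>\d+)")

def parse_run_name(name):
    """Split a result directory name into data type, network, and cluster count."""
    match = RUN_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"unexpected directory name: {name!r}")
    return {
        "data": match.group("data"),
        "network": match.group("network"),
        "clusters": int(match.group("k")),
    }

print(parse_run_name("CNV_GeneNet_Cluster2"))
```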
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Please make sure to update tests as appropriate.